Most efficient way to get matching element?

I have an application that scrapes an entire website, and a full run takes about 12 hours. It uses `WebClient.DownloadString` to get the HTML, then `HtmlParser.ParseDocument` to parse it. After that, I do a lot of other parsing on top. This happens for about 293,000 pages, so I'm trying to save every bit of time I can.

I've noticed that I have a lot of places where I call `IHtmlDocument.GetElementsByTagName(TAG).FirstOrDefault(QUERY_SELECTOR)`. I believe I could collapse this into some sort of `IHtmlDocument.QuerySelector(QUERY_SELECTOR)`, which in theory would save time by returning after the first match, but some preliminary testing has shown `QuerySelector` to be slower than the old method. For instance:

`IElement element = doc.GetElementsByTagName("h2").FirstOrDefault(x => x.TextContent.Contains("Climbing Directory"));`

takes about 1 ms, whereas

`IElement element = doc.QuerySelector("h2:contains('Climbing Directory')");`

takes about 23 ms.

Any suggestions for improving my code?

Most efficient way to get matching element? #929

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Most efficient way to get matching element? #929

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions