HTML Parsing - HTML Agility Pack

I’ve seen some posts from users asking about parsing HTML in .NET. Lots of people use RegEx, which is OK, but wouldn’t it be nice to use XPath? The problem is that most sites serve mangled HTML that isn’t even close to XHTML, and XML tools need well-formed markup to work. That piqued my interest, so I decided to look a bit further.

I came across Chris Fulstow’s blog, where he discusses the HTML Agility Pack, a .NET library that parses real-world (malformed) HTML and lets you query it with XPath. It’s been around for quite a while, and although I haven’t tried it first hand yet, it sounds great. You can get the latest version of the HTML Agility Pack on CodePlex: http://www.codeplex.com/htmlagilitypack.
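
For a rough idea of what XPath over sloppy HTML might look like, here is a minimal sketch in C#. It is not taken from the post mentioned above. It assumes the HtmlAgilityPack assembly is referenced in the project, and the LinkExtractor class, the ExtractLinks method, and the sample markup are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack assembly is referenced

// Hypothetical helper, named here only for illustration.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that carries one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: the input need not be well-formed XHTML

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }

        return links;
    }

    public static void Main()
    {
        // Deliberately sloppy markup: unclosed <p> and <br>, mixed quote styles.
        string html = "<p>Links:<br><a href=\"/a\">A</a><br><a href='/b'>B</a>";

        foreach (string href in ExtractLinks(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

The null check is the detail most likely to trip people up: an XPath query with no matches comes back as null, so looping over the result directly would throw.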

Hopefully this will help those of you out there trying to parse HTML…

Damien White