HTML Parsing – HTML Agility Pack

September 20, 2007 • Damien White

I’ve seen a number of posts from users asking how to parse HTML in .NET. Lots of people use regular expressions, which can work, but wouldn’t it be nicer to use XPath? The problem is that most sites serve mangled HTML that is nowhere near well-formed XHTML, and well-formed markup is exactly what XML tools like XPath need. That piqued my interest, so I decided to dig a bit further.

I came across Chris Fulstow’s blog, where he discusses the HTML Agility Pack. It has been around for quite a while, and although I haven’t tried it first-hand yet, it sounds great. You can get the latest version of the HTML Agility Pack on CodePlex: http://www.codeplex.com/htmlagilitypack.
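To make the difference concrete, here is a minimal C# sketch (my own illustration, not something taken from Chris’s post). It assumes the HtmlAgilityPack package is referenced; `HtmlDocument`, `LoadHtml`, `SelectNodes` and `GetAttributeValue` are members of the library, while the `LinkDemo` class, the `ExtractLinks` helper and the sample markup are hypothetical. It first shows `System.Xml` rejecting some typical sloppy markup, then loads the same string with the Agility Pack and pulls out the links with XPath.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using HtmlAgilityPack; // assumes the HtmlAgilityPack package is referenced

public static class LinkDemo
{
    // Hypothetical helper: returns the href of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerates unclosed and mis-nested tags

        var hrefs = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return hrefs;
        }
        foreach (HtmlNode link in links)
        {
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Typical tag soup: <p>, <li> and <br> are never closed.
        string html = "<html><body><p>Links:<ul>"
                    + "<li><a href=\"one.html\">One</a>"
                    + "<li><a href=\"two.html\">Two</a>"
                    + "</ul><br></body></html>";

        // XmlDocument requires well-formed XML, so it throws on this input.
        try
        {
            new XmlDocument().LoadXml(html);
            Console.WriteLine("XmlDocument accepted the markup");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XmlDocument rejected the markup: " + ex.Message);
        }

        // The Agility Pack builds a tree anyway, so the XPath query still works.
        foreach (string href in ExtractLinks(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

The null check after `SelectNodes` is the main gotcha: a page with no matching elements gives you `null` rather than an empty collection, so iterating over the result directly would throw.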

Hopefully this helps those of you out there trying to parse HTML…

Posted in asp.net, c# and tagged with ASP.NET

Damien White

I am a software architect with over 16 years of experience. I simply love coding! I have a driving passion for computers and software development, and a thirst for knowledge that just cannot be quenched. I'm happy to share what I know in my quest to learn as much as possible. I focus most of my time on web development using Ruby on Rails, Ember.js, and ASP.NET MVC.
