<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="http://www.codeplex.com/rss.xsl"?><rss version="2.0"><channel><title>Html Agility Pack</title><link>http://www.codeplex.com/htmlagilitypack/Project/ProjectRss.aspx</link><description>Html Agility Pack is an agile HTML parser library that proposes a read&amp;#47;write DOM and supports plain XPATH or XSLT. It allows you to parse &amp;#34;out of the web&amp;#34; HTML files. The parser is very tolerant wi...</description><item><title>New Post: Project license vs LGPL</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=30912</link><description>&lt;div style="line-height: normal;"&gt;Hello,&lt;br&gt;
&lt;br&gt;
I work for a company that is interested in using the Agility Pack for a website parser, but I'm a bit unsure about the licensing terms.&amp;nbsp; If I create a parser using the Agility Pack, will it have to be covered by the Agility Pack license as well?&amp;nbsp; I.e., is the license similar to GPL or LGPL?&lt;br&gt;
I'd really like to use this nice software :)&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Alexander&lt;br&gt;
&lt;/div&gt;</description><author>EmptyDoor</author><pubDate>Sat, 05 Jul 2008 10:45:41 GMT</pubDate><guid isPermaLink="false">New Post: Project license vs LGPL 20080705104541A</guid></item><item><title>New Post: Stripping harmful HTML from user input, but allowing other HTML?</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=24346</link><description>&lt;div style="line-height: normal;"&gt;Have a look into this thread... it might help you&lt;br&gt;
&lt;a href="http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=16092"&gt;http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=16092&lt;/a&gt;
&lt;/div&gt;</description><author>VijayKarla</author><pubDate>Mon, 30 Jun 2008 14:57:43 GMT</pubDate><guid isPermaLink="false">New Post: Stripping harmful HTML from user input, but allowing other HTML? 20080630025743P</guid></item><item><title>New Post: Stripping harmful HTML from user input, but allowing other HTML?</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=24346</link><description>&lt;div style="line-height: normal;"&gt;This&amp;nbsp;piece of code works excellently.
&lt;/div&gt;</description><author>VijayKarla</author><pubDate>Mon, 30 Jun 2008 13:44:13 GMT</pubDate><guid isPermaLink="false">New Post: Stripping harmful HTML from user input, but allowing other HTML? 20080630014413P</guid></item><item><title>New Post: User Name and Password for tfs01.codeplex.com</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=30312</link><description>&lt;div style="line-height: normal;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Hi,&lt;br&gt;
&lt;br&gt;
What is the username &amp;amp; password for&amp;nbsp;tfs01.codeplex.com? I used my CodePlex username &amp;amp; password to open it, but it won't open.&lt;br&gt;
&lt;br&gt;
Please help. It's urgent.&lt;br&gt;
&lt;br&gt;
Thanks in advance. &lt;br&gt;
&lt;br&gt;
Chandru.V
&lt;/div&gt;</description><author>chandruvkumar</author><pubDate>Thu, 26 Jun 2008 08:33:05 GMT</pubDate><guid isPermaLink="false">New Post: User Name and Password for tfs01.codeplex.com 20080626083305A</guid></item><item><title>New Post: DHTML Content</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=30276</link><description>&lt;div style="line-height: normal;"&gt;Is there a way to use Html Agility Pack to&amp;nbsp;get HTML nodes that were dynamically generated by JavaScript? These are not visible in the output of HtmlWeb.load, nor when I directly view the page's source. The only way I can even see these nodes is when I use Firefox's &amp;quot;View Selection Source&amp;quot;. Is there a way to automate this? Or any suggestions to get this in my code?&lt;br&gt;
&lt;br&gt;
Many Thanks,&lt;br&gt;
Chris&lt;br&gt;
&lt;/div&gt;</description><author>chrisg229</author><pubDate>Wed, 25 Jun 2008 17:26:10 GMT</pubDate><guid isPermaLink="false">New Post: DHTML Content 20080625052610P</guid></item><item><title>New Post: XPath expression to find heading tags &lt;h1&gt;</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=30184</link><description>&lt;div style="line-height: normal;"&gt;The forums seem to be *very* quiet, but in case anyone is listening - I'm having very little success with htmlagilitypack as it doesn't seem to work that well. For example, I pointed the sample code 'GetDocLinks' at&amp;nbsp;'&lt;a href="http://www.w3schools.com/TAGS/tag_hn.asp"&gt;http://www.w3schools.com/TAGS/tag_hn.asp&lt;/a&gt;' and all it returned was:&lt;br&gt;
&lt;br&gt;
Linked urls:&lt;br&gt;
/favicon.ico&lt;br&gt;
/stdtheme.css&lt;br&gt;
Referenced urls:&lt;br&gt;
&lt;br&gt;
It also completely failed to pick out any other tags. So - does anyone have an explanation for what's happening, or perhaps a recommendation for another product with better support material?&lt;br&gt;
&lt;br&gt;
Thanks&lt;br&gt;
&lt;/div&gt;</description><author>bonson</author><pubDate>Wed, 25 Jun 2008 09:30:32 GMT</pubDate><guid isPermaLink="false">New Post: XPath expression to find heading tags &lt;h1&gt; 20080625093032A</guid></item><item><title>New Post: XPath expression to find heading tags &lt;h1&gt;</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=30184</link><description>&lt;div style="line-height: normal;"&gt;Hi,&lt;br&gt;
&lt;br&gt;
I'd be grateful if someone could nudge me in the right direction here. I have just downloaded the Agility Pack and am playing with it; however, I'm a little stuck. I want to find all &amp;quot;&amp;lt;h1&amp;gt;&amp;quot;&amp;nbsp;tags, for example,&amp;nbsp;on a page, so I thought the best way was to adapt one of the examples.&lt;br&gt;
&lt;br&gt;
So, I tried this line&amp;nbsp;&lt;br&gt;
&lt;span style="font-size:13px"&gt;&lt;span style="color:#008080"&gt;HtmlNodeCollection&lt;/span&gt; atts = _doc.DocumentNode.SelectNodes(&lt;span style="color:#800000"&gt;&amp;quot;//h1&amp;quot;&lt;/span&gt;);&lt;/span&gt;&lt;br&gt;
&lt;br&gt;
However, it always returns null - the document definitely contains heading tags and that is the correct XPath expression.&lt;br&gt;
&lt;br&gt;
So I've&amp;nbsp;missed something&amp;nbsp;- but what?&lt;br&gt;
&lt;br&gt;
Thanks for any help&lt;br&gt;
&lt;br&gt;
JC&lt;br&gt;
&lt;/div&gt;</description><author>bonson</author><pubDate>Tue, 24 Jun 2008 16:00:05 GMT</pubDate><guid isPermaLink="false">New Post: XPath expression to find heading tags &lt;h1&gt; 20080624040005P</guid></item><item><title>New Post: Proxy settings</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=29762</link><description>&lt;div style="line-height: normal;"&gt;How can I specify proxy settings for HtmlWeb or does it use the same default IE proxy settings?&lt;br&gt;
&lt;/div&gt;</description><author>erhard</author><pubDate>Tue, 17 Jun 2008 05:45:14 GMT</pubDate><guid isPermaLink="false">New Post: Proxy settings 20080617054514A</guid></item><item><title>New Post: HTTP Authentication</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=29761</link><description>&lt;div style="line-height: normal;"&gt;How can I specify a user name and password for the HtmlWeb class to perform http authentication on a password protected directory?&lt;br&gt;
&lt;/div&gt;</description><author>erhard</author><pubDate>Tue, 17 Jun 2008 05:44:26 GMT</pubDate><guid isPermaLink="false">New Post: HTTP Authentication 20080617054426A</guid></item><item><title>New Post: How do I use htmlweb.load with proxy ?</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=29376</link><description>&lt;div style="line-height: normal;"&gt;Hi there , &lt;br&gt;
I want to use htmlweb.load with proxy credentials.&lt;br&gt;
How can I do it?&lt;br&gt;
&lt;br&gt;
regards,&lt;br&gt;
Eran.&lt;br&gt;
&lt;/div&gt;</description><author>erann</author><pubDate>Tue, 10 Jun 2008 11:23:32 GMT</pubDate><guid isPermaLink="false">New Post: How do I use htmlweb.load with proxy ? 20080610112332A</guid></item><item><title>NEW POST: Get Text and Anchor Text?</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=28855</link><description>&lt;div style="line-height: normal;"&gt;Hello there everyone! First of all thanks for this great pack. It has really helped me a lot!&lt;br&gt;
&lt;br&gt;
I have a couple of questions and if someone would be willing to help me it would be great!&lt;br&gt;
&lt;br&gt;
I am doing a project in which I have to extract three things from a webpage. &lt;br&gt;
1. All of the links (urls) which are inside it.&lt;br&gt;
2. Its pure text (which is the page's HTML without the tags etc.) and&lt;br&gt;
3. After I've found the links, I need to go to each one in the HTML file and get the 10 words before and the 10 words after it.&lt;br&gt;
&lt;br&gt;
The first one isn't that hard. With: HtmlNodeCollection myanchors = htmlData.DocumentNode.SelectNodes(&amp;quot;//a[@href]&amp;quot;); I can find the links. But what about the other two parts?&lt;br&gt;
&lt;br&gt;
Thanks in advance&lt;br&gt;
George&lt;br&gt;
&lt;/div&gt;</description><author>geolast</author><pubDate>Mon, 02 Jun 2008 11:20:42 GMT</pubDate><guid isPermaLink="false">NEW POST: Get Text and Anchor Text? 20080602112042A</guid></item><item><title>COMMENTED ISSUE: "End tag &lt;/script&gt; was not found" when using short notation for tag closing</title><link>http://www.codeplex.com/htmlagilitypack/WorkItem/View.aspx?WorkItemId=16045</link><description>Here is a sample HTML file to illustrate the problem&amp;#58;&lt;br /&gt;&lt;br /&gt;&amp;#60;html&amp;#62;&lt;br /&gt;&amp;#60;head&amp;#62;&lt;br /&gt;&amp;#60;meta http-equiv&amp;#61;&amp;#34;Content-Type&amp;#34; content&amp;#61;&amp;#34;text&amp;#47;html&amp;#59; charset&amp;#61;utf-8&amp;#34; &amp;#47;&amp;#62;&lt;br /&gt;&amp;#60;script type&amp;#61;&amp;#34;text&amp;#47;javascript&amp;#34; src&amp;#61;&amp;#34;..&amp;#47;scripts&amp;#47;general.js&amp;#34; &amp;#47;&amp;#62;&amp;#60;title&amp;#62;T24&amp;#60;&amp;#47;title&amp;#62;&lt;br /&gt;&amp;#60;&amp;#47;head&amp;#62;&lt;br /&gt;&amp;#60;form name&amp;#61;&amp;#34;compositeScreenData&amp;#34;&amp;#62;&amp;#60;input type&amp;#61;&amp;#34;hidden&amp;#34; name&amp;#61;&amp;#34;someName&amp;#34; value&amp;#61;&amp;#34;0&amp;#34; &amp;#62;&amp;#60;&amp;#47;form&amp;#62;&lt;br /&gt;&amp;#60;&amp;#47;html&amp;#62;&lt;br /&gt;&lt;br /&gt;The parsing of the HTML does not work correctly when the short notation for tag closing is used &amp;#40;i.e.  
&amp;#47;&amp;#62;  instead of  &amp;#60;&amp;#47;script&amp;#62; &amp;#41; &amp;#58;&lt;br /&gt;&lt;br /&gt;Doesn&amp;#39;t work&amp;#58; &amp;#60;script type&amp;#61;&amp;#34;text&amp;#47;javascript&amp;#34; src&amp;#61;&amp;#34;..&amp;#47;scripts&amp;#47;general.js&amp;#34; &amp;#47;&amp;#62;&lt;br /&gt;Works&amp;#58;            &amp;#60;script type&amp;#61;&amp;#34;text&amp;#47;javascript&amp;#34; src&amp;#61;&amp;#34;..&amp;#47;scripts&amp;#47;general.js&amp;#34;&amp;#62;&amp;#60;&amp;#47;script&amp;#62;&lt;br /&gt;&lt;br /&gt;When the parser is in state ParseState.PcData &amp;#40;used for CDATA elements like script, style, noxhtml &amp;#40;see HtmlNode.cs&amp;#41; &amp;#41; then it always expects the  verbose tag closing  &amp;#40;i.e. &amp;#60;&amp;#47;tag&amp;#62;&amp;#41;&lt;br /&gt;&lt;br /&gt;How to fix&amp;#58; &lt;br /&gt;a&amp;#41; Removing ElementsFlags.Add&amp;#40;&amp;#34;script&amp;#34;, HtmlElementFlag.CData&amp;#41;&amp;#59; in HtmlNode.cs might be an easy workaround, but it might have some side effects. &lt;br /&gt;b&amp;#41; Implementing a proper fix in the Parse&amp;#40;&amp;#41; method&lt;br /&gt;Comments: ** Comment from web user: simonm ** &lt;p&gt;Note that some browsers don&amp;#39;t support the empty notation &amp;#40;&amp;#60;script ... 
&amp;#47;&amp;#62;&amp;#41; anyway.&lt;/p&gt;</description><author>simonm</author><pubDate>Mon, 02 Jun 2008 06:13:59 GMT</pubDate><guid isPermaLink="false">COMMENTED ISSUE: "End tag &lt;/script&gt; was not found" when using short notation for tag closing 20080602061359A</guid></item><item><title>COMMENTED ISSUE: Unclosed paragraph lost</title><link>http://www.codeplex.com/htmlagilitypack/WorkItem/View.aspx?WorkItemId=12418</link><description>When inputting this html&amp;#58;&lt;br /&gt;&lt;br /&gt;&amp;#60;p&amp;#62;Paragraph1&amp;#60;p&amp;#62;Paragraph2&amp;#60;&amp;#47;p&amp;#62;&lt;br /&gt;&lt;br /&gt;The InnerHtml property of DocumentNode will contain&amp;#58;&lt;br /&gt;&lt;br /&gt;&amp;#60;p &amp;#47;&amp;#62;Paragraph 1&amp;#60;p&amp;#62;Paragraph 2&amp;#60;&amp;#47;p&amp;#62;&lt;br /&gt;&lt;br /&gt;This essentially means the first paragraph is lost. I don&amp;#39;t know if this is a rule or just a suggestion of W3C, but iirc they state that a &amp;#60;p&amp;#62; with no closing tag should be implicitly closed by the next block-level tag. 
In other words, the second &amp;#60;p&amp;#62; should close the first in this example, which means I&amp;#39;d expect the InnerHtml to be something like this&amp;#58;&lt;br /&gt;&lt;br /&gt;&amp;#60;p&amp;#62;Paragraph1&amp;#60;&amp;#47;p&amp;#62;&amp;#60;p&amp;#62;Paragraph2&amp;#60;&amp;#47;p&amp;#62;&lt;br /&gt;Comments: ** Comment from web user: simonm ** &lt;p&gt;This was a deliberate choice because&amp;#58;&lt;/p&gt;&lt;p&gt;1&amp;#41; it&amp;#39;s easier to implement &amp;#58;-&amp;#41;&lt;br /&gt;2&amp;#41; it&amp;#39;s rendered properly in IE and FF &amp;#40;AFAIK, but someone can correct me if I am wrong&amp;#41;&lt;br /&gt;3&amp;#41; it&amp;#39;s correct for an XML or XHTML parser&lt;/p&gt;&lt;p&gt;Note the HTML agility pack was never designed to be compliant with W3C specs, on purpose, but to minimise the changes in the original HTML, with a final rendering as close as possible to the original HTML in known browsers.&lt;/p&gt;</description><author>simonm</author><pubDate>Mon, 02 Jun 2008 06:12:10 GMT</pubDate><guid isPermaLink="false">COMMENTED ISSUE: Unclosed paragraph lost 20080602061210A</guid></item><item><title>NEW POST: Stripping harmful HTML from user input, but allowing other HTML?</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=24346</link><description>&lt;div style="line-height: normal;"&gt;Cool - thanks for that.&lt;br&gt;
&lt;/div&gt;</description><author>frogbody</author><pubDate>Thu, 29 May 2008 20:55:39 GMT</pubDate><guid isPermaLink="false">NEW POST: Stripping harmful HTML from user input, but allowing other HTML? 20080529085539P</guid></item><item><title>NEW POST: html agility pack with IE automation</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=28666</link><description>&lt;div style="line-height: normal;"&gt;Hi,&lt;br&gt;
&lt;br&gt;
I just learned how to automate the Internet Explorer browser through Microsoft.mshtml and SHDocVw. Now my next step is to parse the HTML document held by the browser with the Agility Pack. Does anybody know where to start? Thanks
&lt;/div&gt;</description><author>annddrew</author><pubDate>Thu, 29 May 2008 20:33:54 GMT</pubDate><guid isPermaLink="false">NEW POST: html agility pack with IE automation 20080529083354P</guid></item><item><title>NEW POST: Stripping harmful HTML from user input, but allowing other HTML?</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=24346</link><description>&lt;div style="line-height: normal;"&gt;I needed to do the same thing, but couldn't find any example code, so here's mine - it's not perfect, but it works well enough for my purposes...&lt;br&gt;
&lt;span style="font-family:courier new"&gt;&lt;br&gt;
&lt;br&gt;
&lt;hr&gt;
&lt;br&gt;
public string ScrubHTML(string html)&lt;br&gt;
{&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlDocument doc = new HtmlDocument();&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; doc.LoadHtml(html);&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; //Remove potentially harmful elements&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; HtmlNodeCollection nc = doc.DocumentNode.SelectNodes(&amp;quot;//script|//link|//iframe|//frameset|//frame|//applet|//object&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; if (nc != null)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; foreach (HtmlNode node in nc)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.ParentNode.RemoveChild(node, false);&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; }&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; //remove hrefs to java/j/vbscript URLs&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; nc = doc.DocumentNode.SelectNodes(&amp;quot;//a[starts-with(@href, 'javascript')]|//a[starts-with(@href, 'jscript')]|//a[starts-with(@href, 'vbscript')]&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; if (nc != null)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; foreach (HtmlNode node in nc)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.SetAttributeValue(&amp;quot;href&amp;quot;, &amp;quot;protected&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; //remove img with refs to java/j/vbscript URLs&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; nc = doc.DocumentNode.SelectNodes(&amp;quot;//img[starts-with(@src, 'javascript')]|//img[starts-with(@src, 'jscript')]|//img[starts-with(@src, 'vbscript')]&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; if (nc != null)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; foreach (HtmlNode node in nc)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.SetAttributeValue(&amp;quot;src&amp;quot;, &amp;quot;protected&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; //remove on&amp;lt;Event&amp;gt; handlers from all tags&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; nc = doc.DocumentNode.SelectNodes(&amp;quot;//*[@onclick or @onmouseover or @onfocus or @onblur or @onmouseout or @ondblclick or @onload or @onunload]&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; if (nc != null)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; foreach (HtmlNode node in nc)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onFocus&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onBlur&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onClick&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onMouseOver&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onMouseOut&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onDblClick&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onLoad&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; node.Attributes.Remove(&amp;quot;onUnload&amp;quot;);&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp; return doc.DocumentNode.WriteTo();&lt;br&gt;
} &lt;br&gt;
&lt;br&gt;
&lt;/span&gt;
&lt;/div&gt;</description><author>mg_45</author><pubDate>Thu, 29 May 2008 15:42:23 GMT</pubDate><guid isPermaLink="false">NEW POST: Stripping harmful HTML from user input, but allowing other HTML? 20080529034223P</guid></item><item><title>NEW POST: Html Agility Pack to LINQ to XML Converter</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=27908</link><description>&lt;div style="line-height: normal;"&gt;You can get the patch from the source code tab, in the patches area.&lt;br&gt;
&lt;br&gt;
A lot of websites don't validate as XHTML because they probably aren't XHTML. They're HTML, which isn't exactly XML. In addition to that, the parsers have evolved to be very lenient to malformed X/HTML. This allows browsers to open a wider range of sketchy files, but makes scraping harder without a browser's parser.&lt;br&gt;
&lt;br&gt;
Enter Html Agility Pack.&lt;br&gt;
&lt;br&gt;
This is a gem IMHO in the C# OSS world. It brings a very lenient html parser and offers a set of external format converters, XML being one of them.&lt;br&gt;
&lt;br&gt;
My patch simply uses the XML converter to stream data into the linq2xml XDocument parser. Very simple.&lt;br&gt;
Check out my post at:&lt;br&gt;
http://vijay.screamingpens.com/archive/2008/05/26/linq-amp-lambda-part-3-html-agility-pack-to-linq.aspx&lt;br&gt;
&lt;br&gt;
I chose to use linq because my team needed to do some scraping tasks and weren't very proficient in XPath, but had enough linq skills to parse xml. I prefer linq to xpath because it's easier to read, in my opinion. It may be a bit slower, but perf is rarely something I particularly care about until we stress test and performance benchmark our apps. That said, sometimes from the start we're sure performance is going to be an issue (like on embedded devices), and we take care to design for performance early and give ourselves enough time for optimization. But I digress.&lt;br&gt;
&lt;br&gt;
-CV&lt;br&gt;
&lt;/div&gt;</description><author>CVertex</author><pubDate>Wed, 28 May 2008 23:56:37 GMT</pubDate><guid isPermaLink="false">NEW POST: Html Agility Pack to LINQ to XML Converter 20080528115637P</guid></item><item><title>NEW POST: Html Agility Pack to LINQ to XML Converter</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=27908</link><description>&lt;div style="line-height: normal;"&gt;+1&lt;br&gt;
&lt;br&gt;
How can I get a copy of this patch?&lt;br&gt;
&lt;br&gt;
I've noticed that a lot of web sites won't validate as XHTML. Does this fix this?  If not, would setting up the HTML Agility Pack to directly support LINQ be a better idea?
&lt;/div&gt;</description><author>dscruggs</author><pubDate>Wed, 28 May 2008 21:07:40 GMT</pubDate><guid isPermaLink="false">NEW POST: Html Agility Pack to LINQ to XML Converter 20080528090740P</guid></item><item><title>NEW POST: nested lists and OptionFixNestedTags</title><link>http://www.codeplex.com/htmlagilitypack/Thread/View.aspx?ThreadId=27981</link><description>&lt;div style="line-height: normal;"&gt;Hi all,&lt;br&gt;
&lt;br&gt;
I'm loading an HTML file which looks like this:&lt;br&gt;
&amp;lt;html&amp;gt;&lt;br&gt;
&amp;lt;head&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;ul&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;li&amp;gt;a&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;ol&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;li&amp;gt;b&amp;lt;/li&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;li&amp;gt;c&amp;lt;/li&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;li&amp;gt;d&amp;lt;/li&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/ol&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/li&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/ul&amp;gt;&lt;br&gt;
&amp;lt;/body&amp;gt;&lt;br&gt;
&amp;lt;/html&amp;gt;&lt;br&gt;
&lt;br&gt;
After I've loaded it I look at the DocumentNode.OuterHtml, and it looks like this:&lt;br&gt;
&amp;lt;html&amp;gt;&lt;br&gt;
&amp;lt;head&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;ul&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;li&amp;gt;a&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;ol&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/ol&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;b&amp;lt;/li&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;li&amp;gt;c&amp;lt;/li&amp;gt;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;li&amp;gt;d&amp;lt;/li&amp;gt;&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/ul&amp;gt;&lt;br&gt;
&lt;br&gt;
&amp;lt;/head&amp;gt;&amp;lt;/html&amp;gt;&lt;br&gt;
&lt;br&gt;
I had a look at the code and noticed that loading a URL sets the property 'OptionFixNestedTags' to true. If I change this to false, everything works.&lt;br&gt;
&lt;br&gt;
Is there a problem with HtmlAgilityPack, or am I missing a setting somewhere?&lt;br&gt;
&lt;br&gt;
Thanks,&lt;br&gt;
Russ&lt;br&gt;
&lt;br&gt;
&lt;/div&gt;</description><author>russau</author><pubDate>Mon, 19 May 2008 12:46:30 GMT</pubDate><guid isPermaLink="false">NEW POST: nested lists and OptionFixNestedTags 20080519124630P</guid></item><item><title>CREATED TASK: Add class diagram</title><link>http://www.codeplex.com/htmlagilitypack/WorkItem/View.aspx?WorkItemId=16652</link><description>As a piece of code documentation and a tool for further designing.&lt;br /&gt;&lt;br /&gt;Needs custom layout for achieving any purpose&lt;br /&gt;</description><author>Jessynoo</author><pubDate>Mon, 19 May 2008 04:10:38 GMT</pubDate><guid isPermaLink="false">CREATED TASK: Add class diagram 20080519041038A</guid></item></channel></rss>