.net - Get the available XPaths of an HTML page?

March 15, 2011

I've taken and adapted this code, which shows how to retrieve the XPath expressions of an XML document.

I would like to do the same with an HTML page and retrieve its available XPaths (maybe using HtmlDocument?). Is that possible?

Note: I can accept a native solution, or else one using the HtmlAgilityPack library.

This is the XML method:

''' <summary>
''' Gets the XPath expressions of an XML document.
''' </summary>
''' <param name="document">Indicates the XML document.</param>
''' <returns>List(Of System.String).</returns>
Public Function GetXPaths(ByVal document As Xml.XmlDocument) As List(Of String)

    Dim xpathList As New List(Of String)
    Dim xpath As String = String.Empty

    ' Start the recursion at each top-level element (normally just the root).
    For Each child As Xml.XmlNode In document.ChildNodes

        If child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If

    Next ' child

    Return xpathList

End Function

''' <summary>
''' Gets the XPath expressions of an XML node.
''' </summary>
''' <param name="node">Indicates the XML node.</param>
''' <param name="xpathList">Indicates a ByRef XPath list <see cref="List(Of String)"/>.</param>
''' <param name="xpath">Indicates the current XPath.</param>
Private Sub GetXPaths(ByVal node As Xml.XmlNode,
                      ByRef xpathList As List(Of String),
                      Optional ByVal xpath As String = Nothing)

    ' Extend the parent's path with this element's name.
    xpath &= "/" & node.Name

    ' Record each distinct path only once.
    If Not xpathList.Contains(xpath) Then
        xpathList.Add(xpath)
    End If

    For Each child As Xml.XmlNode In node.ChildNodes

        If child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If

    Next ' child

End Sub

As far as I can see, HtmlAgilityPack has class structures similar to XmlDocument. I believe you can easily adapt your current solution to cope with HtmlDocument, like this:

Public Function GetXPaths(ByVal document As HtmlDocument) As List(Of String)
    Dim xpathList As New List(Of String)
    Dim xpath As String = String.Empty
    ' DocumentNode is the synthetic root that holds the parsed top-level nodes.
    For Each child As HtmlNode In document.DocumentNode.ChildNodes
        If child.NodeType = HtmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If
    Next ' child
    Return xpathList
End Function

Private Sub GetXPaths(ByVal node As HtmlNode,
                      ByRef xpathList As List(Of String),
                      Optional ByVal xpath As String = Nothing)
    ' Extend the parent's path with this element's name.
    xpath &= "/" & node.Name
    ' Record each distinct path only once.
    If Not xpathList.Contains(xpath) Then
        xpathList.Add(xpath)
    End If
    For Each child As HtmlNode In node.ChildNodes
        If child.NodeType = HtmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If
    Next ' child
End Sub

It worked fine when I tested it using HTML that is XML-compliant. I can't guarantee how well it will work against malformed HTML documents.
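One way to see how the approach behaves on malformed markup is to load a deliberately broken string and inspect what the parser reports before collecting the paths. The following is only a minimal sketch: the sample markup, the module name `XPathDemo`, and the helper name `CollectXPaths` are made up for illustration, and it assumes the HtmlAgilityPack package is referenced. It relies on `HtmlDocument.LoadHtml` and the `HtmlDocument.ParseErrors` collection, and repeats the same recursion as above in compact form so that it stands alone.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module XPathDemo

    ' Hypothetical helper: same recursion as the answer, in compact form.
    Private Sub CollectXPaths(ByVal node As HtmlNode,
                              ByVal xpathList As List(Of String),
                              ByVal xpath As String)
        xpath &= "/" & node.Name
        If Not xpathList.Contains(xpath) Then xpathList.Add(xpath)
        For Each child As HtmlNode In node.ChildNodes
            If child.NodeType = HtmlNodeType.Element Then
                CollectXPaths(child, xpathList, xpath)
            End If
        Next
    End Sub

    Sub Main()
        ' Assumed sample input: <p> and <b> are left unclosed on purpose.
        Dim html As String = "<html><body><div><p>one<p>two <b>bold</div></body></html>"

        Dim document As New HtmlDocument()
        document.LoadHtml(html)

        ' List whatever problems the parser noticed while reading the markup.
        For Each parseError As HtmlParseError In document.ParseErrors
            Console.WriteLine("Line {0}: {1}", parseError.Line, parseError.Reason)
        Next

        ' Collect the distinct element paths from the parsed tree.
        Dim xpathList As New List(Of String)
        For Each child As HtmlNode In document.DocumentNode.ChildNodes
            If child.NodeType = HtmlNodeType.Element Then
                CollectXPaths(child, xpathList, String.Empty)
            End If
        Next

        For Each xpath As String In xpathList
            Console.WriteLine(xpath)
        Next
    End Sub

End Module
```

The paths printed depend on how the parser chooses to repair the unclosed tags, so comparing them against the parse errors gives a rough idea of whether the result can be trusted for a given page.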
