.NET - Get the available XPaths of an HTML page?
I've taken and adapted this code that shows how to retrieve the XPath expressions of an XML document.
I'd like to do the same with an HTML page and retrieve its available XPaths (maybe from an HtmlDocument?). Is that possible?
Note: I can accept a native solution, or one that uses the HtmlAgilityPack library.
This is the XML method:
''' <summary>
''' Gets the XPath expressions of an XML document.
''' </summary>
''' <param name="document">Indicates the XML document.</param>
''' <returns>List(Of System.String).</returns>
Public Function GetXPaths(ByVal document As Xml.XmlDocument) As List(Of String)

    Dim xpathList As New List(Of String)
    Dim xpath As String = String.Empty

    For Each child As Xml.XmlNode In document.ChildNodes
        If child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If
    Next ' child

    Return xpathList

End Function

''' <summary>
''' Gets the XPath expressions of an XML node.
''' </summary>
''' <param name="node">Indicates the XML node.</param>
''' <param name="xpathList">Indicates the XPath list, passed ByRef <see cref="List(Of String)"/>.</param>
''' <param name="xpath">Indicates the current XPath.</param>
Private Sub GetXPaths(ByVal node As Xml.XmlNode, ByRef xpathList As List(Of String), Optional ByVal xpath As String = Nothing)

    ' Append this element's name to the path inherited from its parent.
    xpath &= "/" & node.Name

    ' Store each distinct path only once.
    If Not xpathList.Contains(xpath) Then
        xpathList.Add(xpath)
    End If

    ' Recurse into child elements only (text, comments, etc. are skipped).
    For Each child As Xml.XmlNode In node.ChildNodes
        If child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If
    Next ' child

End Sub
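To see what the XML method returns, here is a minimal, self-contained console sketch, assuming a plain VB.NET console project. It repeats the same recursion in compact form so it compiles on its own; the module name `XPathListDemo` and the sample XML are hypothetical. Because only node names are concatenated, sibling elements with the same name collapse into a single path.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module XPathListDemo

    ' Same traversal as the XML method above, repeated so this sketch compiles on its own.
    Public Function GetXPaths(ByVal document As XmlDocument) As List(Of String)
        Dim xpathList As New List(Of String)
        For Each child As XmlNode In document.ChildNodes
            If child.NodeType = XmlNodeType.Element Then
                GetXPaths(child, xpathList, String.Empty)
            End If
        Next
        Return xpathList
    End Function

    Private Sub GetXPaths(ByVal node As XmlNode, ByVal xpathList As List(Of String), ByVal xpath As String)
        ' Append this element's name to the parent's path.
        xpath &= "/" & node.Name
        ' Keep each distinct path once; same-named siblings produce the same path.
        If Not xpathList.Contains(xpath) Then
            xpathList.Add(xpath)
        End If
        ' Recurse into element children only.
        For Each child As XmlNode In node.ChildNodes
            If child.NodeType = XmlNodeType.Element Then
                GetXPaths(child, xpathList, xpath)
            End If
        Next
    End Sub

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<root><a><b/></a><a/><c/></root>")
        For Each path As String In GetXPaths(doc)
            Console.WriteLine(path)
        Next
        ' Prints:
        ' /root
        ' /root/a
        ' /root/a/b
        ' /root/c
    End Sub

End Module
```

The second `<a>` adds nothing new, which is why only four paths appear for five elements.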
As far as I can see, HtmlAgilityPack has a class structure similar to XmlDocument's. I believe you can easily adapt your current solution to cope with an HtmlDocument, like this:
Imports HtmlAgilityPack

Public Function GetXPaths(ByVal document As HtmlDocument) As List(Of String)

    Dim xpathList As New List(Of String)
    Dim xpath As String = String.Empty

    ' Start from the document root node and walk its element children.
    For Each child As HtmlNode In document.DocumentNode.ChildNodes
        If child.NodeType = HtmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If
    Next ' child

    Return xpathList

End Function

Private Sub GetXPaths(ByVal node As HtmlNode, ByRef xpathList As List(Of String), Optional ByVal xpath As String = Nothing)

    xpath &= "/" & node.Name

    If Not xpathList.Contains(xpath) Then
        xpathList.Add(xpath)
    End If

    For Each child As HtmlNode In node.ChildNodes
        If child.NodeType = HtmlNodeType.Element Then
            GetXPaths(child, xpathList, xpath)
        End If
    Next ' child

End Sub

This worked fine when I tested it with XML-compliant HTML. I can't guarantee how well it will work against malformed HTML documents.
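The name-only paths above merge siblings that share a tag name. If you need one path per element instead, HtmlAgilityPack's `HtmlNode.XPath` property returns a positional path (with `[n]` indexes) for a node. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; `HtmlXPathDemo`, `GetPositionalXPaths` and the sample markup are hypothetical names used only for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module HtmlXPathDemo

    ' Hypothetical helper: collects HtmlNode.XPath for every element node,
    ' giving positional paths such as /html[1]/body[1]/div[2].
    Public Function GetPositionalXPaths(ByVal document As HtmlDocument) As List(Of String)
        Dim paths As New List(Of String)
        ' Descendants() enumerates every node below the document root.
        For Each node As HtmlNode In document.DocumentNode.Descendants()
            ' Skip text and comment nodes; keep elements only.
            If node.NodeType = HtmlNodeType.Element Then
                paths.Add(node.XPath)
            End If
        Next
        Return paths
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<html><body><div><p>text</p></div><div></div></body></html>")
        For Each path As String In GetPositionalXPaths(doc)
            Console.WriteLine(path)
        Next
    End Sub

End Module
```

Because every path carries indexes, nothing is de-duplicated here; the name-only method is the better fit when you only want the distinct "shape" of the page.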