Python implementation: How to get the tree structure of all XPaths in the website?

Question

Method One When trying to use Python to get a hierarchical tree of all xpaths in the website (https://startpagina.nl), I first tried to get the xpath of the branch using: /html/body:fromseleniumimportwebdriverurl='https://startpagina .nl'driver=webdriver.Firefox()driver.get(url)test=driver.fin

P粉127901279 · Answer

The total number of XPaths that select one or more elements is unlimited (for example, it will include paths like /a/b/../b/../b/../b) , but if you restrict yourself to paths of the form /a[i]/b[j]/c[k], then the number of paths equals the number of elements, and the "tree" of XPaths is the same as the original XML Tree isomorphism.

If you want different paths without numeric predicates, such as /a/b/c, /a/b/d, then the easiest way is probably to iterate over the XML document , get the path of each element (in this form) and eliminate duplicates. If you want a tree structure instead of a simple list of paths, use nested maps/dictionaries to build it.

The reason it complains about /html/body/ is that a legal XPath expression cannot contain the trailing /.

method one

Method Two

question

Expected output