If you are targeting one specific website, you can write matching rules based on the structure of its pages.
Supporting every website is much harder. Identifying content by tag names alone is not accurate enough, so a general solution would probably need something like machine learning (for example, a neural network).
For parsing the HTML in Node.js, the cheerio module is a convenient choice.
Example: http://www.focalhot.com/blog/62.html
To extract the main body text, you can try the line-block density method.
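As a rough illustration of the line-block density idea, here is a minimal dependency-free sketch. It is a simplified variant, not a reference implementation: the function names (`htmlToLines`, `lineBlockDensity`, `extractMainText`) and the `threshold` and `radius` defaults are my own assumptions and would need tuning per site. It also assumes the HTML source has reasonable line breaks and does not decode HTML entities.

```javascript
// Remove comments, scripts and styles, strip the remaining tags,
// and return the visible text of each source line.
function htmlToLines(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, "")
    .replace(/<[^>]+>/g, "")
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+/g, " ").trim());
}

// density[i] = number of text characters in lines i-radius .. i+radius.
function lineBlockDensity(lines, radius = 1) {
  return lines.map((_, i) => {
    let total = 0;
    const last = Math.min(lines.length - 1, i + radius);
    for (let j = Math.max(0, i - radius); j <= last; j++) {
      total += lines[j].length;
    }
    return total;
  });
}

// Pick the run of consecutive dense lines that holds the most text.
// threshold and radius are hypothetical defaults, not standard values.
function extractMainText(html, threshold = 150, radius = 1) {
  const lines = htmlToLines(html);
  const density = lineBlockDensity(lines, radius);

  let best = { start: 0, end: 0, chars: 0 };
  let runStart = -1;
  for (let i = 0; i <= lines.length; i++) {
    const dense = i < lines.length && density[i] >= threshold;
    if (dense && runStart < 0) runStart = i;
    if (!dense && runStart >= 0) {
      const chars = lines
        .slice(runStart, i)
        .reduce((sum, line) => sum + line.length, 0);
      if (chars > best.chars) best = { start: runStart, end: i, chars };
      runStart = -1;
    }
  }

  // Trim edge lines that only look dense because of a long neighbour.
  let { start, end } = best;
  const minLine = threshold / (2 * radius + 1);
  while (start < end && lines[start].length < minLine) start++;
  while (end > start && lines[end - 1].length < minLine) end--;

  return lines.slice(start, end).filter(Boolean).join("\n");
}
```

Calling `extractMainText(html)` returns the densest text region as plain text, or an empty string if no line reaches the threshold. Navigation bars and footers tend to be short lines with many tags, so their density stays low, while article paragraphs form a long dense run.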
For the title, about all you can do is look for heading tags such as h1 to h3.
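The heading lookup could be sketched like this. To keep the example self-contained it uses regular expressions, which is fragile on malformed markup; a real parser such as the cheerio module mentioned above would be more robust. The names `stripTags` and `guessTitle` are hypothetical, and the fallback to the `<title>` element is my own assumption, not part of the advice above.

```javascript
// Drop tags inside a fragment and collapse whitespace.
function stripTags(fragment) {
  return fragment.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
}

// Return the text of the first non-empty h1, then h2, then h3.
// Falls back to <title> (an assumption), and to "" if nothing matches.
function guessTitle(html) {
  for (const tag of ["h1", "h2", "h3"]) {
    const re = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
    for (const match of html.matchAll(re)) {
      const text = stripTags(match[1]);
      if (text) return text;
    }
  }
  const title = /<title\b[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return title ? stripTags(title[1]) : "";
}
```

Higher-level headings win regardless of where they appear in the page, so an `h1` after a sidebar `h2` is still preferred. On sites that use `h1` for the logo rather than the article title, the tag order would need to be adjusted.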