java - Parsing HTML to extract the desired information

Question

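I've run into a rather tricky requirement. Development language: Java. The program periodically reads the resumes sent to the HR mailbox by 前程无忧 (51job) and 智联 (Zhaopin); this part is already implemented, and I have the resume HTML. From the resume HTML, I need to parse out the desired information (name, gender, phone, email, work experience, education history, etc...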

PHPz · Answer

It is better to use jsoup to parse the HTML into a Document object, which makes it much easier to work with the individual elements.
jsoup API: http://www.open-open.com/jsoup/

怪我咯 · Answer

What about regular expressions? As long as you account for the different formats you might need to match, it should work.

PHP中文网 · Answer

It would be better to use regular expression capture groups.
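
To illustrate the capture-group idea, here is a minimal sketch using only `java.util.regex`. The class name `ResumeRegexExtractor`, the sample HTML, and both patterns are assumptions made for illustration: the email pattern is deliberately simple, and the phone pattern assumes an 11-digit mainland China mobile number. Real resume templates would likely need more patterns per field.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls single values out of raw resume HTML with capture groups.
final class ResumeRegexExtractor {
    // Simplified email pattern (an assumption, not a full RFC 5322 matcher).
    private static final Pattern EMAIL =
            Pattern.compile("([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,})");

    // Assumed format: 11 digits starting with 1, not embedded in a longer run of digits.
    private static final Pattern MOBILE =
            Pattern.compile("(?<!\\d)(1[3-9]\\d{9})(?!\\d)");

    // Returns the first capture group of the first match, if there is one.
    static Optional<String> firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    static Optional<String> extractEmail(String html) {
        return firstGroup(EMAIL, html);
    }

    static Optional<String> extractMobile(String html) {
        return firstGroup(MOBILE, html);
    }

    public static void main(String[] args) {
        // Made-up sample markup; real resume emails will look different.
        String html = "<table><tr><td>Phone</td><td>13800138000</td></tr>"
                + "<tr><td>Email</td><td>zhangsan@example.com</td></tr></table>";
        System.out.println(extractEmail(html).orElse("(none)"));
        System.out.println(extractMobile(html).orElse("(none)"));
    }
}
```

This approach suits fields with a recognizable shape, such as email addresses and phone numbers. Fields defined by their position in the markup, such as work experience, are harder to capture this way.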

巴扎黑 · Answer

You can use jsoup.

高洛峰 · Answer

You can use jsoup. I have done something similar before; it is very convenient and handles all kinds of tags.

PHP中文网 · Answer

Prefer jsoup.
jsoup has a select method whose syntax is similar to CSS selectors. The API is simpler and more convenient than regular expressions.
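
As a rough sketch of what that looks like, the code below parses an HTML string into a `Document` and uses `select` to read label/value pairs from table rows. It assumes the jsoup library is on the classpath. The class name `ResumeJsoupExtractor` and the CSS classes `resume`, `label`, and `value` are hypothetical; the actual templates from 前程无忧 and 智联 will use different markup, so the selectors would need to be adapted after inspecting a real email.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads label/value pairs from a resume table with CSS selectors.
final class ResumeJsoupExtractor {

    static Map<String, String> extractFields(String html) {
        // Parse the raw HTML into a DOM-like Document.
        Document doc = Jsoup.parse(html);
        Map<String, String> fields = new LinkedHashMap<>();

        // Assumed layout: <table class="resume"> with one field per row.
        for (Element row : doc.select("table.resume tr")) {
            Element label = row.selectFirst("td.label");
            Element value = row.selectFirst("td.value");
            if (label != null && value != null) {
                // text() returns the visible text with the tags stripped.
                fields.put(label.text(), value.text());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample markup for illustration only.
        String html = "<table class=\"resume\">"
                + "<tr><td class=\"label\">Name</td><td class=\"value\">Zhang San</td></tr>"
                + "<tr><td class=\"label\">Email</td><td class=\"value\">zhangsan@example.com</td></tr>"
                + "</table>";
        extractFields(html).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Since each job site formats its emails differently, one reasonable design is to keep a separate set of selectors per sender and choose between them based on the sender address.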

伊谢尔伦 · Answer

1 Regular expressions
2 An HTML parsing library; it seems to be called "mithril" in Chinese