Python: using the regular expression findall function to find all URLs in a web page

Question

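I'm working through the exercises in the Python Practice Book and am trying to solve this one: Problem 8: Write a program links.py that takes URL of a webpage as argument and prints all the URLs linked from that webpage. The solution is required to use Python's re module. The problem I ran into: the regular expression (src|hre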

欧阳克 · Answer

Extracting web page content with regular expressions is tedious and error-prone. I recommend using BeautifulSoup or XPath instead.
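To illustrate the parser-based approach, here is a minimal sketch. BeautifulSoup is a third-party package, so this sketch swaps in the standard library's html.parser module instead; it is not BeautifulSoup or XPath itself, only the same idea of letting a parser find the attributes. The names LinkCollector and extract_links and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href/src attribute values from start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html):
    """Return all href/src values found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sample mixing quote styles and attribute case
sample = "<a href='a.html'>A</a><img src=logo.png><a HREF=\"http://example.com/\">x</a>"
print(extract_links(sample))
```

A parser copes with single quotes, missing quotes, and uppercase attribute names, which a simple regex would each have to handle explicitly.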

三叔 · Answer

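findall returns the parts matched by (...), that is, the capture groups, not the whole match. I suggest changing the regex to (src|href)="(.*?)" and you will see that it returns the parts enclosed in parentheses. Because this pattern has two groups, each match comes back as a tuple of (attribute name, URL).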
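The group behaviour of findall can be checked with a small sketch. The sample HTML string and the variable names are made up for illustration, and the patterns assume double-quoted attribute values only.

```python
import re

# Made-up sample markup; real pages would be fetched from the URL argument
html = '<a href="http://example.com/a">A</a> <img src="/logo.png"> <a href="b.html">B</a>'

# No groups: findall returns each whole match
whole = re.findall(r'href="[^"]*"', html)

# Two capture groups: findall returns a list of (attribute, url) tuples
pairs = re.findall(r'(src|href)="(.*?)"', html)

# Non-capturing (?:...) for the alternation leaves one group,
# so findall returns a plain list of URL strings
urls = re.findall(r'(?:src|href)="(.*?)"', html)

for url in urls:
    print(url)
```

If only the URLs are wanted, the non-capturing form avoids having to unpack tuples afterwards.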