python - How to tell whether an RSS feed has been updated
大家讲道理 2017-04-17 14:47:55

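I've recently been writing a Python program that needs to keep fetching articles from a number of RSS feeds.

But I don't know how to tell whether a feed has been updated, so that I only fetch the newly published articles.

My current idea is to store the timestamp of the newest article for each feed; on the next run, fetch every article newer than that timestamp, then update the stored timestamp.


One more question: the feed seems to contain fewer articles than the website. The feed apparently only gets new items every few days, while the website has new ones every day. What could be the reason?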


All replies (3)
左手右手慢动作

In theory, the server behind an RSS (or Atom) feed should return a Last-Modified or ETag header in its HTTP response, and you can use those to tell whether the feed has changed.

With Python's feedparser, you can use them like this:

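import feedparser

rss_url = "https://example.com/feed.xml"  # placeholder URL
d = feedparser.parse(rss_url)
# Send the saved validators back; .get() avoids an error if the server sent none
d = feedparser.parse(rss_url, etag=d.get("etag"), modified=d.get("modified"))
d.status  # 304 when the feed has not changed
d.feed    # {} in that case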

If the feed has not been updated, the second request returns nothing.
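For a program that runs repeatedly, the ETag and Last-Modified values have to be saved between runs. Here is a rough sketch of the same conditional request using only the standard library, in case it helps. The names `conditional_headers`, `fetch_if_changed`, and the `state` dict are made up for illustration, and how you persist `state` (file, database) is up to you.

```python
import urllib.error
import urllib.request


def conditional_headers(state):
    """Build conditional-request headers from validators saved on an earlier run."""
    headers = {}
    if state.get("etag"):
        headers["If-None-Match"] = state["etag"]
    if state.get("modified"):
        headers["If-Modified-Since"] = state["modified"]
    return headers


def fetch_if_changed(url, state):
    """Return (body, new_state); body is None when the server answers 304."""
    request = urllib.request.Request(url, headers=conditional_headers(state))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            new_state = {
                "etag": response.headers.get("ETag"),
                "modified": response.headers.get("Last-Modified"),
            }
            return response.read(), new_state
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError
        if err.code == 304:
            return None, state
        raise
```

Keep one `state` dict per feed URL. Note that this only works if the server actually sends these headers; some feeds do not.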

迷茫

Doesn't RSS have a GUID? Save the latest GUID and compare against it the next time you crawl. Whether a feed gets updated is up to the other site's server program; you can't control that.
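A minimal sketch of the GUID idea, assuming an RSS 2.0 feed. It keeps a set of all GUIDs seen so far rather than only the latest one, which is a slight variation that does not depend on item order. The function name `new_items` and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml, seen_guids):
    """Return (guid, title) pairs for items whose GUID is not in seen_guids."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # <guid> is optional in RSS 2.0, so fall back to <link> when it is missing
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            fresh.append((guid, item.findtext("title", default="")))
    return fresh


# Hypothetical feed content, only for demonstration
SAMPLE = """<rss version="2.0"><channel>
<item><title>Second post</title><guid>post-2</guid></item>
<item><title>First post</title><guid>post-1</guid></item>
</channel></rss>"""

seen = {"post-1"}                 # GUIDs saved from the previous crawl
fresh = new_items(SAMPLE, seen)   # only the item with guid "post-2"
seen.update(guid for guid, _ in fresh)
```

After each crawl, save `seen` somewhere persistent so the next run only reports items it has not seen before.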

黄舟

OP, could you share the code for this program? My final project is on this topic, and I'd like to ask you for help. I have no background at all; how can I finish this project quickly? Thanks
