python basic tutorial project four news aggregation
Apr 03, 2018 am 09:17 AMThis article mainly introduces the news aggregation of the python basic tutorial project in detail. It has a certain reference value. Interested friends can refer to the
"Python Basic Tutorial" book. The fourth exercise is news aggregation. A type of application that is rare nowadays, at least I have never used it, is also called Usenet. The main function of this program is to collect information from specified sources (here, Usenet newsgroups), and then save this information to specified destination files (two forms are used here: plain text and html files). The use of this program is somewhat similar to the current blog subscription tool or RSS subscriber.
First introduce the code, and then analyze it one by one:
from nntplib import NNTP from time import strftime,time,localtime from email import message_from_string from urllib import urlopen import textwrap import re day = 24*60*60 def wrap(string,max=70): ''' ''' return '\n'.join(textwrap.wrap(string)) + '\n' class NewsAgent: ''' ''' def __init__(self): self.sources = [] self.destinations = [] def addSource(self,source): self.sources.append(source) def addDestination(self,dest): self.destinations.append(dest) def distribute(self): items = [] for source in self.sources: items.extend(source.getItems()) for dest in self.destinations: dest.receiveItems(items) class NewsItem: def __init__(self,title,body): self.title = title self.body = body class NNTPSource: def __init__(self,servername,group,window): self.servername = servername self.group = group self.window = window def getItems(self): start = localtime(time() - self.window*day) date = strftime('%y%m%d',start) hour = strftime('%H%M%S',start) server = NNTP(self.servername) ids = server.newnews(self.group,date,hour)[1] for id in ids: lines = server.article(id)[3] message = message_from_string('\n'.join(lines)) title = message['subject'] body = message.get_payload() if message.is_multipart(): body = body[0] yield NewsItem(title,body) server.quit() class SimpleWebSource: def __init__(self,url,titlePattern,bodyPattern): self.url = url self.titlePattern = re.compile(titlePattern) self.bodyPattern = re.compile(bodyPattern) def getItems(self): text = urlopen(self.url).read() titles = self.titlePattern.findall(text) bodies = self.bodyPattern.findall(text) for title.body in zip(titles,bodies): yield NewsItem(title,wrap(body)) class PlainDestination: def receiveItems(self,items): for item in items: print item.title print '-'*len(item.title) print item.body class HTMLDestination: def __init__(self,filename): self.filename = filename def receiveItems(self,items): out = open(self.filename,'w') print >> out,''' <html> <head> <title>Today's News</title> </head> <body> <h1>Today's News</hi> ''' print >> out, '<ul>' id = 0 for item in items: id += 1 print >> out, '<li><a href="#" rel="external nofollow" >%s</a></li>' % (id,item.title) print >> out, '</ul>' id = 0 for item in items: id += 1 print >> out, '<h2><a name="%i">%s</a></h2>' % (id,item.title) print >> out, '<pre>%s</pre>' % item.body print >> out, ''' </body> </html> ''' def runDefaultSetup(): agent = NewsAgent() bbc_url = 'http://news.bbc.co.uk/text_only.stm' bbc_title = r'(?s)a href="[^" rel="external nofollow" ]*">\s*<b>\s*(.*?)\s*</b>' bbc_body = r'(?s)</a>\s*<br/>\s*(.*?)\s*<' bbc = SimpleWebSource(bbc_url, bbc_title, bbc_body) agent.addSource(bbc) clpa_server = 'news2.neva.ru' clpa_group = 'alt.sex.telephone' clpa_window = 1 clpa = NNTPSource(clpa_server,clpa_group,clpa_window) agent.addSource(clpa) agent.addDestination(PlainDestination()) agent.addDestination(HTMLDestination('news.html')) agent.distribute() if __name__ == '__main__': runDefaultSetup()
This program will first be analyzed as a whole. The key part is NewsAgent, which is used to store news sources, store target addresses, and then call the source servers (NNTPSource and SimpleWebSource) and the classes for writing news (PlainDestination and HTMLDestination) respectively. So it can be seen from here that NNTPSource is specially used to obtain information on the news server, and SimpleWebSource is used to obtain data on a url. The functions of PlainDestination and HTMLDestination are obvious. The former is used to output the obtained content to the terminal, and the latter is used to write data to the html file.
With these analyses, let’s look at the contents of the main program. The main program is to add information sources and output destination addresses to NewsAgent.
This is indeed a simple program, but this program uses layering.
The above is the detailed content of python basic tutorial project four news aggregation. For more information, please follow other related articles on the PHP Chinese website!

Hot Article

Hot tools Tags

Hot Article

Hot Article Tags

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

What are the advantages and disadvantages of templating?

Google AI announces Gemini 1.5 Pro and Gemma 2 for developers

For only $250, Hugging Face's technical director teaches you how to fine-tune Llama 3 step by step

Share several .NET open source AI and LLM related project frameworks
