Home Backend Development Python Tutorial python basic tutorial project four news aggregation

python basic tutorial project four news aggregation

Apr 03, 2018 am 09:17 AM
python news project

This article mainly introduces the news aggregation of the python basic tutorial project in detail. It has a certain reference value. Interested friends can refer to the

"Python Basic Tutorial" book. The fourth exercise is news aggregation. A type of application that is rare nowadays, at least I have never used it, is also called Usenet. The main function of this program is to collect information from specified sources (here, Usenet newsgroups), and then save this information to specified destination files (two forms are used here: plain text and html files). The use of this program is somewhat similar to the current blog subscription tool or RSS subscriber.

First introduce the code, and then analyze it one by one:

from nntplib import NNTP
from time import strftime,time,localtime
from email import message_from_string
from urllib import urlopen
import textwrap
import re
day = 24*60*60
def wrap(string,max=70):
    '''
    '''
    return '\n'.join(textwrap.wrap(string)) + '\n'
class NewsAgent:
    '''
    '''
    def __init__(self):
        self.sources = []
        self.destinations = []
    def addSource(self,source):
        self.sources.append(source)
    def addDestination(self,dest):
        self.destinations.append(dest)
    def distribute(self):
        items = []
        for source in self.sources:
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)
class NewsItem:
    def __init__(self,title,body):
        self.title = title
        self.body = body
class NNTPSource:
    def __init__(self,servername,group,window):
        self.servername = servername
        self.group = group
        self.window = window
    def getItems(self):
        start = localtime(time() - self.window*day)
        date = strftime('%y%m%d',start)
        hour = strftime('%H%M%S',start)
        server = NNTP(self.servername)
        ids = server.newnews(self.group,date,hour)[1]
        for id in ids:
            lines = server.article(id)[3]
            message = message_from_string('\n'.join(lines))
            title = message['subject']
            body = message.get_payload()
            if message.is_multipart():
                body = body[0]
            yield NewsItem(title,body)
        server.quit()
class SimpleWebSource:
    def __init__(self,url,titlePattern,bodyPattern):
        self.url = url
        self.titlePattern = re.compile(titlePattern)
        self.bodyPattern = re.compile(bodyPattern)
    def getItems(self):
        text = urlopen(self.url).read()
        titles = self.titlePattern.findall(text)
        bodies = self.bodyPattern.findall(text)
        for title.body in zip(titles,bodies):
            yield NewsItem(title,wrap(body))
class PlainDestination:
    def receiveItems(self,items):
        for item in items:
            print item.title
            print '-'*len(item.title)
            print item.body
class HTMLDestination:
    def __init__(self,filename):
        self.filename = filename
    def receiveItems(self,items):
        out = open(self.filename,'w')
        print >> out,'''
        <html>
        <head>
         <title>Today's News</title>
        </head>
        <body>
        <h1>Today's News</hi>
        '''
        print >> out, '<ul>'
        id = 0
        for item in items:
            id += 1
            print >> out, '<li><a href="#" rel="external nofollow" >%s</a></li>' % (id,item.title)
        print >> out, '</ul>'
        id = 0
        for item in items:
            id += 1
            print >> out, '<h2><a name="%i">%s</a></h2>' % (id,item.title)
            print >> out, '<pre>%s</pre>' % item.body
        print >> out, '''
        </body>
        </html>
        '''
def runDefaultSetup():
    agent = NewsAgent()
    bbc_url = 'http://news.bbc.co.uk/text_only.stm'
    bbc_title = r'(?s)a href="[^" rel="external nofollow" ]*">\s*<b>\s*(.*?)\s*</b>'
    bbc_body = r'(?s)</a>\s*<br/>\s*(.*?)\s*<'
    bbc = SimpleWebSource(bbc_url, bbc_title, bbc_body)
    agent.addSource(bbc)
    clpa_server = 'news2.neva.ru'
    clpa_group = 'alt.sex.telephone'
    clpa_window = 1
    clpa = NNTPSource(clpa_server,clpa_group,clpa_window)
    agent.addSource(clpa)
    agent.addDestination(PlainDestination())
    agent.addDestination(HTMLDestination('news.html'))
    agent.distribute()
if __name__ == '__main__':
    runDefaultSetup()
Copy after login

This program will first be analyzed as a whole. The key part is NewsAgent, which is used to store news sources, store target addresses, and then call the source servers (NNTPSource and SimpleWebSource) and the classes for writing news (PlainDestination and HTMLDestination) respectively. So it can be seen from here that NNTPSource is specially used to obtain information on the news server, and SimpleWebSource is used to obtain data on a url. The functions of PlainDestination and HTMLDestination are obvious. The former is used to output the obtained content to the terminal, and the latter is used to write data to the html file.

With these analyses, let’s look at the contents of the main program. The main program is to add information sources and output destination addresses to NewsAgent.

This is indeed a simple program, but this program uses layering.


The above is the detailed content of python basic tutorial project four news aggregation. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot Article Tags

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to download deepseek Xiaomi How to download deepseek Xiaomi Feb 19, 2025 pm 05:27 PM

How to download deepseek Xiaomi

What are the advantages and disadvantages of templating? What are the advantages and disadvantages of templating? May 08, 2024 pm 03:51 PM

What are the advantages and disadvantages of templating?

Google AI announces Gemini 1.5 Pro and Gemma 2 for developers Google AI announces Gemini 1.5 Pro and Gemma 2 for developers Jul 01, 2024 am 07:22 AM

Google AI announces Gemini 1.5 Pro and Gemma 2 for developers

For only $250, Hugging Face's technical director teaches you how to fine-tune Llama 3 step by step For only $250, Hugging Face's technical director teaches you how to fine-tune Llama 3 step by step May 06, 2024 pm 03:52 PM

For only $250, Hugging Face's technical director teaches you how to fine-tune Llama 3 step by step

Share several .NET open source AI and LLM related project frameworks Share several .NET open source AI and LLM related project frameworks May 06, 2024 pm 04:43 PM

Share several .NET open source AI and LLM related project frameworks

How do you ask him deepseek How do you ask him deepseek Feb 19, 2025 pm 04:42 PM

How do you ask him deepseek

What software is NET40? What software is NET40? May 10, 2024 am 01:12 AM

What software is NET40?

How to save the evaluate function How to save the evaluate function May 07, 2024 am 01:09 AM

How to save the evaluate function

See all articles