Use a Python web crawler to see what movies are currently playing in theaters

Release: 2023-07-25 17:21:57

/1 Foreword/

Maoyan Movies is a platform, built jointly with Taobao, that offers a comprehensive range of movie categories and informs users as early as possible when the latest movies will be released. This article shows how to get the details of upcoming movies from Maoyan Movies.


/2 Project Goal/

Get details of upcoming movies from Maoyan Movies.


/3 Project Preparation/

Software:

PyCharm

Required libraries:

requests, lxml, random, time

Plug-in: XPath

The website URL is as follows:

https://maoyan.com/films?showType=2&offset={}

Click the next-page button and observe how the URL changes:

https://maoyan.com/films?showType=2&offset=30
https://maoyan.com/films?showType=2&offset=60
https://maoyan.com/films?showType=2&offset=90

Each time you click through to the next page, the offset parameter increases by 30. You can therefore put {} in place of the changing value and use a for loop to fill in each offset, requesting every page in turn.
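The pagination rule described here can be sketched as a small helper. This is a minimal illustration, assuming that pages are numbered from 1 and that page 1 corresponds to offset=0; `build_page_urls`, `URL_TEMPLATE`, and `PAGE_SIZE` are hypothetical names that do not appear in the original code.

```python
URL_TEMPLATE = "https://maoyan.com/films?showType=2&offset={}"
PAGE_SIZE = 30  # the offset grows by 30 per page, as observed in the URLs


def build_page_urls(start_page, end_page):
    """Return the listing URLs for pages start_page..end_page (1-based, inclusive).

    Assumption: page 1 corresponds to offset=0.
    """
    urls = []
    for page in range(start_page, end_page + 1):
        offset = (page - 1) * PAGE_SIZE
        urls.append(URL_TEMPLATE.format(offset))
    return urls


if __name__ == "__main__":
    for url in build_page_urls(2, 4):
        print(url)
```

Calling `build_page_urls(2, 4)` yields the three URLs with offsets 30, 60, and 90 listed earlier.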

/4 Project Implementation/

1. Define a class that inherits from object, with an __init__ method and a main method that each take self. Import the required libraries and set the URL template. The code is as follows.

import random
import time

import requests
from lxml import etree


class MaoyanSpider(object):
    def __init__(self):
        # URL template; {} is filled with the page offset (30, 60, 90, ...)
        self.url = "https://maoyan.com/films?showType=2&offset={}"

    def main(self):
        pass


if __name__ == '__main__':
    spider = MaoyanSpider()
    spider.main()


2. Generate a random User-Agent.

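for i in range(1, 50):
    # Call ua.random inside the loop so that every request picks a new
    # User-Agent. ua is assumed to be a fake_useragent.UserAgent() instance;
    # the original snippet does not show where it is created.
    self.headers = {
        'User-Agent': ua.random,
    }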
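The snippet does not show where ua comes from. If no User-Agent library is available, a similar effect can be had with the standard library alone. The sketch below is a swapped-in alternative, not the article's method: it picks a User-Agent with random.choice from a small hard-coded pool. `USER_AGENTS` and `random_headers` are hypothetical names, and the strings are only examples.

```python
import random

# Example User-Agent strings (illustrative only; replace with ones you trust)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def random_headers():
    """Build request headers with a User-Agent picked at random on each call."""
    return {"User-Agent": random.choice(USER_AGENTS)}


if __name__ == "__main__":
    print(random_headers())
```

Calling `random_headers()` once per request keeps the selection inside the loop, which is the point the original comment makes.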


3. Send the request and get the page response.

def get_page(self, url):
    # self.headers holds the User-Agent chosen at random for this request
    res = requests.get(url, headers=self.headers)
    res.encoding = 'utf-8'
    html = res.text
    # Return the HTML so the caller can pass it to parse_page()
    return html
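A slightly more defensive version of this request step might add a timeout and a status check. The following is a sketch under assumptions: `fetch_html` is a hypothetical helper, the 10-second timeout is an arbitrary choice, and `session` stands for any object with a requests-style `get()` method, such as `requests.Session()`.

```python
def fetch_html(session, url, headers, timeout=10):
    """Fetch url and return the decoded HTML text.

    session is any object with a requests-style get() method,
    for example requests.Session().
    """
    res = session.get(url, headers=headers, timeout=timeout)
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    res.encoding = "utf-8"
    return res.text


# Usage (requires the requests package and network access):
#   import requests
#   html = fetch_html(requests.Session(), url, {"User-Agent": "..."})
```

Passing the session in as a parameter keeps the function easy to exercise with a stand-in object, without touching the network.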


4. Parse the first-level page with XPath and extract the page information.

1) Build the base list of XPath node objects.

# Create the parser object
parse_html = etree.HTML(html)
# Base XPath node list: one dd element per movie
dd_list = parse_html.xpath('//dl[@class="movie-list"]//dd')


2) Iterate over each node object and extract the data.

for dd in dd_list:
    # Movie title
    name = dd.xpath('.//div[@class="movie-hover-title"]//span[@class="name noscore"]/text()')[0].strip()
    # Starring actors
    star = dd.xpath('.//div[@class="movie-hover-info"]//div[@class="movie-hover-title"][3]/text()')[1].strip()
    # Genre
    type = dd.xpath('.//div[@class="movie-hover-info"]//div[@class="movie-hover-title"][2]/text()')[1].strip()
    # Relative link to the details page
    dowld = dd.xpath('.//div[@class="movie-item-hover"]/a/@href')[0].strip()
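The extraction step indexes each XPath result directly with [0] or [1], which raises IndexError if a movie entry lacks one of the fields. One possible guard is sketched below; `first_text` is a hypothetical helper, not part of the original code, and it only assumes that xpath() returns a list.

```python
def first_text(results, index=0, default=""):
    """Return results[index] stripped of whitespace, or default if it is missing.

    results is the list returned by an lxml xpath() call;
    index is expected to be non-negative.
    """
    if index < len(results):
        return str(results[index]).strip()
    return default


# Usage inside the loop, with the same XPath expressions as the article:
#   star = first_text(dd.xpath('...'), index=1, default="unknown")
```

With this helper, an entry that has no cast or genre text yields the default value instead of stopping the crawl.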


5. Define movie to format the data, then print it.

movie = '''[Coming Soon]

Title: %s

Starring: %s

Genre: %s
Details link: https://maoyan.com%s
=========================================================
''' % (name, star, type, dowld)
print(movie)
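The formatting step can also be wrapped in a function so that it is reusable outside the loop. This is a sketch, not the article's code: `format_movie` and `BASE_URL` are hypothetical names, and `urljoin` from the standard library is used here in place of string concatenation to build the details link.

```python
from urllib.parse import urljoin

BASE_URL = "https://maoyan.com"


def format_movie(name, star, genre, href):
    """Return a printable summary for one movie."""
    link = urljoin(BASE_URL, href)  # turn the relative href into a full URL
    lines = [
        "[Coming Soon]",
        "Title: {}".format(name),
        "Starring: {}".format(star),
        "Genre: {}".format(genre),
        "Details link: {}".format(link),
        "=" * 57,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample values, for illustration only
    print(format_movie("Sample Movie", "Actor A", "Drama", "/films/1"))
```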


6. Use random.randint() to set a random delay between requests.

time.sleep(random.randint(1, 3))
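To show where the delay sits relative to the requests, here is a minimal sketch of a loop that pauses between pages. `crawl_with_delay` and its parameters are hypothetical; it uses random.uniform for a fractional delay where the article uses random.randint(1, 3), and the `sleep` parameter exists only so the pause can be replaced when trying the function out.

```python
import random
import time


def crawl_with_delay(urls, handle, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call handle(url) for each URL, pausing a random interval between calls."""
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first one
            sleep(random.uniform(min_delay, max_delay))
        handle(url)


if __name__ == "__main__":
    # print stands in for the real fetch-and-parse step
    crawl_with_delay(["page-1", "page-2"], print, min_delay=0.1, max_delay=0.2)
```

Skipping the pause before the first request avoids a needless wait at startup while still spacing out every later request.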


7. Call the methods to run the crawler.

html = self.get_page(url)
self.parse_page(html)


/5 Results/

1. Click the green triangle to run the program, then enter the start page and end page.



2. After the program runs, the results are displayed on the console.



3. Click the blue details link to view the movie details online.



/6 Summary/

1. Avoid scraping too much data, since heavy crawling puts load on the server. A brief trial is enough.

2. This article uses Python web crawler libraries to crawl Maoyan Movies.

source:Go语言进阶学习