Community

Learn

Tools Library

AI Tools

Leisure

English

Home

Backend Development

Python Tutorial

Python使用scrapy抓取网站sitemap信息的方法

Python使用scrapy抓取网站sitemap信息的方法

Jun 10, 2016 pm 03:15 PM

python scrapy

本文实例讲述了Python使用scrapy抓取网站sitemap信息的方法。分享给大家供大家参考。具体如下：

import re
from scrapy.spider import BaseSpider
from scrapy import log
from scrapy.utils.response import body_or_str
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
class SitemapSpider(BaseSpider):
 name = "SitemapSpider"
 start_urls = ["http://www.domain.com/sitemap.xml"]
 def parse(self, response):
  nodename = 'loc'
  text = body_or_str(response)
  r = re.compile(r"(<%s[\s>])(.*&#63;)(</%s>)"%(nodename,nodename),re.DOTALL)
  for match in r.finditer(text):
   url = match.group(2)
   yield Request(url, callback=self.parse_page)
 def parse_page(self, response):
    hxs = HtmlXPathSelector(response)
    #Mock Item
  blah = Item()
  #Do all your page parsing and selecting the elemtents you want
    blash.divText = hxs.select('//div/text()').extract()[0]
  yield blah

Copy after login

希望本文所述对大家的Python程序设计有所帮助。

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot Article

Repo: How To Revive Teammates

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

How Long Does It Take To Beat Split Fiction?

3 weeks ago By DDD

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

1 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hello Kitty Island Adventure: How To Get Giant Seeds

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Difficulty in updating caching of official account web pages: How to avoid the old cache affecting the user experience after version update?

3 weeks ago By 王林

Show More

Hot tools Tags

Code&IT

Voice

Business

Marketing

AI Detector

Chatbot

Design&Art

Hot Article

Repo: How To Revive Teammates

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

How Long Does It Take To Beat Split Fiction?

3 weeks ago By DDD

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

1 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hello Kitty Island Adventure: How To Get Giant Seeds

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Difficulty in updating caching of official account web pages: How to avoid the old cache affecting the user experience after version update?

3 weeks ago By 王林

Show More

Hot Article Tags

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Show More

Hot Topics

Where is the login entrance for gmail email?

7292

9

Java Tutorial

1622

14

CakePHP Tutorial

1342

46

Laravel Tutorial

1259

25

PHP Tutorial

1206

29

Show More

Related knowledge

How to download deepseek Xiaomi

How to download deepseek Xiaomi Feb 19, 2025 pm 05:27 PM

How to download deepseek Xiaomi

What are the advantages and disadvantages of templating?

What are the advantages and disadvantages of templating? May 08, 2024 pm 03:51 PM

What are the advantages and disadvantages of templating?

Google AI announces Gemini 1.5 Pro and Gemma 2 for developers

Google AI announces Gemini 1.5 Pro and Gemma 2 for developers Jul 01, 2024 am 07:22 AM

Google AI announces Gemini 1.5 Pro and Gemma 2 for developers

For only $250, Hugging Face's technical director teaches you how to fine-tune Llama 3 step by step

For only $250, Hugging Face's technical director teaches you how to fine-tune Llama 3 step by step May 06, 2024 pm 03:52 PM

For only $250, Hugging Face's technical director teaches you how to fine-tune Llama 3 step by step

Share several .NET open source AI and LLM related project frameworks

Share several .NET open source AI and LLM related project frameworks May 06, 2024 pm 04:43 PM

Share several .NET open source AI and LLM related project frameworks

A complete guide to golang function debugging and analysis

A complete guide to golang function debugging and analysis May 06, 2024 pm 02:00 PM

A complete guide to golang function debugging and analysis

How do you ask him deepseek

How do you ask him deepseek Feb 19, 2025 pm 04:42 PM

How do you ask him deepseek

How to save the evaluate function

How to save the evaluate function May 07, 2024 am 01:09 AM

How to save the evaluate function

See all articles