


How to use the Scrapy framework to loop through JD.com (Jingdong) data and import it into MySQL
This article shows how to use the Scrapy framework to loop through JD.com listing pages and import the scraped data into MySQL. It also records the problems I ran into along the way, which I hope will be a useful reference.
JD.com has an anti-crawling mechanism, so the spider sends a browser User-Agent header and presents itself as a browser.
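As a minimal sketch of the idea, the snippet below attaches a browser User-Agent to a request using only the standard library. The helper name build_request is hypothetical; the User-Agent string and URL are the ones used in the spider in this article. Nothing is sent over the network here, the request object is only constructed.

```python
import urllib.request

# Browser User-Agent string taken from the spider in this article.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a urllib Request that carries a browser-like User-Agent.

    build_request is a hypothetical helper used only for illustration.
    """
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    return req


req = build_request("https://list.jd.com/list.html?cat=9987,653,655&page=1")
# urllib normalizes header names, so the key is stored as "User-agent".
print(req.get_header("User-agent"))
```

In Scrapy itself the same header is passed through the headers argument of Request, as the spider does in start_requests.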
The crawled data is the mobile phone listing of JD Mall, starting from this URL: https://list.jd.com/list.html?cat=9987,653,655&page=1
There are about 9,000 records; products that do not appear in the listing are not included.
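The listing is paginated through the page parameter at the end of the URL. A small sketch of generating every page URL follows; the function name page_urls is hypothetical, and the upper bound of 165 pages is taken from the range(2, 166) loop in the spider in this article, so it may not match the live site.

```python
BASE_URL = "https://list.jd.com/list.html?cat=9987,653,655&page="


def page_urls(first_page=1, last_page=165):
    """Build the listing URL for every page from first_page to last_page inclusive.

    page_urls is a hypothetical helper; 165 is the last page the spider requests.
    """
    return [BASE_URL + str(page) for page in range(first_page, last_page + 1)]


urls = page_urls()
print(len(urls))
print(urls[0])
print(urls[-1])
```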
Problems encountered:
1. It is best to put the proxy and User-Agent request in its own method (use_proxy). I originally wrote that code directly inside parse and hit a "not enough values to unpack" error. I could not tell which statement raised it, so I added a print after each line and traced the problem to urlopen(). Repeated attempts and searching online did not reveal the cause; moving the code into a separate method made the error go away. My current guess is that it is related to the way the parse method handles the response.
2. Before importing the data into MySQL, I first tried writing it to a file, but the output file (x.txt) stayed at 0 KB or 1 KB: the size kept changing but never grew, so the contents were evidently being overwritten. I first suspected that fh.close() was in the wrong place, then realized that fh = open("D:/pythonlianxi/result/4.txt", "w") was the mistake: the mode should be "a" (append) instead of "w".
3. When importing into the database, the main problem was Chinese character encoding. First open MySQL and run show variables like '%char%'; to check the character set of the database, then use the matching encoding on the client side. In my case the database uses utf8, and gbk did not work. Also, do not forget charset="utf8" when connecting to MySQL.
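The second problem above can be reproduced without Scrapy. The sketch below reopens a file once per record, which is roughly what happens when the file is opened inside a callback that runs once per page. The function write_records and the sample records are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def write_records(path, records, mode):
    """Reopen the file for every record, as a per-page callback would."""
    for record in records:
        with open(path, mode, encoding="utf-8") as fh:
            fh.write(record + "\n")


records = ["phone A", "phone B", "phone C"]  # made-up sample data

with tempfile.TemporaryDirectory() as tmp:
    path_w = os.path.join(tmp, "w.txt")
    path_a = os.path.join(tmp, "a.txt")

    write_records(path_w, records, "w")  # "w" truncates on every open
    write_records(path_a, records, "a")  # "a" appends to what is already there

    with open(path_w, encoding="utf-8") as fh:
        lines_w = fh.read().splitlines()
    with open(path_a, encoding="utf-8") as fh:
        lines_a = fh.read().splitlines()

print(lines_w)  # only the last record survives
print(lines_a)  # all records are kept
```

With "w" the file never grows beyond the most recent write, which matches the symptom of a file whose size changes but stays around 0 KB to 1 KB.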
The following is the specific code. The MySQL connection sets the charset explicitly:

    conn = pymysql.connect(host="127.0.0.1", user="root", passwd="root", db="jingdong", charset="utf8")
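To see why the client charset has to match the data, here is a small database-free illustration. It only shows what happens to the bytes of a Chinese string when they are written as UTF-8 and read back as GBK; it does not touch MySQL or pymysql, and the sample title is made up.

```python
title = "手机"  # sample Chinese title ("mobile phone"), made up for this example

utf8_bytes = title.encode("utf-8")
gbk_bytes = title.encode("gbk")

# The same text has different byte representations in the two encodings.
print(len(utf8_bytes), len(gbk_bytes))

# Reading UTF-8 bytes as GBK does not give back the original text.
misread = utf8_bytes.decode("gbk", errors="replace")
# Reading them with the matching encoding does.
roundtrip = utf8_bytes.decode("utf-8")

print(misread == title)
print(roundtrip == title)
```

The same mismatch between the connection charset and the database charset is a common cause of garbled Chinese text after an insert.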
The complete spider:

    import re
    import urllib.error
    import urllib.request

    import pymysql
    import scrapy
    from scrapy.http import Request

    from jingdong.items import JingdongItem


    class JdSpider(scrapy.Spider):
        name = 'jd'
        allowed_domains = ['jd.com']
        # start_urls = ['http://jd.com/']
        # Browser User-Agent so that requests look like they come from a browser.
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}

        def start_requests(self):
            return [Request("https://list.jd.com/list.html?cat=9987,653,655&page=1",
                            callback=self.parse, headers=self.header, meta={"cookiejar": 1})]

        def use_proxy(self, proxy_addr, url):
            """Fetch url with urllib and return the decoded body, or None on failure."""
            try:
                req = urllib.request.Request(url)
                req.add_header("User-Agent", self.header["User-Agent"])
                # Only the "http" scheme is mapped here, so https:// URLs are
                # fetched directly rather than through the proxy.
                proxy = urllib.request.ProxyHandler({"http": proxy_addr})
                opener = urllib.request.build_opener(proxy, urllib.request.HTTPHandler)
                urllib.request.install_opener(opener)
                return urllib.request.urlopen(req).read().decode("utf-8", "ignore")
            except urllib.error.URLError as e:
                if hasattr(e, "code"):
                    print(e.code)
                if hasattr(e, "reason"):
                    print(e.reason)
            except Exception as e:
                print(str(e))
            return None

        def parse(self, response):
            item = JingdongItem()
            proxy_addr = "61.135.217.7:80"
            try:
                item["title"] = response.xpath("//p[@class='p-name']/a[@target='_blank']/em/text()").extract()
                item["pricesku"] = response.xpath("//li[@class='gl-item']/p/@data-sku").extract()

                # Queue the remaining listing pages. Every page runs this loop,
                # but Scrapy's duplicate filter drops the repeated requests.
                for j in range(2, 166):
                    url = "https://list.jd.com/list.html?cat=9987,653,655&page=" + str(j)
                    yield Request(url, callback=self.parse, headers=self.header)

                pricepat = '"p":"(.*?)"'
                personpat = '"CommentCountStr":"(.*?)",'
                # fh = open("D:/pythonlianxi/result/5.txt", "a")
                conn = pymysql.connect(host="127.0.0.1", user="root", passwd="root", db="jingdong", charset="utf8")
                try:
                    cursor = conn.cursor()
                    for i in range(len(item["pricesku"])):
                        priceurl = "https://p.3.cn/prices/mgets?&ext=11000000&pin=&type=1&area=1_72_4137_0&skuIds=" + item["pricesku"][i]
                        personurl = "https://club.jd.com/comment/productCommentSummaries.action?referenceIds=" + item["pricesku"][i]
                        pricedata = self.use_proxy(proxy_addr, priceurl)
                        persondata = self.use_proxy(proxy_addr, personurl)
                        if not pricedata or not persondata:
                            continue  # the request failed; skip this product
                        price = re.findall(pricepat, pricedata)
                        person = re.findall(personpat, persondata)
                        if not price or not person:
                            continue  # nothing matched; skip this product

                        title = item["title"][i]
                        print(title)
                        price1 = float(price[0])
                        person1 = person[0]
                        # fh.write(title + "\n" + price[0] + "\n" + person1 + "\n")
                        # Parameterized insert: the driver escapes the values.
                        sql = "insert into jd(title,price,person) values(%s,%s,%s);"
                        params = (title, price1, person1)
                        cursor.execute(sql, params)
                        conn.commit()
                finally:
                    # fh.close()
                    conn.close()
                # parse is a generator, so the item is yielded, not returned.
                yield item
            except Exception as e:
                print(str(e))
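The price and comment count are pulled out of the response text with two regular expressions. The sketch below applies the same two patterns to sample strings and guards against empty or failed responses. The helper names first_match and parse_price are hypothetical, and the sample response strings are made up to fit the patterns; the real endpoints may return a different shape.

```python
import re

# Patterns copied from the spider.
PRICE_PATTERN = re.compile(r'"p":"(.*?)"')
COMMENT_PATTERN = re.compile(r'"CommentCountStr":"(.*?)",')


def first_match(pattern, text):
    """Return the first captured group, or None if text is empty or nothing matches."""
    if not text:
        return None
    found = pattern.findall(text)
    return found[0] if found else None


def parse_price(text):
    """Return the price as a float, or None if it is missing or not a number."""
    raw = first_match(PRICE_PATTERN, text)
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


# Hypothetical response bodies, invented for this example.
sample_price = '[{"id":"J_100000","p":"1999.00","m":"2499.00"}]'
sample_comment = '{"CommentsCount":[{"SkuId":100000,"CommentCountStr":"1.2万+","GoodCount":100}]}'

print(parse_price(sample_price))
print(first_match(COMMENT_PATTERN, sample_comment))
print(parse_price(None))  # a failed request yields None instead of an exception
```

Returning None instead of indexing into an empty list lets the caller skip a product whose price request failed rather than aborting the whole page.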
