We all know that a MySQL database can store a large amount of data, but do you know how data gets written to MySQL?
In a Scrapy item pipeline, there are generally two ways to save data to MySQL: synchronous mode and asynchronous mode.
Synchronous mode
Synchronous mode inserts data by executing an SQL statement and committing it for every item. Note, however, that Scrapy parses pages much faster than MySQL can write rows. When a large number of items are parsed, the inserts block and become the bottleneck of the crawl.
import MySQLdb


class MysqlPipeline(object):
    def __init__(self):
        # Open a single blocking connection when the pipeline is created.
        self.conn = MySQLdb.connect(
            '127.0.0.1', 'root', 'root', 'article_spider',
            charset="utf8", use_unicode=True)
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        insert_sql = """
            insert into jobbole_article(title,create_date,url,url_object_id)
            VALUES (%s,%s,%s,%s)
        """
        self.cursor.execute(insert_sql, (item["title"], item["create_date"],
                                         item["url"], item["url_object_id"]))
        # commit() blocks until MySQL has written the row.
        self.conn.commit()
        # Return the item so that later pipelines can still process it.
        return item
Asynchronous mode
Synchronous mode can block. We can use Twisted to make the MySQL inserts run asynchronously with respect to parsing, instead of relying on plain synchronous execute and commit calls.
The MySQL connection parameters can be placed directly in the project's configuration file (settings.py):
MYSQL_HOST = "127.0.0.1"
MYSQL_DBNAME = "article_spider"
MYSQL_USER = "root"
MYSQL_PASSWORD = "root"
Because these values live in settings, we define from_settings in the pipeline to receive the settings object, and we can then read the configured values directly from it.
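The from_settings hook is an ordinary classmethod used as an alternate constructor. The sketch below isolates that pattern so it can be run without Scrapy or MySQL installed. The class name `SettingsAwarePipeline` and the `fake_settings` dict are hypothetical, and a plain dict stands in for the real settings object, which supports the same `settings["KEY"]` lookup.

```python
class SettingsAwarePipeline(object):
    """Hypothetical pipeline that only stores its connection parameters."""

    def __init__(self, db_params):
        self.db_params = db_params

    @classmethod
    def from_settings(cls, settings):
        # Map the MYSQL_* settings to the keyword names MySQLdb expects.
        db_params = dict(
            host=settings["MYSQL_HOST"],
            db=settings["MYSQL_DBNAME"],
            user=settings["MYSQL_USER"],
            passwd=settings["MYSQL_PASSWORD"],
        )
        return cls(db_params)


# Assumption: a plain dict stands in for the Scrapy settings object.
fake_settings = {
    "MYSQL_HOST": "127.0.0.1",
    "MYSQL_DBNAME": "article_spider",
    "MYSQL_USER": "root",
    "MYSQL_PASSWORD": "root",
}

pipeline = SettingsAwarePipeline.from_settings(fake_settings)
print(pipeline.db_params["db"])
```

Keeping the mapping inside from_settings means the pipeline's constructor never needs to know where the values came from, which also makes it easy to build the pipeline in a test.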
Use the asynchronous container provided by Twisted to connect to MySQL:
import MySQLdb
import MySQLdb.cursors
from twisted.enterprise import adbapi
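adbapi makes MySQLdb operations asynchronous by running them on a pool of worker threads, so the crawl does not wait for each insert.
The function handed to the pool receives a cursor, which it uses to execute the SQL statements; the commit is performed automatically.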
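As a rough illustration of that idea, and not Twisted's actual implementation, the sketch below uses only the standard library: a function that takes a cursor is run on a worker thread, committed on success, and rolled back on error. Here sqlite3 stands in for MySQL, and `run_interaction`, `DB_PATH`, and the two-column table are hypothetical names chosen for the demo.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Assumption: a throwaway SQLite file stands in for the MySQL database.
DB_PATH = os.path.join(tempfile.mkdtemp(), "demo.db")


def run_interaction(executor, func, *args):
    """Run func(cursor, *args) on a worker thread inside one transaction."""
    def job():
        conn = sqlite3.connect(DB_PATH)
        try:
            result = func(conn.cursor(), *args)
            conn.commit()      # commit automatically on success
            return result
        except Exception:
            conn.rollback()    # undo the transaction on failure
            raise
        finally:
            conn.close()
    # submit() returns immediately with a Future; the caller is not blocked.
    return executor.submit(job)


def create_table(cursor):
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS jobbole_article (title TEXT, url TEXT)")


def do_insert(cursor, item):
    cursor.execute(
        "INSERT INTO jobbole_article (title, url) VALUES (?, ?)",
        (item["title"], item["url"]))


def count_rows(cursor):
    cursor.execute("SELECT COUNT(*) FROM jobbole_article")
    return cursor.fetchone()[0]


with ThreadPoolExecutor(max_workers=1) as executor:
    run_interaction(executor, create_table).result()
    future = run_interaction(
        executor, do_insert, {"title": "demo", "url": "http://example.com"})
    # Parsing could continue here while the insert runs in the background.
    future.result()
    row_count = run_interaction(executor, count_rows).result()

print(row_count)
```

The caller only waits when it asks for `result()`. In the Twisted pipeline the same role is played by the object returned from runInteraction, to which callbacks and errbacks are attached instead.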
The complete pipeline:
class MysqlTwistedPipline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbparms = dict(
            host=settings["MYSQL_HOST"],
            db=settings["MYSQL_DBNAME"],
            user=settings["MYSQL_USER"],
            passwd=settings["MYSQL_PASSWORD"],
            charset='utf8',
            cursorclass=MySQLdb.cursors.DictCursor,
            use_unicode=True,
        )
        dbpool = adbapi.ConnectionPool("MySQLdb", **dbparms)
        return cls(dbpool)

    def process_item(self, item, spider):
        # Use Twisted to run the MySQL insert asynchronously.
        # runInteraction runs the function passed to it asynchronously.
        query = self.dbpool.runInteraction(self.do_insert, item)
        # Handle errors.
        query.addErrback(self.handle_error, item, spider)
        # Return the item so that later pipelines can still process it.
        return item

    def handle_error(self, failure, item, spider):
        # Handle errors raised by the asynchronous insert.
        print(failure)

    def do_insert(self, cursor, item):
        # A cursor is taken from dbpool and passed in.
        # Perform the actual insert.
        insert_sql = """
            insert into jobbole_article(title,create_date,url,url_object_id)
            VALUES (%s,%s,%s,%s)
        """
        # Execute with the passed-in cursor; the commit happens automatically.
        cursor.execute(insert_sql, (item["title"], item["create_date"],
                                    item["url"], item["url_object_id"]))
Everything in the code above except do_insert can be reused as is; only do_insert depends on the table and its fields.
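One possible way to make do_insert reusable as well is to let each item supply its own statement. The sketch below is an assumption rather than part of the pipeline above: `ArticleItem`, `get_insert_sql`, and `RecordingCursor` are hypothetical names, and the recording cursor replaces a real MySQL cursor so the example runs on its own.

```python
class ArticleItem(dict):
    """Hypothetical item that knows which SQL statement stores it."""

    def get_insert_sql(self):
        insert_sql = """
            insert into jobbole_article(title,create_date,url,url_object_id)
            VALUES (%s,%s,%s,%s)
        """
        params = (self["title"], self["create_date"],
                  self["url"], self["url_object_id"])
        return insert_sql, params


def do_insert(cursor, item):
    # Generic insert: the item decides the statement and its parameters.
    insert_sql, params = item.get_insert_sql()
    cursor.execute(insert_sql, params)


class RecordingCursor(object):
    """Stand-in cursor that records calls instead of talking to MySQL."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params):
        self.calls.append((sql, params))


item = ArticleItem(title="demo", create_date="2018-01-01",
                   url="http://example.com/1", url_object_id="abc")
cursor = RecordingCursor()
do_insert(cursor, item)
print(cursor.calls[0][1])
```

With this arrangement a new table only needs a new item class, and the pipeline itself stays unchanged.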