我嘗試了一些網站抓取。我成功地抓取了目前資料庫表中的資料。但我想僅當“當前表”中不存在記錄時才插入“new_table”
我的程式碼是(管道)
table = 'products' table2 = 'new_products'` def save(self, row): cursor = self.cnx.cursor() cursor.execute("SELECT DISTINCT product_id FROM products;") old_ids = [row[0] for row in cursor.fetchall()] create_query = ("INSERT INTO " + self.table + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)") cursor.execute(create_query, row) lastRecordId = cursor.lastrowid self.cnx.commit() cursor.close() print("Item saved with ID: {}" . format(lastRecordId)) if not product_id in old_ids: create_query = ("INSERT INTO " + self.table2 + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
它無法正常工作,並且出現錯誤。
2022-05-06 12:26:57 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-06 12:26:57.575507', 'listing_id': '0190199600119', 'price': '4199.00', 'product_id': '1209298', 'product_name': 'APPLE 11" Magic Türkçe Q Klavye Siyah', 'rowid': 456274953331128512, 'url': 'https://www.mediamarkt.com.tr/tr/product/APPLE%2011%22%20Magic%20T%C3%BCrk%C3%A7e%20Q%20Klavye%20Siyah-1209298.html'} Traceback (most recent call last): File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f return deferred_from_coro(coro_f(*coro_args, **coro_kwargs)) File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item self.save(dict(item)) File "/root/teknosa/teknosa/pipelines.py", line 62, in save if not product_id in old_ids: NameError: name 'product_id' is not defined Saving item into db ...
我有唯一的product_id。
如果目前表中沒有product_id,則將此product_id插入到「new_products」
如何製作這個?
謝謝。
上次編輯:我收到此錯誤。
2022-05-07 18:17:11 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-07 18:17:11.902622', 'listing_id': '8713439219357', 'price': '99.00', 'product_id': '1175529', 'product_name': 'TRUST 21935 NANGA USB 3.1 Kart Okuyucu', 'rowid': -411152717288573423, 'url': 'https://www.mediamarkt.com.tr/tr/product/TRUST%2021935%20NANGA%20USB%203.1%20Kart%20Okuyucu-1175529.html'} Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 523, in cmd_query self._cmysql.query(query, _mysql_connector.MySQLInterfaceError: Duplicate entry '-411152717288573423' for key 'products.rowid' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f return deferred_from_coro(coro_f(*coro_args, **coro_kwargs)) File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item self.save(dict(item)) File "/root/teknosa/teknosa/pipelines.py", line 69, in save cursor.execute(create_query, row) File "/usr/local/lib/python3.8/dist-packages/mysql/connector/cursor_cext.py", line 269, in execute result = self._cnx.cmd_query(stmt, raw=self._raw, File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 528, in cmd_query raise errors.get_mysql_exception(exc.errno, msg=exc.msg, mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '-411152717288573423' for key 'products.rowid'
如果您只想在不存在的情況下插入,則無需執行您正在執行的操作。無需全選然後查看您要查找的那個是否存在。
您需要的是為表2中的produc_id建立一個唯一索引
然後將程式碼更改為:
如果您使用ON DUPLICATE KEY,當它發現重複行(已存在的product_id)時,系統會嘗試將product_id更新為相同的product_id,因此不會生效。
如果設定 autocommit= True,則可以刪除這些提交。
編輯
如果正如您在評論中所說,僅當表中不存在時才需要插入新表,您可以像這樣更改程式碼:
您需要更改行 old_ids = [row[0] for row incursor.fetchall()] 中的變數名稱,因為您正在變更
row
參數的值 2.你的問題出在if語句中,product_id變數不存在,需要修改