Scrapy INSERT into 'new_table' only if no record exists in 'current table'
P粉122932466
P粉122932466 2024-03-29 19:36:16
0
1
345

I tried some website scraping. I successfully scraped the data from the current database table. But I want to insert "new_table" only if no record exists in "current table"

My code is (pipeline)

table = 'products'
table2 = 'new_products'`
def save(self, row): 

    
    cursor = self.cnx.cursor()
    cursor.execute("SELECT DISTINCT product_id FROM products;")
    old_ids = [row[0] for row in cursor.fetchall()]
    create_query = ("INSERT INTO " + self.table + 
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

    cursor.execute(create_query, row)
    lastRecordId = cursor.lastrowid

    self.cnx.commit()
    cursor.close()
    print("Item saved with ID: {}" . format(lastRecordId))

    if not product_id in old_ids:
        create_query = ("INSERT INTO " + self.table2 + 
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

It doesn't work properly and gives an error.

2022-05-06 12:26:57 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-06 12:26:57.575507',
 'listing_id': '0190199600119',
 'price': '4199.00',
 'product_id': '1209298',
 'product_name': 'APPLE 11" Magic Türkçe Q Klavye Siyah',
 'rowid': 456274953331128512,
 'url': 'https://www.mediamarkt.com.tr/tr/product/APPLE%2011%22%20Magic%20T%C3%BCrk%C3%A7e%20Q%20Klavye%20Siyah-1209298.html'}
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item
    self.save(dict(item))
  File "/root/teknosa/teknosa/pipelines.py", line 62, in save
    if not product_id in old_ids:
NameError: name 'product_id' is not defined
Saving item into db ...

I have unique product_id.

If there is no product_id in the current table, insert this product_id into "new_products"

How to make this?

Thanks.

Last edit: I'm getting this error.

2022-05-07 18:17:11 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-07 18:17:11.902622',
 'listing_id': '8713439219357',
 'price': '99.00',
 'product_id': '1175529',
 'product_name': 'TRUST 21935 NANGA USB 3.1 Kart Okuyucu',
 'rowid': -411152717288573423,
 'url': 'https://www.mediamarkt.com.tr/tr/product/TRUST%2021935%20NANGA%20USB%203.1%20Kart%20Okuyucu-1175529.html'}
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 523, in cmd_query
    self._cmysql.query(query,
_mysql_connector.MySQLInterfaceError: Duplicate entry '-411152717288573423' for key 'products.rowid'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item
    self.save(dict(item))
  File "/root/teknosa/teknosa/pipelines.py", line 69, in save
    cursor.execute(create_query, row)
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/cursor_cext.py", line 269, in execute
    result = self._cnx.cmd_query(stmt, raw=self._raw,
  File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 528, in cmd_query
    raise errors.get_mysql_exception(exc.errno, msg=exc.msg,
mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '-411152717288573423' for key 'products.rowid'

P粉122932466
P粉122932466

reply all(1)
P粉278379495

If you just want to insert if it doesn't exist, there is no need to do what you are doing. No need to select them all and see if the one you're looking for is there.

What you need is to create a unique index for produc_id in table 2

Then change the code to:

table = 'products'
table2 = 'new_products'`

def save(self, row):  
    create_query = ("INSERT INTO " + self.table + 
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

    cursor.execute(create_query, row)
    lastRecordId = cursor.lastrowid

    self.cnx.commit()
    print("Item saved with ID: {}" . format(lastRecordId))
    create_query = ("INSERT INTO " + self.table2 + 
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s) ON DUPLICATE KEY UPDATE product_id=product_id")
    cursor.execute(create_query, row)
    self.cnx.commit()

If you use ON DUPLICATE KEY, when it finds a duplicate row (already existing product_id), the system will try to update the product_id to the same product_id, so it will not take effect.

These commits can be deleted if autocommit=True is set.

edit

If, as you said in your comment, you need to insert into the new table only if it does not already exist in the table, you can change your code like this:

You need to change the variable name in the row old_ids = [row[0] for row incursor.fetchall()] because you are changing the value of the row parameter 2. Your problem lies in the if statement. The product_id variable does not exist and needs to be modified

table = 'products'
table2 = 'new_products'`

def save(self, row):     
    cursor = self.cnx.cursor()
    cursor.execute("SELECT DISTINCT product_id FROM products;")
    old_ids = [element[0] for element in cursor.fetchall()]
    create_query = ("INSERT INTO " + self.table + 
        "(rowid, date, listing_id, product_id, product_name, price, url) "
        "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

    cursor.execute(create_query, row)
    lastRecordId = cursor.lastrowid

    self.cnx.commit()
    cursor.close()
    print("Item saved with ID: {}" . format(lastRecordId))

 

   if not row['product_id'] in old_ids:
        create_query = ("INSERT INTO " + self.table2 + 
            "(rowid, date, listing_id, product_id, product_name, price, url) "
            "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template
About us Disclaimer Sitemap
php.cn:Public welfare online PHP training,Help PHP learners grow quickly!