Is it possible for the code to skip an iteration in web scraping? IndexError: pop index out of range

Question

So I have code that scrapes the names and prices of minerals from 14 pages (so far) and saves them to a .txt file. I first tried it with Page1 only, then I wanted to add more pages to get more data. But then the code grabbed something it shouldn't have: a random name/string. I didn't expect it to grab that one, but it did, and it assigned the wrong price to it! It happens right after the mineral with this "unexpected name", and from then on the entire rest of the list has the wrong prices. See the picture below: So, because this string is different from the other strings
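The symptom described here matches how `zip` pairs two lists purely by position: one extra string in the names list shifts every later name onto the next item's price. A minimal illustration with made-up values (not real site data):

```python
# Made-up data: the names list picked up one unexpected extra string
names = ["Acanthite", "unexpected string", "Adamite", "Aragonite"]
prices = ["€90", "€240", "€580"]  # intended: Acanthite €90, Adamite €240, Aragonite €580

pairs = list(zip(names, prices))
print(pairs)
# [('Acanthite', '€90'), ('unexpected string', '€240'), ('Adamite', '€580')]
# Adamite now shows Aragonite's price, and zip silently drops the unmatched
# last name because it stops at the shorter list.
```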

P粉391955763 · Answer

You can try the following example, which also handles pagination:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}

# Search results are paged through the First= query parameter; this visits offsets 0, 25, 50 and 75
for URL in range(0, 100, 25):
    soup = BeautifulSoup(requests.get(f'https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First={URL}', headers=headers).text, "lxml")

    # Keep only the first 25 matches of each selector (one page of results)
    names = [x.get_text(strip=True) for x in soup.select('table tr td font a')][:25]
    print(names)
    prices = [x.get_text(strip=True) for x in soup.select('table tr td font:nth-child(3)')][:25]
    print(prices)

    # with open("Minerals.txt", "a+", encoding='utf-8') as file:
    #     for name, price in zip(names, prices):
    #         # print(f"{name}\n{price}")
    #         # print("-" * 50)
    #         filename = str(name)+" "+str(price)+"\n"
    #         split1 = filename.split(' / ')
    #         cutted1 = split1.pop(0)
    #         split2 = cutted1.split(": ")
    #         try:
    #             cutted2 = split2.pop(1)
    #         except IndexError:
    #             continue
    #         two_prices = cutted2+" "+split1.pop(0)+"\n"
    #         file.write(two_prices)
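The loop above steps the `First=` query parameter by 25, so `range(0,100,25)` only covers four result pages. To reach the 14 pages mentioned in the question, the offsets can be generated from a page count instead. A minimal sketch, assuming 25 results per page (as the step of 25 suggests); `page_urls`, `BASE_URL` and `PAGE_SIZE` are hypothetical names, and the query parameters are copied from the URL in the loop:

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://www.fabreminerals.com/search_results.php"
PAGE_SIZE = 25  # assumption: 25 results per page, matching the step in range(0, 100, 25)


def page_urls(num_pages: int) -> List[str]:
    """Build one search URL per results page, stepping First= by PAGE_SIZE."""
    urls = []
    for first in range(0, num_pages * PAGE_SIZE, PAGE_SIZE):
        # Same query parameters, in the same order, as the f-string URL in the loop
        params = {
            "LANG": "EN", "SearchTerms": "", "submit": "Buscar",
            "MineralSpeciment": "", "Country": "", "Locality": "",
            "PriceRange": "", "checkbox": "enventa", "First": first,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


urls = page_urls(14)
print(len(urls))  # 14
print(urls[-1])   # ends with &First=325
```

Each URL can then be fetched with `requests.get(url, headers=headers)` as before; `requests.get` also accepts the query as a `params=` dict, which builds the query string for you.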

Output:

["NX51AH2:\n'lepidolite' after Elbaite with Elbaite", "TH27AL9:\n'Pearceite' with Calcite", "TFM69AN5:\n'Stilbite'", 'SM90CEX:\nAcanthite', 'TMA97AN5:\nAcanthite', 'TB90AE8:\n Acanthite', 'TZ71AK9:\nAcanthite', 'EC63G1:\nAcanthite', 'MN56K9:\nAcanthite', 'TF89AL3:\nAcanthite (Se-bearing) with Polybasite (Se-bearing) and Calcite', 'TP66AJ8:\nAcanthite (Se-bearing) with Pyrite', 'TY86AN2:\nAcanthite after Polybasite', 'TA66AF6:\nAcanthite with Calcite', 'JFD104AO2:\nAcanthite with Calcite', 'TX36AL6:\nAcanthite with Calcite', 'TA48AH1:\nAcanthite with Chalcopyrite', 'EF89L9:\nAcanthite with Pyrite and Calcite', 'TX89AN0:\nAcanthite with Siderite and Proustite', 'EA56K0:\nAcanthite with Silver', 'EC48K0:\nAcanthite with Silver', '11AT12:\nAcanthite, Calcite', '9EF89L9:\nAcanthite, Pyrite, Calcite', 'SM75TDA:\nAdamite', '2M14:\nAdamite', '20MJX66:\nAdamite']
['Price:€580 / US8 / ¥84010 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€450 / US4 / ¥65180 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€580 / US8 / ¥84010 / AUD0', 'Price:€85 / US / ¥12310 / AUD0', 'Price:€155 / US9 / ¥22450 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€1500 / US47 / ¥217290 / AUD10', 'Price:€1600 / US51 / ¥231770 / AUD60', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€1200 / US38 / ¥173830 / AUD50', 'Price:€290 / US9 / ¥42000 / AUD0', 'Price:€480 / US5 / ¥69530 / AUD0', 'Price:€4800 / US53 / ¥695320 / AUD00', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€290 / US9 / ¥42000 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€320 / US0 / ¥46350 / AUD0', 'Price:€75 / US / ¥10860 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD5']
['5TD76M9:\nAdamite', 'MA54AE9:\nAdamite (variety Cu-bearing adamite) with Calcite', 'EA11Y6:\nAdamite (variety cuprian)', 'EB14Y6:\nAdamite (variety cuprian)', 'MC11X8:\nAdamite (variety cuprian) with Smithsonite', 'JRM10AN8:\nAegirine', 'MFA46AP3:\nAegirine with Zircon, Orthoclase and Quartz (variety smoky)', 'EM48AF8:\nAlabandite with Calcite', 'MC92T6:\nAlabandite with Calcite and Rhodochrosite', 'TF16AN1:\nAlabandite with Rhodochrosite', 'TX17S1:\nAlabandite with Rhodochrosite', 'TD34S1:\nAlabandite with Rhodochrosite', '10TR46:\nAlmandine', 'HM90EJ:\nAnalcime', 'EFH36AP3:\nAnalcime with Natrolite, Rhodochrosite and Serandite', 'ELR67AP1:\nAnalcime with Quartz', 'EML88AP1:\nAnalcime with Quartz', 'TF87AF4:\nAndorite', 'TR88AJ3:\nAndorite', 'ND56AN0:\nAndorite with Zinkenite', 'SM180NH:\nAndradite (variety demantoid)', 'MT86AL3:\nAndradite (variety demantoid) with Calcite', 'MA27AL7:\nAndradite (variety demantoid) with Calcite', 'TC80TL:\nAndradite (variety topazolite) with Clinochlore', 'TC85TE:\nAndradite (variety topazolite) with Clinochlore']
['Price:€180 / US5 / ¥26070 / AUD0', 'Price:€840 / US6 / ¥121680 / AUD90', 'Price:€60 / US / ¥8690 / AUD', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€580 / US8 / ¥84010 / AUD0', 'Price:€1600 / US51 / ¥231770 / AUD68', 'Price:€2700 / US86 / ¥391120 / AUD60', 'Price:€740 / US3 / ¥107190 / AUD40', 'Price:€110 / US3 / ¥15930 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€920 / US9 / ¥133270 / AUD10', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€130 / US4 / ¥18830 / AUD0', 'Price:€260 / US8 / ¥37660 / AUD0', 'Price:€380 / US2 / ¥55040 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€390 / US2 / ¥56490 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€180 / US5 / ¥26070 / AUD0', 'Price:€1600 / US51 / ¥231770 / AUD60', 'Price:€2200 / US70 / ¥318690 / AUD90', 'Price:€80 / US / ¥11580 / AUD0', 'Price:€85 / US / ¥12310 / AUD0']
['T29NAK3:\nAndradite (variety topazolite) with Clinochlore', 'TC85TV:\nAndradite (variety topazolite) with Clinochlore', 'T89GH5:\nAndradite (variety topazolite) with Clinochlore', 'TQ94Q0:\nAndradite (variety topazolite) with Stilbite', 'SM140TFV:\nAndradite on Microcline', 'HM140NG:\nAndradite with Calcite', 'GM66R9:\nAndradite with Clinochlore', 'SM70TYW:\nAndradite with Epidote', 'TC290TVH:\nAndradite with Epidote and Microcline', 'TKX11AO7:\nAndradite with Microcline', 'TC2100TEJ:\nAndradite with Microcline', 'TH16AN2:\nAndradite with Microcline', 'TTX66AO7:\nAndradite with Microcline', 'TC2150TJL:\nAndradite with Microcline', 'TQ96AN2:\nAndradite with Microcline', 'TF48AF2:\nAnglesite', 'MA47AL4:\nAnglesite with Galena', 'LQ88AE6:\nAnglesite with Galena', 'ER90AL8:\nAnglesite with Galena', 'TP70AE1:\nAnglesite with Galena', 'N54NAL5:\nAnglesite with Galena', 'GV96R9:\nAnhydrite with Calcite and Pyrite', '11TV99:\nAnhydrite, Calcite', 'MG26AL4:\nAnorthoroselite with Calcite', 'XM260NFF:\nAragonite']
['Price:€240 / US7 / ¥34760 / AUD0', 'Price:€85 / US / ¥12310 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€980 / US11 / ¥141960 / AUD10', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€100 / US3 / ¥14480 / AUD0', 'Price:€110 / US3 / ¥15930 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€380 / US2 / ¥55040 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€360 / US1 / ¥52140 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€940 / US9 / ¥136160 / AUD50', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€60 / US / ¥8690 / AUD'] 
['XM295EAR:\nAragonite', 'ETE46AP2:\nAragonite', 'EXM26AP0:\nAragonite', 'EYB26AP0:\nAragonite', 'EXE56AP2:\nAragonite', 'ETF46AP0:\nAragonite', 'XM2160ERF:\nAragonite', 'EXM46AP0:\nAragonite', 'XM2190MEX:\nAragonite', 'XM2780EFT:\nAragonite', 'EHM93AO9:\nAragonite', 'TYB37AO8:\nAragonite (variety Cu-bearing aragonite)', 'SM99AM3:\nAragonite (variety cuprian)', '1M06:\nAragonite (variety flos ferri)', 'TG69AL3:\nAragonite (variety tarnowitzite)', 'MLC96AO2:\nAragonite on Calcite', 'MLE68AO2:\nAragonite on Calcite', 'MTB66AP3:\nAragonite with Quartz (variety hematoide)', 'MXF96AP3:\nAragonite with Quartz (variety hematoide)', 'MRR47AP3:\nAragonite with Quartz (variety hematoide)', 'MTR37AP3:\nAragonite with Quartz (variety hematoide)', 'JFD193AP3:\nArfvedsonite with Microcline', 'TFX76AO7:\nArsenopyrite with Calcite, Pyrite, Sphalerite and Rhodochrosite', 'NB37AL3:\nArsenopyrite with Muscovite', 'HM220NX:\nArsenopyrite with Muscovite']
['Price:€95 / US / ¥13760 / AUD6', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD6', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€190 / US6 / ¥27520 / AUD3', 'Price:€780 / US4 / ¥112990 / AUD03', 'Price:€880 / US8 / ¥127470 / AUD50', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€480 / US5 / ¥69530 / AUD0', 'Price:€100 / US3 / ¥14480 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€190 / US6 / ¥27520 / AUD0', 'Price:€360 / US1 / ¥52140 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD6', 'Price:€190 / US6 / ¥27520 / AUD3', 'Price:€230 / US7 / ¥33310 / AUD4', 'Price:€230 / US7 / ¥33310 / AUD4', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€170 / US5 / ¥24620 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0']
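On the question in the title: yes, `continue` skips the rest of the current loop iteration, which is what the commented-out `except IndexError: continue` does. A sketch of the same skip without relying on `pop()` raising, assuming names look like `CODE:` followed by a newline and the mineral name, and prices like `Price:€580 / ...`, as in the printed output; `format_record` and `collect_records` are hypothetical helpers:

```python
from typing import List, Optional


def format_record(name: str, price: str) -> Optional[str]:
    """Return 'Mineral €price' for one scraped pair, or None if it is malformed.

    Assumed formats (from the printed output): the name is a code, a colon,
    a newline and the mineral; the price starts with 'Price:€...' followed
    by ' / '-separated fields in other currencies.
    """
    # Everything after the first colon in the name is the mineral description
    _code, name_sep, mineral = name.partition(":")
    # The euro price is the first ' / '-separated field, after 'Price:'
    label, price_sep, euros = price.split(" / ")[0].partition(":")
    if not name_sep or not price_sep or label.strip() != "Price":
        return None
    return f"{mineral.strip()} {euros.strip()}"


def collect_records(names: List[str], prices: List[str]) -> List[str]:
    lines = []
    for name, price in zip(names, prices):
        record = format_record(name, price)
        if record is None:
            continue  # skip this pair and move on to the next iteration
        lines.append(record)
    return lines


if __name__ == "__main__":
    names = ["SM90CEX:\nAcanthite", "unexpected string", "SM75TDA:\nAdamite"]
    prices = ["Price:€90 / US / ¥13030 / AUD0", "Price:€240 / US7", "Price:€70 / US"]
    print(collect_records(names, prices))  # ['Acanthite €90', 'Adamite €70']
```

Skipping only drops the malformed pair itself. If the names list contains an extra item, `zip` has already shifted the later prices, so a more specific selector is still needed to keep names and prices aligned.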

P粉677684876 · Answer

You just need to make the CSS selector more specific, so that it only matches links sitting directly inside the font element (not several levels further down):

soup.select("table tr td font>a")
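To see the difference between the two selectors, here is a sketch on made-up markup that only assumes the page nests some unwanted links deeper than the item links. The descendant selector `font a` matches a link at any depth under `<font>`, whereas the child combinator in `font>a` only matches links whose parent is the `<font>` element itself. It uses BeautifulSoup with the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# Made-up markup: one item link directly inside <font>, and one link nested
# a level deeper (inside <b>), which a descendant selector still matches.
html = """
<table>
  <tr><td><font><a href="item.php?CODE=SM90CEX">SM90CEX: Acanthite</a></font></td></tr>
  <tr><td><font><b><a href="elsewhere.php">Unexpected link</a></b></font></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

descendant = [a.get_text() for a in soup.select("table tr td font a")]
child = [a.get_text() for a in soup.select("table tr td font>a")]
print(descendant)  # ['SM90CEX: Acanthite', 'Unexpected link']
print(child)       # ['SM90CEX: Acanthite']
```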

Adding a further condition that the link must point to an item, rather than to the next/previous page links at the bottom of the page, will also help:

soup.select("table tr td font>a[href*='CODE']")
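A sketch of the attribute filter on made-up markup, assuming (as the selector implies) that item links contain `CODE` in their `href` while the next/previous page links do not; both `href` values below are hypothetical. `[href*='CODE']` is the CSS substring attribute selector, so it keeps only links whose `href` contains that text:

```python
from bs4 import BeautifulSoup

# Made-up markup: an item link and a pagination link, both direct children
# of <font>, so "font>a" alone would still match both.
html = """
<table>
  <tr><td><font><a href="item.php?CODE=SM90CEX">SM90CEX: Acanthite</a></font></td></tr>
  <tr><td><font><a href="search_results.php?First=25">Next</a></font></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

all_links = [a.get_text() for a in soup.select("table tr td font>a")]
item_links = [a.get_text() for a in soup.select("table tr td font>a[href*='CODE']")]
print(all_links)   # ['SM90CEX: Acanthite', 'Next']
print(item_links)  # ['SM90CEX: Acanthite']
```

The substring match is case-sensitive, so it relies on the real item URLs spelling the parameter exactly as `CODE`.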