Use Excel\VBA to scrape information from web pages
P粉555696738
P粉555696738 2023-09-02 18:48:32
0
1
883
<p>I'm trying to scrape data from a web page, I need 4 bits of data, 2 of which I've been able to retrieve, but the other 2 have been hinted to me. </p> <p>This is not a fixed URL, but it has been retrieved from IE after the redirect, so for this example I will use: https://cpc.farnell.com/warton/microprint-p2010-15-32um-250g/solder-paste-15-32um-250g-pot/dp/SD02808 </p> <p>I hope to get the title:</p> <p><strong>Microprint P2010 No-Clean, Lead-Free Solder Paste, 15-32um, 250g Jar - MICROPRINT P2010 15-32UM, 250G</strong></p> <p>and manufacturer part number</p> <p><strong>Microprint P2010 15-32UM, 250G</strong></p> <p>I've been using <strong>getElementbyID</strong> to get the product overview/product information and it works great, but the other text fields don't seem to work (I looked at the Xpath and it errors out on Object) VBE required)</p> <p>I successfully implemented the other 2 elements using: </p> <pre class="brush:php;toolbar:false;">Sub Mani() 'Declare variables Dim objWeb As Object Dim objHTML As Object Dim objElement As Object Dim strData, StrData1 As String Set objWeb = CreateObject("internetexplorer.Application") str = "https://cpc.farnell.com/" objWeb.navigate str & Cells(1, 1).Value While objWeb.Busy = True Wend FullURL = objWeb.LocationURL Range("b2").Value = FullURL objWeb.navigate FullURL 'IE.Visible = True While objWeb.Busy = True Wend Set objHTML = objWeb.document strData = objHTML.getElementById("pdpSection_FAndB").innerText ActiveSheet.Range("C3").Value = strData strData = objHTML.getElementById("pdpSection_pdpProdDetails").innerText Set strData = Nothing ActiveSheet.Range("D3").Value = strData objWeb.Quit End Sub</pre> <p>URL reloading is for navigation around URL redirects, </p> <p>https://cpc.farnell.com/SD02808</p> <p>Redirect to main URL</p> <p>https://cpc.farnell.com/warton/microprint-p2010-15-32um-250g/solder-paste-15-32um-250g-pot/dp/SD02808</p> <p>I want to enter the part number into A1, Excel use it as a variable in VB, and then re-read the full url from the IE instance. This is a legacy issue from trying to use <strong>Get Data from the Web</strong>, which didn't work well due to a page script error, so an instance of IE was used. </p>
P粉555696738
P粉555696738

reply all(1)
P粉517475670

IE is dead and should no longer be used. However, it is also possible to use the shortened URL https://cpc.farnell.com/SD02808 via xhr (XML HTTP request). The following code only shows how to read the required information from the loaded HTML code. All safeguards are missing.

The object error occurs in the code shown above because you are trying to set a variable of data type String to Nothing, just like an object variable using Set strData = none.

Sub GetCPCOfferInfos()

  Dim url As String
  Dim doc As Object
  
  Set doc = CreateObject("htmlFile")
  
  With CreateObject("MSXML2.XMLHTTP.6.0")
    url = "https://cpc.farnell.com/" & ActiveSheet.Cells(1, 1)
    .Open "GET", url, False
    .Send
    
    If .Status = 200 Then
      doc.body.innerHTML = .responseText
      
      ActiveSheet.Range("A3") = doc.getElementsByTagName("h1")(0).innertext _
                                & doc.getElementsByTagName("h2")(0).innertext 'Title
      ActiveSheet.Range("B3") = doc.getElementsByClassName("ManufacturerPartNumber")(0).innertext 'Manufacturer Part Number
      ActiveSheet.Range("C3") = doc.getElementById("pdpSection_FAndB").innertext 'Product Overview
      ActiveSheet.Range("D3") = doc.getElementById("pdpSection_pdpProdDetails").innertext 'Product Information
    Else
      MsgBox "Page not loaded. HTTP status " & .Status
    End If
  End With
End Sub
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template