Google holds an immense volume of data that is valuable to businesses and researchers: it handles over 8.5 billion searches every day and commands a 91% share of the global search engine market.
Since the debut of ChatGPT, Google data has been utilized not only for traditional purposes like rank tracking, competitor monitoring, and lead generation but also for developing large language models (LLMs), training AI models, and enhancing the capabilities of Natural Language Processing (NLP) models.
Scraping Google, however, is not easy for everyone. It requires a team of professionals and a robust infrastructure to scrape at scale.
In this article, we will learn to scrape Google Search Results using Python and BeautifulSoup. This will enable you to build your own tools and models that are capable of leveraging Google’s data at scale.
Let’s get started!
Google Search Results are the listings that appear on Google based on the user query entered in the search bar. Google heavily utilizes NLP to understand these queries and present users with relevant results. These results often include featured snippets in addition to organic results, such as the latest AI overviews, People Also Ask sections, Related Searches, and Knowledge Graphs. These elements provide summarized and related information to users based on their queries.
Google Search data has various applications: rank tracking, competitor monitoring, lead generation, SEO optimization, and training AI and NLP models, among others.
Python is a versatile and robust language with mature HTTP libraries that handle sessions, headers, and cookies cleanly, giving it a higher success rate on sites where other languages struggle. As the popularity of AI models trained on web-scraped data grows, Python’s relevance in web-scraping topics continues to rise within the developer community.
Additionally, beginners looking to learn Python as a web scraping skill can understand it easily due to its simple syntax and code clarity. Plus, it has huge community support on platforms like Discord, Reddit, etc., which can help with any level of problem you are facing.
This scalable language excels in web scraping performance and provides powerful frameworks like Scrapy, Requests, and BeautifulSoup, making it a superior choice for scraping Google and other websites compared to other languages.
In this section, we will create a basic Python script to retrieve the first 10 Google search results.
To follow this tutorial, we need to install the following libraries:
Requests — To pull HTML data from the Google Search URL.
BeautifulSoup — To parse the HTML data into a structured format.
The setup is simple. Create a Python file and install the required libraries to get started.
Run the following commands in your project folder:
touch scraper.py
And then install the libraries.
pip install requests
pip install beautifulsoup4
We are done with the setup and have everything we need to move forward. We will use the Requests library to extract the raw HTML and BeautifulSoup to parse it and get the desired information.
But what is “desired information” here?
The filtered data will contain the title, link, displayed (cite) link, and description of each search result.
Let us import our installed libraries first in the scraper.py file.
from bs4 import BeautifulSoup
import requests
Then, we will make a GET request on the target URL to fetch the raw HTML data from Google.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'}
url = 'https://www.google.com/search?q=python+tutorials&gl=us'
response = requests.get(url, headers=headers)
print(response.status_code)
Passing headers is important to make the scraper look like a natural user who is just visiting the Google search page for some information.
The above code will help you pull the HTML data from the Google Search link. If you get a 200 status code, the request was successful. This completes the first part of creating a scraper for Google.
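As a defensive refinement, the one-off request above can be wrapped in a small helper that builds the URL and fails loudly on non-200 responses. This is a sketch, not part of the original script; the User-Agent string and function names are just examples:

```python
from urllib.parse import urlencode

import requests

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
)

def build_search_url(query, country="us"):
    """Build a Google search URL from the query and geolocation parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query, "gl": country})

def fetch_serp(query, country="us"):
    """Fetch the raw SERP HTML; raise for non-2xx so blocks (e.g. 429) are visible."""
    response = requests.get(
        build_search_url(query, country),
        headers={"User-Agent": USER_AGENT},
        timeout=10,
    )
    response.raise_for_status()
    return response.text
```

Raising on non-2xx responses makes soft blocks obvious immediately, instead of letting the parser run over an error page.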
In the next part, we will use BeautifulSoup to extract the required data from the HTML.
soup = BeautifulSoup(response.text, 'html.parser')
This creates a BeautifulSoup object that parses the HTML response, letting us navigate the HTML tree and find any element of choice along with the content inside it.
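To see how this navigation works before touching live HTML, here is a tiny self-contained example on mock markup. The snippet below is invented for illustration; it only mirrors the class names used later in this tutorial, not Google's live output:

```python
from bs4 import BeautifulSoup

# Mock HTML shaped like a single organic result (illustrative only)
html = """
<div class="g">
  <a href="https://docs.python.org/3/tutorial/"><h3>The Python Tutorial</h3></a>
  <cite>docs.python.org</cite>
  <div class="VwiC3b">The official Python tutorial.</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h3").text              # text inside the first <h3>
link = soup.find("a")["href"]             # href attribute of the first anchor
snippet = soup.find("div", class_="VwiC3b").text
print(title, link, snippet)
```

`find` returns the first matching element (or `None`), while `find_all` returns every match, which is what we will use to loop over all results.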
To parse this HTML, we would need to first inspect the Google Search Page to check which common pattern can be found in the DOM location of the search results.
After inspecting, we find that every search result sits inside a div container with the class g. This means we just have to loop over each div container with the g class to get the information inside it.
Before writing the code, we will find the DOM location for the title, description, and link from the HTML.
If you inspect the title, you’ll find that it is contained within an h3 tag. Inspecting further shows that the link is located in the href attribute of the anchor tag.
The displayed link or the cite link can be found inside the cite tag.
And finally, the description is stored inside a div container with the class VwiC3b.
Wrapping all these data entities into a single block of code:
organic_results = []
for result in soup.find_all("div", class_="g"):
    title = result.find("h3")
    anchor = result.find("a")
    cite = result.find("cite")
    description = result.find("div", class_="VwiC3b")
    organic_results.append({
        "title": title.text if title else None,
        "link": anchor["href"] if anchor else None,
        "displayed_link": cite.text if cite else None,
        "description": description.text if description else None,
    })
print(organic_results)
We declared an organic_results list, looped over all the elements with the g class in the HTML, and appended the collected data to the list.
Running this code will give you the desired results which you can use for various purposes including rank tracking, lead generation, and optimizing the SEO of the website.
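For rank tracking in particular, it helps to record each result's position and persist the data. Here is a sketch of a CSV export; the sample row below is hypothetical and stands in for the organic_results list built by the scraper:

```python
import csv

# Hypothetical sample, shaped like the scraper's organic_results entries
organic_results = [
    {"title": "The Python Tutorial",
     "link": "https://docs.python.org/3/tutorial/",
     "displayed_link": "docs.python.org",
     "description": "The official Python tutorial."},
]

with open("serp_results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["position", "title", "link", "displayed_link", "description"]
    )
    writer.writeheader()
    # enumerate from 1 so the first organic result is position 1
    for position, row in enumerate(organic_results, start=1):
        writer.writerow({"position": position, **row})
```

Re-running the scraper on a schedule and appending to this file gives you a simple position history per keyword.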
So, that’s how a basic Google Scraping script is created.
However, there is a CATCH. We still can’t completely rely on this method, as it can result in our IP being blocked by Google. If we want to scrape search results at scale, we need a vast network of premium and non-premium proxies and advanced techniques to make this possible. That’s where SERP APIs come into play!
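To give a feel for what the proxy side involves, here is a minimal rotation sketch. The proxy URLs are placeholders (substitute your own pool), and the retry logic is deliberately simple:

```python
import random

import requests

# Placeholder proxy endpoints -- substitute your own pool
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
]

def pick_proxy(pool):
    """Choose a proxy at random and shape it the way requests expects."""
    proxy = random.choice(pool)
    return {"http": proxy, "https": proxy}

def fetch_with_proxy(url, headers, pool=PROXY_POOL, retries=3):
    """Try up to `retries` proxies, rotating when one fails or is blocked."""
    for _ in range(retries):
        try:
            response = requests.get(
                url, headers=headers, proxies=pick_proxy(pool), timeout=10
            )
            if response.status_code == 200:
                return response.text
        except requests.RequestException:
            continue  # proxy dead or blocked; rotate to another
    raise RuntimeError("all proxy attempts failed")
```

A production setup adds much more on top of this (health checks, residential proxies, CAPTCHA handling, backoff), which is exactly the burden a SERP API takes off your hands.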
Another method for scraping Google is using a dedicated SERP API. They are much more reliable and don’t let you get blocked in the scraping process.
The setup for this section is the same; we just need to register on ApiForSeo to get an API key, which gives us access to its SERP API.
After activating the account, you will be redirected to the dashboard where you will get your API Key.
You can also copy the code from the dashboard itself.
Then, we will create an API request on a random query to scrape data through ApiForSeo SERP API.
import requests

# The endpoint and parameter names below are illustrative only — copy the
# exact request snippet (URL and parameters) from your ApiForSeo dashboard.
params = {
    'api_key': 'YOUR_API_KEY',
    'q': 'python tutorials',
    'gl': 'us',
}
response = requests.get('https://apiforseo.com/serp', params=params)
print(response.json())
You can try any other query as well. Don’t forget to put your API key into the code; otherwise, you will receive a 404 error.
Running this code in your terminal would immediately give you results.
The returned data contains various points, including titles, links, snippets, descriptions, and featured snippets like extended sitelinks. You will also get advanced featured snippets like People Also Ask, Knowledge Graph, Answer Boxes, etc., from this API.
The nature of business is evolving at a rapid pace. If you don’t have access to data about ongoing trends and your competitors, you risk falling behind emerging businesses that make data-driven strategic decisions at every step. Therefore, it is crucial for a business to understand what is happening in its environment, and Google can be one of the best data sources for this purpose.
In this tutorial, we learned how to scrape Google search results using Python. If you found this blog helpful, please share it on social media and other platforms.
Thank you!