Web automation has become an indispensable tool in modern software development and testing. In this comprehensive Selenium Python tutorial, you'll learn how to build a robust web automation framework capable of handling real-world scenarios. Whether you are implementing automated testing in Python or building complex web scraping solutions, this guide offers industry-tested approaches and Selenium best practices.
Web automation is vital in modern software development, testing, and data collection. Its applications span from end-to-end testing of web applications to simplifying repetitive workflows, such as form submissions or web scraping. While Selenium WebDriver Python integration offers powerful capabilities, robust web automation is more than just writing scripts to mimic user interactions. It’s about designing workflows and frameworks that are maintainable, adaptable, and resilient to changes to the target web application.
Below are the key aspects we'll cover throughout this tutorial:

- Setting up a development environment and a modular project structure
- Managing browser sessions with a reusable BrowserManager class
- Interacting with page elements reliably, including explicit waits and retries
- Scraping product data using the Page Object Model pattern
- Persisting scraped prices in SQLite and tracking changes over time
- Sending email notifications when prices change
We will build a web scraping automation project for a price tracker on e-commerce websites using Books to Scrape as a demo site to demonstrate these concepts while adhering to Selenium best practices.
To follow along with this tutorial, you'll need:

- Python 3.8 or later, with pip
- The Google Chrome browser
- Basic familiarity with Python and CSS selectors
The code for this tutorial is available in our GitHub repository; feel free to clone it and follow along.
Let's set up a proper development environment and install the necessary Python packages. First, create the project folder, and a new virtual environment by running the commands below:
mkdir price_tracker_automation && cd price_tracker_automation
python3 -m venv env
source env/bin/activate
Then, create a requirements.txt file and add the following Python packages to it:
selenium==4.16.0
webdriver-manager==4.0.1
python-dotenv==1.0.0
requests==2.31.0
In the above file, we define our core dependencies. The selenium package provides the foundation for our web automation framework, while webdriver-manager handles browser driver management automatically. The python-dotenv package handles environment configuration, and requests handles plain HTTP calls.
Now install all the packages in your requirements.txt file by running the command below:
pip install -r requirements.txt
Lastly, create the following folder structure for our project:
price_tracker_automation/
├── core/
│   ├── browser.py
│   ├── scraper.py
│   └── element_handler.py
├── database/
│   └── db_manager.py
├── notifications/
│   └── price_alert.py
├── requirements.txt
├── run.py
└── main.py
Here we establish a modular project structure following software engineering best practices. The core directory contains our primary automation components, database handles data persistence, and notifications holds the alerting logic.
With the project environment, dependencies, and folder structure in place, let's proceed to build the price tracker automation tool using Selenium and Python.
Let's implement our browser management system, an important component for stable Selenium WebDriver Python integration. Add the code snippet below to your core/browser.py file:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
import logging

class BrowserManager:
    def __init__(self, headless=False):
        self.options = webdriver.ChromeOptions()
        if headless:
            self.options.add_argument('--headless')

        # Add additional stability options
        self.options.add_argument('--no-sandbox')
        self.options.add_argument('--disable-dev-shm-usage')
        self.options.add_argument('--disable-gpu')

        self.driver = None
        self.logger = logging.getLogger(__name__)
The above code creates a BrowserManager class that handles WebDriver initialization and configuration. The class implements Selenium best practices by configuring Chrome options for stability and performance. The headless parameter allows for running tests without a visible browser window, which is crucial for CI/CD pipelines.
Now add the following methods to the BrowserManager class to implement the core browser management features:
    def start_browser(self):
        """Initialize and return a ChromeDriver instance"""
        try:
            service = webdriver.ChromeService()
            self.driver = webdriver.Chrome(service=service, options=self.options)
            self.driver.implicitly_wait(10)
            return self.driver
        except Exception as e:
            self.logger.error(f"Failed to start browser: {str(e)}")
            raise

    def close_browser(self):
        """Safely close the browser"""
        if self.driver:
            self.driver.quit()
            self.driver = None
In the above code, the start_browser method relies on Selenium's automatic driver management to locate a matching ChromeDriver (with webdriver-manager installed as an alternative), while close_browser ensures proper resource cleanup. The implementation includes an implicit wait configuration to handle dynamic page loading gracefully.
Next, let's implement the element interaction system, an important part of any web automation framework because it lets us locate and interact with elements reliably while following Selenium best practices. Add the code snippet below to your core/element_handler.py file:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

class ElementHandler:
    def __init__(self, driver, timeout=10):
        self.driver = driver
        self.timeout = timeout
In the above code, we created an ElementHandler class, which encapsulates Selenium WebDriver Python interaction patterns. The class accepts a WebDriver instance and a configurable timeout parameter.
Update your ElementHandler class to add core element interaction methods:
    def wait_for_element(self, locator, timeout=None):
        """Wait for element with retry mechanism"""
        timeout = timeout or self.timeout
        try:
            element = WebDriverWait(self.driver, timeout).until(
                EC.presence_of_element_located(locator)
            )
            return element
        except TimeoutException:
            raise TimeoutException(f"Element {locator} not found after {timeout} seconds")
The above method uses Selenium's WebDriverWait and expected_conditions to locate elements, which lets it handle dynamic pages where elements load asynchronously.
Add another method to implement the text extraction logic:
    def get_text_safely(self, locator, timeout=None):
        """Safely get text from element with retry mechanism"""
        max_retries = 3
        for attempt in range(max_retries):
            try:
                element = self.wait_for_element(locator, timeout)
                return element.text.strip()
            except StaleElementReferenceException:
                if attempt == max_retries - 1:
                    raise
                continue
The method includes retry logic to handle StaleElementReferenceException, which is a common challenge in web automation.
Now let's build our main scraping functionality, incorporating automated testing Python concepts and robust error handling. Add the code snippets below to your core/scraper.py file:
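A minimal sketch of such a scraper class is shown below. The CSS locators and the _parse_price helper are illustrative assumptions based on Books to Scrape's markup; the tuple form ("css selector", ...) is the literal string value of Selenium's By.CSS_SELECTOR, so these tuples work as standard Selenium locators.

```python
import logging


class BookScraper:
    """Page-object-style scraper for Books to Scrape product pages."""

    # Centralized locators (Page Object Model). The selectors themselves
    # are assumptions about the demo site's markup.
    TITLE_LOCATOR = ("css selector", ".product_main h1")
    PRICE_LOCATOR = ("css selector", ".product_main .price_color")

    def __init__(self, browser_manager, element_handler):
        # Browser and element handling are injected, keeping this class
        # free of direct WebDriver setup.
        self.browser = browser_manager
        self.elements = element_handler
        self.logger = logging.getLogger(__name__)

    @staticmethod
    def _parse_price(price_text):
        """Convert a price string such as '£51.77' to a float."""
        cleaned = "".join(ch for ch in price_text if ch.isdigit() or ch == ".")
        return float(cleaned)
```

Keeping the locators as class-level constants means a markup change on the site requires editing one place only.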
In the above code, we created the BookScraper class that integrates our browser and element handling components. The class follows the Page Object Model pattern, a key concept in web automation framework design, by centralizing element locators and providing a clean API for scraping operations.
Next, update the BookScraper class to add the core product data extraction methods:
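Continuing the sketch (the class scaffolding is repeated here so the snippet stands on its own; the method name and locators remain illustrative assumptions), the extraction method drives the managed browser and the safe text helpers from the earlier steps:

```python
import logging


class BookScraper:
    """Extraction methods for the scraper sketch above."""

    TITLE_LOCATOR = ("css selector", ".product_main h1")
    PRICE_LOCATOR = ("css selector", ".product_main .price_color")

    def __init__(self, browser_manager, element_handler):
        self.browser = browser_manager
        self.elements = element_handler
        self.logger = logging.getLogger(__name__)

    def get_book_details(self, url):
        """Visit a product page and return its title and price."""
        self.logger.info("Scraping %s", url)
        self.browser.driver.get(url)  # navigate via the managed driver
        title = self.elements.get_text_safely(self.TITLE_LOCATOR)
        price_text = self.elements.get_text_safely(self.PRICE_LOCATOR)
        # Strip the currency symbol before converting to float
        price = float("".join(c for c in price_text if c.isdigit() or c == "."))
        return {"url": url, "title": title, "price": price}
```

Because the driver and element handler are injected, this method can be unit-tested with simple stubs instead of a live browser.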
The above method uses a structured approach to gather product information, maintaining detailed logs for debugging and monitoring.
Let's implement the database layer of our web automation framework, which will handle the persistent storage of our scraped data. This component will allow us to track the price changes over time. Add the code snippets below to your database/db_manager.py:
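A minimal sketch of such a class, using SQLite from the standard library (the db_path default and attribute names are illustrative):

```python
import logging
import sqlite3


class DatabaseManager:
    """Handles persistence for scraped prices using SQLite."""

    def __init__(self, db_path="price_tracker.db"):
        self.db_path = db_path
        self.logger = logging.getLogger(__name__)
        # A single shared connection is enough for this single-process script
        self.conn = sqlite3.connect(self.db_path, check_same_thread=False)
```

Passing ":memory:" as db_path gives a throwaway in-memory database, which is convenient for tests.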
In the above code, we defined our DatabaseManager class that handles all database operations. We used SQLite for simplicity and portability: it requires no separate server setup, and it is ideal for our web scraping automation project since we are not storing large amounts of data.
Next, update your database/db_manager.py to add the database initialization method:
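One way to express that schema is sketched below. It is shown as a standalone function taking the connection so the snippet is self-contained; in db_manager.py it would be a method on DatabaseManager, and the table and column names are assumptions consistent with the description.

```python
import sqlite3


def init_database(conn):
    """Create the products and price_history tables if they don't exist."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT UNIQUE NOT NULL,
            title TEXT NOT NULL,
            current_price REAL NOT NULL,
            last_checked TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id INTEGER NOT NULL,
            price REAL NOT NULL,
            recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (product_id) REFERENCES products (id)
        );
    """)
    conn.commit()
```

The one-to-many relationship from products to price_history is what makes the later historical analysis possible.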
Here we establish our database schema using SQL DDL statements, and create separate tables for products and price history, with appropriate relationships and constraints which will enable us to track price and perform historical analysis on the data we store.
Now let's add another method to save data to the database:
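A sketch of such an upsert, using parameterized queries and SQLite's ON CONFLICT clause (again shown as a standalone function; in the project it would be a DatabaseManager method, and the column names follow the schema sketch above):

```python
import sqlite3


def save_product(conn, url, title, price):
    """Upsert a product row and append the price to its history."""
    # ON CONFLICT(url) relies on the UNIQUE constraint on products.url
    conn.execute(
        """
        INSERT INTO products (url, title, current_price)
        VALUES (?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            current_price = excluded.current_price,
            last_checked = CURRENT_TIMESTAMP
        """,
        (url, title, price),
    )
    product_id = conn.execute(
        "SELECT id FROM products WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO price_history (product_id, price) VALUES (?, ?)",
        (product_id, price),
    )
    conn.commit()
```

Parameterized queries (the `?` placeholders) keep the scraped strings out of the SQL text itself, which is what prevents SQL injection.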
In the above code, we implemented the data persistence logic using parameterized queries to prevent SQL injection. The method handles both insert and update operations using SQLite's ON CONFLICT clause.
Let's tie everything together with our main application class, incorporating all elements of our Selenium WebDriver Python implementation. Add the code snippets below to your main.py file:
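A minimal sketch of the orchestrator follows; the constructor signature is an assumption, but the dependency-injection shape matches the description:

```python
import logging


class PriceTracker:
    """Orchestrates scraping and persistence via injected components."""

    def __init__(self, scraper, db_manager):
        # Any objects exposing the scraper/database interfaces sketched
        # earlier will work, which keeps the class easy to unit-test.
        self.scraper = scraper
        self.db = db_manager
        self.logger = logging.getLogger(__name__)
```
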
In the above code, we create the main PriceTracker class that orchestrates all components of our web scraping automation solution. The PriceTracker class follows dependency injection patterns to maintain modularity and testability.
Next, update our PriceTracker class to add the core tracking methods:
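A sketch of the tracking loop is shown below (the scaffolding is repeated so the snippet stands alone; get_book_details and save_product follow the earlier sketches and are assumptions):

```python
import logging


class PriceTracker:
    """Core tracking loop for the orchestrator sketch above."""

    def __init__(self, scraper, db_manager):
        self.scraper = scraper
        self.db = db_manager
        self.logger = logging.getLogger(__name__)

    def track_products(self, urls):
        """Scrape each URL and persist the result; a failure on one URL
        is logged and the loop moves on to the next."""
        results = []
        for url in urls:
            try:
                details = self.scraper.get_book_details(url)
                self.db.save_product(
                    details["url"], details["title"], details["price"]
                )
                results.append(details)
            except Exception as exc:
                self.logger.error("Failed to track %s: %s", url, exc)
        return results
```

Swallowing per-URL failures is a deliberate choice here: one broken product page should not abort an entire tracking run.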
Here we implemented the main product tracking logic that handles the web scraping and stores the scraped data.
Let's create an execution script for our automation. Add the following code to your run.py file:
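A sketch of what run.py might contain, built around argparse: the --urls flag matches the cron example at the end of the article, and the tracker is injected so the command-line handling can be tested without a browser. The wiring comments name modules from the earlier steps and are assumptions.

```python
import argparse


def build_parser():
    """Command-line interface: one or more product URLs to track."""
    parser = argparse.ArgumentParser(description="Track product prices")
    parser.add_argument("--urls", nargs="+", required=True,
                        help="Product page URLs to track")
    return parser


def run(tracker, argv=None):
    """Parse the URLs and hand them to an already-constructed tracker."""
    args = build_parser().parse_args(argv)
    return tracker.track_products(args.urls)

# In run.py itself, the file would end by constructing BrowserManager,
# ElementHandler, BookScraper, and DatabaseManager from the earlier steps,
# wiring them into a PriceTracker, and calling run(tracker) under the
# usual `if __name__ == "__main__":` guard.
```
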
Now run the script from your terminal:
python run.py --urls \
    "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" \
    "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html" \
    "http://books.toscrape.com/catalogue/soumission_998/index.html"
The command prints log output as the tracker visits each page, showing that prices are being recorded for all of the specified URLs.
Our current implementation only tracks and saves product prices. After tracking prices, let's enhance our price tracker to notify users about price changes. Add the following code snippets to your notifications/price_alert.py file:
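A sketch of such an alert manager is shown below. It assumes a database manager exposing a `conn` SQLite connection, per the earlier sketch; the query, the threshold default, and the dictionary keys are illustrative.

```python
import logging


class PriceAlertManager:
    """Detects price changes worth alerting on."""

    def __init__(self, db_manager, threshold_percent=1.0):
        self.db = db_manager
        self.threshold = threshold_percent
        self.logger = logging.getLogger(__name__)

    @staticmethod
    def percent_change(old_price, new_price):
        """Signed percentage change from old_price to new_price."""
        return (new_price - old_price) / old_price * 100.0

    def check_price_changes(self):
        """Join price_history to itself to compare each product's two
        most recent recorded prices."""
        rows = self.db.conn.execute("""
            SELECT p.url, p.title, cur.price, prev.price
            FROM products p
            JOIN price_history cur ON cur.product_id = p.id
            JOIN price_history prev ON prev.product_id = p.id
            WHERE cur.id = (SELECT MAX(id) FROM price_history
                            WHERE product_id = p.id)
              AND prev.id = (SELECT MAX(id) FROM price_history
                             WHERE product_id = p.id AND id < cur.id)
        """).fetchall()
        alerts = []
        for url, title, new_price, old_price in rows:
            change = self.percent_change(old_price, new_price)
            if abs(change) >= self.threshold:
                alerts.append({
                    "url": url,
                    "title": title,
                    "old_price": old_price,
                    "new_price": new_price,
                    "change_percent": round(change, 2),
                })
        return alerts
```

Products with only one recorded price are naturally excluded, since the correlated subquery for the previous row returns no match.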
In the above code snippet, we created a PriceAlertManager class with its essential dependencies. The manager takes a database manager instance as a parameter and sets up logging for tracking alert operations. The class uses joins to compare current and previous prices, computes the price change percentage, and returns structured dictionaries of price change information.
Next, update your PriceAlertManager class to add email notification functionality:
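A sketch of that notification code follows, split into a message builder and a sender so the formatting can be tested without a mail server. The alert dict shape follows the PriceAlertManager sketch, and the SMTP host, port, and addresses are placeholders, not real settings.

```python
import smtplib
from email.mime.text import MIMEText


def build_alert_email(alert, sender, recipient):
    """Build a MIMEText message for one price-change alert dict."""
    body = (
        f"Price change for {alert['title']}\n"
        f"Old price: £{alert['old_price']:.2f}\n"
        f"New price: £{alert['new_price']:.2f}\n"
        f"Change: {alert['change_percent']:+.2f}%\n"
        f"URL: {alert['url']}\n"
    )
    # Explicit utf-8 charset so the currency symbol survives encoding
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = f"Price alert: {alert['title']}"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def send_alert(msg, host="smtp.example.com", port=587, user=None, password=None):
    """Send the message over SMTP with STARTTLS (host/port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user and password:
            server.login(user, password)
        server.send_message(msg)
```

In practice the SMTP credentials would come from environment variables loaded with python-dotenv rather than being hard-coded.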
Here, we created an email notification using Python's email and SMTP libraries. The implementation uses the MIMEText class to create properly formatted email messages. The email body is dynamically generated using f-strings, incorporating detailed price change information with precise currency formatting.
Now let's modify our run script to include price alerts:
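One way to extend the run flow is sketched below: after tracking, the alert manager is queried and each qualifying change is passed to a notifier. The tracker and alert manager follow the earlier sketches, and `notifier` is assumed to be any callable accepting an alert dict (for example, one that builds and sends the email above).

```python
def run_with_alerts(tracker, alert_manager, notifier, urls):
    """Track prices, then emit a notification for each qualifying change."""
    tracker.track_products(urls)
    alerts = alert_manager.check_price_changes()
    for alert in alerts:
        notifier(alert)
    return alerts
```
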
Now if you run the script again, it will track the product prices and alert you about any products whose prices have changed.
You can also run this script as a cron job to track product prices and alert you to price changes automatically, without having to run it manually every time.
For example, the following crontab entry runs the tracker every six hours:

0 */6 * * * python run.py --urls "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html" "http://books.toscrape.com/catalogue/soumission_998/index.html"
Throughout this tutorial, you've learned how to build a robust web automation tool using Selenium and Python. We started by covering web automation fundamentals, then set up a development environment for the price tracker tool built for the demonstrations in this tutorial. We then went further to build the price tracker application, which tracks product prices and alerts users of price changes. Now that you have this knowledge, what tool will you build next? Let me know in the comments section. Happy coding!
The above is the detailed content of Building Robust Web Automation with Selenium and Python. For more information, please follow other related articles on the PHP Chinese website!