Web automation has become an indispensable tool in modern software development and testing. In this comprehensive Selenium Python tutorial, you'll learn how to build a robust web automation framework capable of handling real-world scenarios. Whether you are implementing automated testing in Python or building complex web scraping solutions, this guide offers industry-tested approaches and Selenium best practices.
Web automation is vital in modern software development, testing, and data collection. Its applications span from end-to-end testing of web applications to simplifying repetitive workflows, such as form submissions or web scraping. While Selenium WebDriver Python integration offers powerful capabilities, robust web automation is more than just writing scripts to mimic user interactions. It’s about designing workflows and frameworks that are maintainable, adaptable, and resilient to changes to the target web application.
Below are the key aspects we'll cover throughout this tutorial:

- Setting up a development environment and a modular project structure
- Managing browser sessions with a reusable BrowserManager class
- Interacting with page elements reliably, including explicit waits and retries
- Scraping product data using the Page Object Model pattern
- Persisting scraped prices in SQLite and tracking changes over time
- Sending email notifications when prices change
We will build a web scraping automation project for a price tracker on e-commerce websites using Books to Scrape as a demo site to demonstrate these concepts while adhering to Selenium best practices.
To follow along with this tutorial, you'll need:

- Python 3.8 or later, with pip
- The Google Chrome browser
- Basic familiarity with Python and CSS selectors
The code for this tutorial is available in our GitHub repository; feel free to clone it and follow along.
Let's set up a proper development environment and install the necessary Python packages. First, create the project folder, and a new virtual environment by running the commands below:
mkdir price_tracker_automation && cd price_tracker_automation
python3 -m venv env
source env/bin/activate
Then, create a requirements.txt file and add the following Python packages to it:
selenium==4.16.0
webdriver-manager==4.0.1
python-dotenv==1.0.0
requests==2.31.0
In the above file, we define our core dependencies. The selenium package provides the foundation for our web automation framework, while webdriver-manager handles browser driver management automatically. The python-dotenv package handles environment configuration, and requests handles plain HTTP calls.
Now install all the packages in your requirements.txt file by running the command below:
pip install -r requirements.txt
Lastly, create the following folder structure for our project:
price_tracker_automation/
├── core/
│   ├── browser.py
│   ├── scraper.py
│   └── element_handler.py
├── database/
│   └── db_manager.py
├── notifications/
│   └── price_alert.py
├── requirements.txt
├── run.py
└── main.py
Here we establish a modular project structure following software engineering best practices. The core directory contains our primary automation components, database handles data persistence, and notifications holds the alerting logic.
With the project environment, dependencies, and folder structure in place, let's proceed to build the price tracker automation tool using Selenium and Python.
Let's implement our browser management system, an important component for stable Selenium WebDriver Python integration. Add the code snippet below to your core/browser.py file:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
import logging

class BrowserManager:
    def __init__(self, headless=False):
        self.options = webdriver.ChromeOptions()
        if headless:
            self.options.add_argument('--headless')

        # Add additional stability options
        self.options.add_argument('--no-sandbox')
        self.options.add_argument('--disable-dev-shm-usage')
        self.options.add_argument('--disable-gpu')

        self.driver = None
        self.logger = logging.getLogger(__name__)
The above code creates a BrowserManager class that handles WebDriver initialization and configuration. The class implements Selenium best practices by configuring Chrome options for stability and performance. The headless parameter allows for running tests without a visible browser window, which is crucial for CI/CD pipelines.
Now add the following methods to the BrowserManager class to implement the core browser management features:
    def start_browser(self):
        """Initialize and return a ChromeDriver instance"""
        try:
            service = webdriver.ChromeService()
            self.driver = webdriver.Chrome(service=service, options=self.options)
            self.driver.implicitly_wait(10)
            return self.driver
        except Exception as e:
            self.logger.error(f"Failed to start browser: {str(e)}")
            raise

    def close_browser(self):
        """Safely close the browser"""
        if self.driver:
            self.driver.quit()
            self.driver = None
In the above code, the start_browser method relies on Selenium's automatic driver management to locate a matching ChromeDriver (with webdriver-manager installed as an alternative), while close_browser ensures proper resource cleanup. The implementation includes an implicit wait configuration to handle dynamic page loading gracefully.
Next, let's implement the element interaction system, an important part of any web automation framework because it lets us locate and interact with elements reliably while following Selenium best practices. Add the code snippet below to your core/element_handler.py file:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

class ElementHandler:
    def __init__(self, driver, timeout=10):
        self.driver = driver
        self.timeout = timeout
In the above code, we created an ElementHandler class, which encapsulates Selenium WebDriver Python interaction patterns. The class accepts a WebDriver instance and a configurable timeout parameter.
Update your ElementHandler class to add core element interaction methods:
    def wait_for_element(self, locator, timeout=None):
        """Wait for element with retry mechanism"""
        timeout = timeout or self.timeout
        try:
            element = WebDriverWait(self.driver, timeout).until(
                EC.presence_of_element_located(locator)
            )
            return element
        except TimeoutException:
            raise TimeoutException(f"Element {locator} not found after {timeout} seconds")
The above method uses Selenium's WebDriverWait and expected_conditions to locate elements, which lets it handle dynamic pages where elements load asynchronously.
Add another method to implement the text extraction logic:
    def get_text_safely(self, locator, timeout=None):
        """Safely get text from element with retry mechanism"""
        max_retries = 3
        for attempt in range(max_retries):
            try:
                element = self.wait_for_element(locator, timeout)
                return element.text.strip()
            except StaleElementReferenceException:
                if attempt == max_retries - 1:
                    raise
                continue
The method includes retry logic to handle StaleElementReferenceException, which is a common challenge in web automation.
Now let's build our main scraping functionality, incorporating automated testing Python concepts and robust error handling. Add the code snippets below to your core/scraper.py file:
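A minimal sketch of such a scraper class is shown below. The CSS locators and the _parse_price helper are illustrative assumptions based on Books to Scrape's markup; the tuple form ("css selector", ...) is the literal string value of Selenium's By.CSS_SELECTOR, so these tuples work as standard Selenium locators.

```python
import logging


class BookScraper:
    """Page-object-style scraper for Books to Scrape product pages."""

    # Centralized locators (Page Object Model). The selectors themselves
    # are assumptions about the demo site's markup.
    TITLE_LOCATOR = ("css selector", ".product_main h1")
    PRICE_LOCATOR = ("css selector", ".product_main .price_color")

    def __init__(self, browser_manager, element_handler):
        # Browser and element handling are injected, keeping this class
        # free of direct WebDriver setup.
        self.browser = browser_manager
        self.elements = element_handler
        self.logger = logging.getLogger(__name__)

    @staticmethod
    def _parse_price(price_text):
        """Convert a price string such as '£51.77' to a float."""
        cleaned = "".join(ch for ch in price_text if ch.isdigit() or ch == ".")
        return float(cleaned)
```

Keeping the locators as class-level constants means a markup change on the site requires editing one place only.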
In the above code, we created the BookScraper class that integrates our browser and element handling components. The class follows the Page Object Model pattern, a key concept in web automation framework design, by centralizing element locators and providing a clean API for scraping operations.
Next, update the BookScraper class to add the core product data extraction methods:
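Continuing the sketch (the class scaffolding is repeated here so the snippet stands on its own; the method name and locators remain illustrative assumptions), the extraction method drives the managed browser and the safe text helpers from the earlier steps:

```python
import logging


class BookScraper:
    """Extraction methods for the scraper sketch above."""

    TITLE_LOCATOR = ("css selector", ".product_main h1")
    PRICE_LOCATOR = ("css selector", ".product_main .price_color")

    def __init__(self, browser_manager, element_handler):
        self.browser = browser_manager
        self.elements = element_handler
        self.logger = logging.getLogger(__name__)

    def get_book_details(self, url):
        """Visit a product page and return its title and price."""
        self.logger.info("Scraping %s", url)
        self.browser.driver.get(url)  # navigate via the managed driver
        title = self.elements.get_text_safely(self.TITLE_LOCATOR)
        price_text = self.elements.get_text_safely(self.PRICE_LOCATOR)
        # Strip the currency symbol before converting to float
        price = float("".join(c for c in price_text if c.isdigit() or c == "."))
        return {"url": url, "title": title, "price": price}
```

Because the driver and element handler are injected, this method can be unit-tested with simple stubs instead of a live browser.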
The above method uses a structured approach to gather product information, maintaining detailed logs for debugging and monitoring.
Let's implement the database layer of our web automation framework, which will handle the persistent storage of our scraped data. This component will allow us to track the price changes over time. Add the code snippets below to your database/db_manager.py:
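A minimal sketch of such a class, using SQLite from the standard library (the db_path default and attribute names are illustrative):

```python
import logging
import sqlite3


class DatabaseManager:
    """Handles persistence for scraped prices using SQLite."""

    def __init__(self, db_path="price_tracker.db"):
        self.db_path = db_path
        self.logger = logging.getLogger(__name__)
        # A single shared connection is enough for this single-process script
        self.conn = sqlite3.connect(self.db_path, check_same_thread=False)
```

Passing ":memory:" as db_path gives a throwaway in-memory database, which is convenient for tests.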
In the above code, we defined our DatabaseManager class that handles all database operations. We used SQLite for simplicity and portability: it requires no separate server setup, and it is ideal for our web scraping automation project since we are not storing large amounts of data.
Next, update your database/db_manager.py to add the database initialization method:
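One way to express that schema is sketched below. It is shown as a standalone function taking the connection so the snippet is self-contained; in db_manager.py it would be a method on DatabaseManager, and the table and column names are assumptions consistent with the description.

```python
import sqlite3


def init_database(conn):
    """Create the products and price_history tables if they don't exist."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT UNIQUE NOT NULL,
            title TEXT NOT NULL,
            current_price REAL NOT NULL,
            last_checked TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id INTEGER NOT NULL,
            price REAL NOT NULL,
            recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (product_id) REFERENCES products (id)
        );
    """)
    conn.commit()
```

The one-to-many relationship from products to price_history is what makes the later historical analysis possible.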
Here we establish our database schema using SQL DDL statements, and create separate tables for products and price history, with appropriate relationships and constraints which will enable us to track price and perform historical analysis on the data we store.
Now let's add another method to save data to the database:
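A sketch of such an upsert, using parameterized queries and SQLite's ON CONFLICT clause (again shown as a standalone function; in the project it would be a DatabaseManager method, and the column names follow the schema sketch above):

```python
import sqlite3


def save_product(conn, url, title, price):
    """Upsert a product row and append the price to its history."""
    # ON CONFLICT(url) relies on the UNIQUE constraint on products.url
    conn.execute(
        """
        INSERT INTO products (url, title, current_price)
        VALUES (?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            current_price = excluded.current_price,
            last_checked = CURRENT_TIMESTAMP
        """,
        (url, title, price),
    )
    product_id = conn.execute(
        "SELECT id FROM products WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO price_history (product_id, price) VALUES (?, ?)",
        (product_id, price),
    )
    conn.commit()
```

Parameterized queries (the `?` placeholders) keep the scraped strings out of the SQL text itself, which is what prevents SQL injection.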
In the above code, we implemented the data persistence logic using parameterized queries to prevent SQL injection. The method handles both insert and update operations using SQLite's ON CONFLICT clause.
Let's tie everything together with our main application class, incorporating all elements of our Selenium WebDriver Python implementation. Add the code snippets below to your main.py file:
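A minimal sketch of the orchestrator follows; the constructor signature is an assumption, but the dependency-injection shape matches the description:

```python
import logging


class PriceTracker:
    """Orchestrates scraping and persistence via injected components."""

    def __init__(self, scraper, db_manager):
        # Any objects exposing the scraper/database interfaces sketched
        # earlier will work, which keeps the class easy to unit-test.
        self.scraper = scraper
        self.db = db_manager
        self.logger = logging.getLogger(__name__)
```
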
In the above code, we create the main PriceTracker class that orchestrates all components of our web scraping automation solution. The PriceTracker class follows dependency injection patterns to maintain modularity and testability.
Next, update our PriceTracker class to add the core tracking methods:
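A sketch of the tracking loop is shown below (the scaffolding is repeated so the snippet stands alone; get_book_details and save_product follow the earlier sketches and are assumptions):

```python
import logging


class PriceTracker:
    """Core tracking loop for the orchestrator sketch above."""

    def __init__(self, scraper, db_manager):
        self.scraper = scraper
        self.db = db_manager
        self.logger = logging.getLogger(__name__)

    def track_products(self, urls):
        """Scrape each URL and persist the result; a failure on one URL
        is logged and the loop moves on to the next."""
        results = []
        for url in urls:
            try:
                details = self.scraper.get_book_details(url)
                self.db.save_product(
                    details["url"], details["title"], details["price"]
                )
                results.append(details)
            except Exception as exc:
                self.logger.error("Failed to track %s: %s", url, exc)
        return results
```

Swallowing per-URL failures is a deliberate choice here: one broken product page should not abort an entire tracking run.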
Here we implemented the main product tracking logic that handles the web scraping and stores the scraped data.
Let's create an execution script for our automation. Add the following code to your run.py file:
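A sketch of what run.py might contain, built around argparse: the --urls flag matches the cron example at the end of the article, and the tracker is injected so the command-line handling can be tested without a browser. The wiring comments name modules from the earlier steps and are assumptions.

```python
import argparse


def build_parser():
    """Command-line interface: one or more product URLs to track."""
    parser = argparse.ArgumentParser(description="Track product prices")
    parser.add_argument("--urls", nargs="+", required=True,
                        help="Product page URLs to track")
    return parser


def run(tracker, argv=None):
    """Parse the URLs and hand them to an already-constructed tracker."""
    args = build_parser().parse_args(argv)
    return tracker.track_products(args.urls)

# In run.py itself, the file would end by constructing BrowserManager,
# ElementHandler, BookScraper, and DatabaseManager from the earlier steps,
# wiring them into a PriceTracker, and calling run(tracker) under the
# usual `if __name__ == "__main__":` guard.
```
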
Now run the script from your terminal:
python run.py --urls \
    "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" \
    "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html" \
    "http://books.toscrape.com/catalogue/soumission_998/index.html"
The command prints log output as the tracker visits each page, showing that prices are being recorded for all of the specified URLs.
Our current implementation only tracks and saves product prices. After tracking prices, let's enhance our price tracker to notify users about price changes. Add the following code snippets to your notifications/price_alert.py file:
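A sketch of such an alert manager is shown below. It assumes a database manager exposing a `conn` SQLite connection, per the earlier sketch; the query, the threshold default, and the dictionary keys are illustrative.

```python
import logging


class PriceAlertManager:
    """Detects price changes worth alerting on."""

    def __init__(self, db_manager, threshold_percent=1.0):
        self.db = db_manager
        self.threshold = threshold_percent
        self.logger = logging.getLogger(__name__)

    @staticmethod
    def percent_change(old_price, new_price):
        """Signed percentage change from old_price to new_price."""
        return (new_price - old_price) / old_price * 100.0

    def check_price_changes(self):
        """Join price_history to itself to compare each product's two
        most recent recorded prices."""
        rows = self.db.conn.execute("""
            SELECT p.url, p.title, cur.price, prev.price
            FROM products p
            JOIN price_history cur ON cur.product_id = p.id
            JOIN price_history prev ON prev.product_id = p.id
            WHERE cur.id = (SELECT MAX(id) FROM price_history
                            WHERE product_id = p.id)
              AND prev.id = (SELECT MAX(id) FROM price_history
                             WHERE product_id = p.id AND id < cur.id)
        """).fetchall()
        alerts = []
        for url, title, new_price, old_price in rows:
            change = self.percent_change(old_price, new_price)
            if abs(change) >= self.threshold:
                alerts.append({
                    "url": url,
                    "title": title,
                    "old_price": old_price,
                    "new_price": new_price,
                    "change_percent": round(change, 2),
                })
        return alerts
```

Products with only one recorded price are naturally excluded, since the correlated subquery for the previous row returns no match.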
In the above code snippet, we created a PriceAlertManager class with its essential dependencies. The manager takes a database manager instance as a parameter and sets up logging for tracking alert operations. The class uses joins to compare current and previous prices, computes the price change percentage, and returns structured dictionaries of price change information.
Next, update your PriceAlertManager class to add email notification functionality:
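A sketch of that notification code follows, split into a message builder and a sender so the formatting can be tested without a mail server. The alert dict shape follows the PriceAlertManager sketch, and the SMTP host, port, and addresses are placeholders, not real settings.

```python
import smtplib
from email.mime.text import MIMEText


def build_alert_email(alert, sender, recipient):
    """Build a MIMEText message for one price-change alert dict."""
    body = (
        f"Price change for {alert['title']}\n"
        f"Old price: £{alert['old_price']:.2f}\n"
        f"New price: £{alert['new_price']:.2f}\n"
        f"Change: {alert['change_percent']:+.2f}%\n"
        f"URL: {alert['url']}\n"
    )
    # Explicit utf-8 charset so the currency symbol survives encoding
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = f"Price alert: {alert['title']}"
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def send_alert(msg, host="smtp.example.com", port=587, user=None, password=None):
    """Send the message over SMTP with STARTTLS (host/port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if user and password:
            server.login(user, password)
        server.send_message(msg)
```

In practice the SMTP credentials would come from environment variables loaded with python-dotenv rather than being hard-coded.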
Here, we created an email notification using Python's email and SMTP libraries. The implementation uses the MIMEText class to create properly formatted email messages. The email body is dynamically generated using f-strings, incorporating detailed price change information with precise currency formatting.
Now let's modify our run script to include price alerts:
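One way to extend the run flow is sketched below: after tracking, the alert manager is queried and each qualifying change is passed to a notifier. The tracker and alert manager follow the earlier sketches, and `notifier` is assumed to be any callable accepting an alert dict (for example, one that builds and sends the email above).

```python
def run_with_alerts(tracker, alert_manager, notifier, urls):
    """Track prices, then emit a notification for each qualifying change."""
    tracker.track_products(urls)
    alerts = alert_manager.check_price_changes()
    for alert in alerts:
        notifier(alert)
    return alerts
```
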
Now if you run the script again, it will track the product prices and alert you about any products whose prices have changed.
You can also run this script as a cron job to track product prices and alert you to price changes automatically, without having to run it manually every time.
For example, the following crontab entry runs the tracker every six hours:

0 */6 * * * python run.py --urls "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html" "http://books.toscrape.com/catalogue/soumission_998/index.html"
Throughout this tutorial, you've learned how to build a robust web automation tool using Selenium and Python. We started by covering web automation fundamentals, then set up a development environment for the price tracker tool built for the demonstrations in this tutorial. We then went further to build the price tracker application, which tracks product prices and alerts users of price changes. Now that you have this knowledge, what tool will you build next? Let me know in the comments section. Happy coding!
The above is the detailed content of Building Robust Web Automation with Selenium and Python. For more information, please follow other related articles on the PHP Chinese website!