A plain cURL request cannot execute the JavaScript on a page. On top of that, crawling sites that restrict crawlers often means hand-crafting detailed HTTP headers to get past the restrictions, which is tedious to write.
Introduction to Selenium:
Selenium is a tool for testing web applications, though its uses go beyond testing.
Selenium runs directly in the browser and operates the page the way a real user would. It supports many browsers.
Components:
Selenium IDE: a Firefox add-on that records scripts. It records user actions automatically and can export them as automation scripts in other languages.
Selenium Remote Control (RC): supports multiple platforms (Windows, Linux) and multiple browsers (IE, Firefox, Opera, Safari, Chrome), and test cases can be written in multiple languages (Java, Ruby, Python, Perl, PHP, C#).
Selenium Grid: lets Selenium RC scale to large test suites, or to suites that must run in several different environments.
Example: driving Chrome to log into Taobao and retrieve page content
1. Go to the SeleniumHQ download page and download:
Selenium Server (formerly the Selenium RC Server)
A browser driver from "Third Party Browser Drivers NOT DEVELOPED by seleniumhq" (select the Chrome driver)
A language binding from "Third Party Language Bindings NOT DEVELOPED by seleniumhq" (choose PHP by Adam Goucher, the SeHQ-recommended PHP client)
2. Start the Selenium server

```
java -jar path_to_selenium.jar \
    [-timeout 0] \
    [-Dwebdriver.server.session.timeout=0] \
    -Dwebdriver.chrome.driver="path_to_chrome_driver" \
    -browser [-timeout=0] [-browserTimeout=0] \
    browserName=chrome,[timeout=0]
```

Options in [ ] are optional. If the server needs to run for a long time, set the timeout value inside each [ ] as appropriate.
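Before running the PHP script, it can help to confirm that the server started above is actually listening. The sketch below assumes the server answers GET requests on `{hub}/status` (the JSON Wire Protocol status endpoint) with a JSON body; the function name `seleniumServerIsUp` is hypothetical and not part of any Selenium library.

```php
<?php
// Hypothetical helper: return true if a Selenium server answers
// GET {hubUrl}/status with a JSON body within $timeout seconds.
function seleniumServerIsUp($hubUrl, $timeout = 2) {
    $context = stream_context_create(array(
        'http' => array('method' => 'GET', 'timeout' => $timeout),
    ));
    // @ suppresses the warning file_get_contents emits when the connection fails
    $body = @file_get_contents(rtrim($hubUrl, '/') . '/status', false, $context);
    if ($body === false) {
        return false; // nothing listening, or the request timed out
    }
    // Assumption: any well-formed JSON object means the server is up
    return is_array(json_decode($body, true));
}

// Usage:
// if (!seleniumServerIsUp('http://127.0.0.1:4444/wd/hub')) {
//     exit("Selenium server is not running\n");
// }
```

Checking up front turns a confusing connection error deep inside the WebDriver client into one clear message.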
3. PHP code

```php
<?php
// Include Selenium's PHP client library (Element-34 php-webdriver).
// Download: https://github.com/Element-34/php-webdriver
// Its documentation covers the available browser operations, such as reading all cookies.
require_once "webdriver/PHPWebDriver/__init__.php";

// Block until document.readyState is 'complete', polling once per second,
// but give up after $maxSeconds so the script cannot hang forever.
// Note that readyState only reflects the page load itself; to wait for later
// Ajax requests, poll one of the library-specific expressions listed at the end.
function waitForAjax($maxSeconds = 30) {
    global $session;
    $waited = 0;
    do {
        sleep(1);
        $waited++;
        $loading = $session->execute(array(
            'script' => "return (document.readyState != 'complete')",
            'args'   => array(),
        ));
    } while ($loading && $waited < $maxSeconds);
}

$wd_host = 'http://127.0.0.1:4444/wd/hub';
$web_driver = new PHPWebDriver_WebDriver($wd_host);
$session = $web_driver->session('chrome');

// Set the timeouts
$session->implicitlyWait(5);
$session->setScriptTimeout(5);
$session->setPageLoadTimeout(15);

// Open the mobile Taobao login page
$session->open('http://login.m.taobao.com/login.htm?tpl_redirect_url=http://m.taobao.com');

// Pause so a verification code can be entered by hand, if one appears
sleep(5);

// Fill in your account name and password; the value command expects
// an array of single characters, hence str_split()
$session->element('css selector', 'input[name=TPL_username]')->value(array('value' => str_split('your_username')));
$session->element('css selector', 'input[name=TPL_password]')->value(array('value' => str_split('your_password')));

// Simulate clicking the login button
$session->element('css selector', '.c-btn-oran-big')->click();

// Open m.taobao.com; the login cookie has been obtained at this point
$session->open('http://m.taobao.com/');

// Wait for the page to finish loading
waitForAjax();

// Get the page text as rendered after logging in
$content = $session->element('css selector', 'body')->text();
echo $content;

// End the browser session
$session->close();
```
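The readiness check in waitForAjax can be generalized into a reusable polling helper with a deadline, so that no wait can block the script indefinitely. This is a minimal sketch; `waitUntil` is a hypothetical name, not a php-webdriver method, and the usage shown in the comments assumes the `$session->execute()` call used above.

```php
<?php
// Hypothetical helper: call $isDone every $intervalMs milliseconds until it
// returns true or $timeoutSeconds have elapsed. Returns true on success and
// false on timeout, so the caller decides what to do instead of hanging.
function waitUntil(callable $isDone, $timeoutSeconds = 30, $intervalMs = 500) {
    $deadline = microtime(true) + $timeoutSeconds;
    while (true) {
        if ($isDone()) {
            return true;
        }
        if (microtime(true) >= $deadline) {
            return false;
        }
        usleep($intervalMs * 1000);
    }
}

// Usage with a Selenium session:
// $loaded = waitUntil(function () use ($session) {
//     return $session->execute(array(
//         'script' => "return document.readyState == 'complete'",
//         'args'   => array(),
//     ));
// }, 15);
```

Returning a boolean rather than throwing keeps the helper neutral: a crawler might log and move on after a timeout, while a test would fail.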
After that, you can call element() on the $session instance to locate elements and operate on them as needed.
element() supports the following strategies for selecting elements (pass the string as the first argument, as with 'css selector' above):
id
xpath
link text
partial link text
name
tag name
class name
css selector
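A typo in the strategy name (for example 'css' instead of 'css selector') only shows up as an error from the server. As a sketch, a thin hypothetical wrapper can reject unknown strategies locally before the request is sent; `findElement` and `LOCATOR_STRATEGIES` are illustrative names, and the wrapper simply forwards to the same `$session->element()` call used in the script.

```php
<?php
// The locator strategy strings listed above
const LOCATOR_STRATEGIES = array(
    'id', 'xpath', 'link text', 'partial link text',
    'name', 'tag name', 'class name', 'css selector',
);

// Hypothetical wrapper: validate the strategy, then forward to element()
function findElement($session, $strategy, $value) {
    if (!in_array($strategy, LOCATOR_STRATEGIES, true)) {
        throw new InvalidArgumentException("Unknown locator strategy: $strategy");
    }
    return $session->element($strategy, $value);
}

// Usage: the username field from the script, located by XPath instead of CSS
// $user = findElement($session, 'xpath', "//input[@name='TPL_username']");
```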
PS: JavaScript expressions that report pending Ajax requests in various libraries (each is 0 when no request is in flight):
jQuery: "jQuery.active"
Prototype: "Ajax.activeRequestCount"
Dojo: "dojo.io.XMLHTTPTransport.inFlight.length"
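To use these expressions from the PHP script, one option is to combine them into a single script for `$session->execute()`. The sketch below guards each check with `typeof` so that a page without a given library does not raise a ReferenceError; `buildAjaxBusyScript` is a hypothetical name, and the combined script simply ORs the expressions listed above.

```php
<?php
// Build a script that returns true while any of the listed libraries
// still has Ajax requests in flight, and false otherwise.
function buildAjaxBusyScript() {
    $checks = array(
        "(typeof jQuery !== 'undefined' && jQuery.active > 0)",
        "(typeof Ajax !== 'undefined' && Ajax.activeRequestCount > 0)",
        "(typeof dojo !== 'undefined' && dojo.io && dojo.io.XMLHTTPTransport"
            . " && dojo.io.XMLHTTPTransport.inFlight.length > 0)",
    );
    // !! coerces the result to a strict boolean (a missing dojo.io would otherwise yield undefined)
    return 'return !!(' . implode(' || ', $checks) . ');';
}

// Usage, inside a polling loop like waitForAjax():
// $busy = $session->execute(array('script' => buildAjaxBusyScript(), 'args' => array()));
```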