Weather information has become an indispensable part of daily life. Daily outings, trip arrangements, and even the choice of what to wear all depend on accurate weather forecasts. So where does forecast data come from? It is published by specialized weather data websites, and it can be collected from them with a web crawler. This article uses the weather forecast for a single city as an example to show how to write a crawler in PHP that captures weather data.
1. Analyze the target website
Before crawling a site, you first need to analyze the structure of its source code and find where the information you want sits in that source. Here we take "China Weather Network" (http://www.weather.com.cn/) as an example. This website provides weather forecasts for many cities, and what we want to capture is the forecast for one of them.
Open the website in a browser, enter the name of the target city, such as "Beijing", and click Query. The page then shows the city's weather forecast for today and the next 7 days, which is the information we want to capture. Inspecting the page source with the browser's developer tools shows that the forecast is contained in a div tag with an id of "7d".
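Before fetching the live page, the selector can be tried against a small hand-written fragment. The sketch below uses hypothetical sample markup that only mimics the structure described above (a div with an id of "7d" that contains a list of days). The real page's nesting and class names are not reproduced here and should be confirmed in the developer tools.

```php
<?php
// Hypothetical sample markup: an assumption for illustration, not a copy of the real page.
$sample_html = <<<HTML
<html><body>
<div id="7d">
  <ul>
    <li><h1>Day 1</h1><p>Sunny</p></li>
    <li><h1>Day 2</h1><p>Cloudy</p></li>
  </ul>
</div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($sample_html);
$xpath = new DOMXPath($doc);

// Select every list item nested anywhere inside the div whose id is "7d".
$items = $xpath->query('//div[@id="7d"]//li');

foreach ($items as $item) {
    $date = trim($item->getElementsByTagName('h1')->item(0)->nodeValue);
    $conditions = trim($item->getElementsByTagName('p')->item(0)->nodeValue);
    echo $date . ': ' . $conditions . PHP_EOL;
}
```

If the query returns no items for the real page, the selector does not match the markup and needs adjusting before any data can be extracted.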
2. Writing the crawler program
After analyzing the source code structure of the target website, we can start writing the crawler program. First, define a few variables to hold the configuration, such as the target city and the URL of its weather data page.
<?php
// Name of the target city
$city_name = '北京';
// URL of the target city's weather data page
$url = 'http://www.weather.com.cn/weather/101010100.shtml';
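If the crawler is later pointed at more than one city, the URL can be built from a city code rather than hard-coded. This is a minimal sketch: the helper name build_weather_url and the $city_codes lookup table are hypothetical, only the Beijing code is taken from the URL above, and it assumes that other cities follow the same URL pattern.

```php
<?php
// Only the Beijing code comes from the URL used in this article;
// codes for other cities would have to be looked up on the site itself.
$city_codes = [
    '北京' => '101010100',
];

/**
 * Build the forecast page URL for a city code (hypothetical helper).
 * Assumes other cities follow the same URL pattern as the Beijing page.
 */
function build_weather_url(string $city_code): string
{
    // Reject anything that is not purely numeric before it ends up in a request.
    if (!preg_match('/^\d+$/', $city_code)) {
        throw new InvalidArgumentException("Invalid city code: {$city_code}");
    }
    return sprintf('http://www.weather.com.cn/weather/%s.shtml', $city_code);
}

$city_name = '北京';
$url = build_weather_url($city_codes[$city_name]);
echo $url . PHP_EOL;
```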
The next step is the core logic of the crawler. First, use the cURL library to make an HTTP request and fetch the weather forecast page of the target city. Then parse the HTML page to extract the required data, and finally save the data to a file for later processing.
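// Initialize cURL and fetch the weather forecast page
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$page_content = curl_exec($ch);
if ($page_content === false) {
    // Stop early if the request failed, since there is nothing to parse
    exit('Request failed: ' . curl_error($ch) . PHP_EOL);
}

// Parse the forecast page and extract the required information
libxml_use_internal_errors(true); // keep warnings about imperfect real-world HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($page_content);
$xpath = new DOMXPath($doc);

// Select the forecast entries for the next 7 days
$days = $xpath->query('//div[@id="7d"]//div[@class="con"]/ul/li');

// Iterate over the forecast entries and append them to a file
$file = fopen('weather.txt', 'a+');
foreach ($days as $day) {
    $date = trim($day->getElementsByTagName('h1')->item(0)->nodeValue);
    $conditions = trim($day->getElementsByTagName('p')->item(0)->nodeValue);
    // The first two <span> elements are taken as the low and high temperature;
    // check this order against the live markup before relying on it.
    $min_temperature = trim($day->getElementsByTagName('span')->item(0)->nodeValue);
    $max_temperature = trim($day->getElementsByTagName('span')->item(1)->nodeValue);
    // One tab-separated line per day, including the conditions that were extracted above
    $line = sprintf("%s\t%s\t%s\t%s\t%s\n", $city_name, $date, $conditions, $min_temperature, $max_temperature);
    fwrite($file, $line);
}
fclose($file);

// Close cURL (has no effect since PHP 8.0; kept for older versions)
curl_close($ch);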
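The request in the script above sets no timeout and does not check the HTTP status, so a slow or failing server can stall the crawler or feed an error page to the parser. One possible refinement is to wrap the fetch step in a function. This is a sketch only: the function name fetch_page, the 10-second default timeout, and the User-Agent string are assumptions, not part of the original program.

```php
<?php
/**
 * Fetch a page body with cURL and fail loudly on errors (hypothetical helper).
 */
function fetch_page(string $url, int $timeout_seconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_HEADER         => false,  // keep response headers out of the body
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeout_seconds,
        CURLOPT_TIMEOUT        => $timeout_seconds,
        CURLOPT_USERAGENT      => 'weather-spider-example/1.0', // assumed value
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure: DNS error, refused connection, timeout, ...
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if ($status < 200 || $status >= 300) {
        // The server answered, but not with a success status.
        throw new RuntimeException("Unexpected HTTP status {$status} for {$url}");
    }

    return $body;
}
```

With this helper, the curl_* calls in the script could be replaced by a single call such as $page_content = fetch_page($url); wrapped in a try/catch block that reports the RuntimeException and stops.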
3. Run the crawler program
Once the crawler is written, save it as weather_spider.php and run it to obtain the weather data. In a terminal, switch to the directory that contains the program and enter the following command.
php weather_spider.php
The program may take some time to run, depending on how quickly the target city's weather data page loads. The script prints nothing to the console when it succeeds, so to confirm the result, open weather.txt in the same directory and check that the forecast information has been saved there.
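One way to confirm the result is to read weather.txt back and print what it holds. This is a minimal sketch, assuming the file sits in the current directory and holds one forecast per line; the helper name read_forecast_lines is hypothetical.

```php
<?php
/**
 * Return the non-empty lines of the output file (hypothetical helper).
 * A missing or unreadable file yields an empty array.
 */
function read_forecast_lines(string $path): array
{
    if (!is_readable($path)) {
        return [];
    }
    // Drop line endings and skip blank lines while reading.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

$lines = read_forecast_lines('weather.txt');
if ($lines === []) {
    echo 'weather.txt is missing or empty; the crawl may have failed.' . PHP_EOL;
} else {
    echo count($lines) . ' forecast line(s) saved:' . PHP_EOL;
    foreach ($lines as $line) {
        echo $line . PHP_EOL;
    }
}
```

Because the crawler opens the file in append mode, repeated runs add lines rather than replacing them, so the count grows with each run.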
4. Summary
This article showed how to write a web crawler in PHP that obtains data from a target website. Web crawlers are a powerful way to acquire data, but anyone using them also needs to pay attention to ethical and legal issues: do not maliciously attack other people's websites, and do not infringe on other people's data privacy. I hope everyone will abide by the relevant laws, regulations, and ethical standards when using web crawlers, and make reasonable use of the technology.