Article crawling in WordPress can be handled by a crawling plug-in such as WP-AutoPost.
Enable the WP-AutoPost plug-in, create a new task, and then configure that task.
Article crawling settings
Under this tab, we need to set the matching rules for the article title and the article content. There are two ways to set them up; the CSS selector method is recommended because it is simpler and more precise.
With this method, we only need to set the article title CSS selector and the article content CSS selector to capture the article title and article content accurately.
In the article source settings, we took the collection of "Sina Internet News" as an example, and we continue with the same example here. The selectors are easy to work out by viewing the source code of any article listed under the list URL http://roll.tech.sina.com.cn/internet_worldlist/index.shtml and locating the title and content in it. For example, we can check the source code of the specific article http://tech.sina.com.cn/i/2013-10-18/22298831229.shtml and find the code around the article title.
You can see that the article title is inside the tag with the id "artibodyTitle", so the article title CSS selector only needs to be set to #artibodyTitle.
Similarly, find the relevant code for the article content in the same page source.
You can see that the article content is inside the tag with the id "artibody", so the article content CSS selector only needs to be set to #artibody.
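To make it concrete what the two selectors pick out, here is a minimal Python sketch that uses only the standard library. It is not how WP-AutoPost works internally; it only illustrates what an id selector such as #artibodyTitle or #artibody matches. The names IdTextExtractor, select_by_id and SAMPLE_HTML are hypothetical, and the sample markup is invented for illustration rather than copied from the Sina page.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IdTextExtractor(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the matched element
        self.done = False   # True once the matched element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def select_by_id(html, selector):
    """Return the whitespace-normalised text matched by an id selector like '#artibody'."""
    parser = IdTextExtractor(selector.lstrip("#"))
    parser.feed(html)
    return " ".join(" ".join(parser.parts).split())


# Invented sample page (assumption): only the two ids come from the article.
SAMPLE_HTML = """
<html><body>
  <h1 id="artibodyTitle">Example headline</h1>
  <div id="artibody">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
  <div id="footer">Unrelated footer</div>
</body></html>
"""

title = select_by_id(SAMPLE_HTML, "#artibodyTitle")
content = select_by_id(SAMPLE_HTML, "#artibody")
print(title)
print(content)
```

Because each selector targets a single id, the unrelated footer is ignored, which is why this method tends to be more precise than pattern matching on the surrounding markup.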
After the settings are complete, you can click the test button and enter a test address. If the settings are correct, the article title and article content are displayed, which makes it easy to verify the selectors.
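The same kind of check can be sketched outside the plug-in. The Python snippet below is an illustration only, not the plug-in's test button: it reports whether each configured id selector matches anything in a page. The names IdCollector, check_selectors and TEST_PAGE are hypothetical, and the test page is invented; in practice the HTML would come from the test address.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record every id attribute that appears in a page."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id:
            self.ids.add(element_id)


def check_selectors(html, selectors):
    """Map each setting name to True if its '#id' selector matches an element."""
    collector = IdCollector()
    collector.feed(html)
    return {name: selector.lstrip("#") in collector.ids
            for name, selector in selectors.items()}


# Invented test page (assumption); only the two ids come from the article.
TEST_PAGE = ('<h1 id="artibodyTitle">Example headline</h1>'
             '<div id="artibody"><p>Body text</p></div>')

report = check_selectors(TEST_PAGE, {
    "title": "#artibodyTitle",
    "content": "#artibody",
    "typo": "#artibodyTitel",   # deliberately misspelled to show a failed match
})
for name, matched in report.items():
    print(name, "OK" if matched else "no match")
```

A selector that matches nothing, like the misspelled one here, is the usual reason the test shows an empty title or empty content.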