Collecting and scraping single-product information from Taobao with PHP
To retrieve Taobao data, you can use the API that Taobao provides. If you only need public information such as a product's images and name for your own website, PHP's file_get_contents function is enough.
Approach:
file_get_contents(url) returns the content (source code) of the page at a URL such as http://www.baidu.com as one single string. You can then run regular expressions over that string with preg_match, preg_replace and similar functions to extract a specific div, img or other element. The premise, of course, is that the structure of a Taobao single-product page is fixed. For example, the img tag of the 500 picture (the main product image) always has the id J_ImgBooth.
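The fetch-then-match idea can be sketched as follows. This is a minimal sketch, not the article's code: fetchPage is a hypothetical helper, the 10-second timeout is an assumed value, and a local temporary file stands in for the remote page so the example runs without network access.

```php
<?php
// Hypothetical helper (not from the article): fetch a page, or return null on failure.
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    // The timeout is an assumed value; it only applies to http:// and https:// URLs.
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    // file_get_contents() returns false on failure; @ suppresses the warning.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// A local temp file stands in for the remote page so the sketch runs offline.
$path = tempnam(sys_get_temp_dir(), 'tb');
file_put_contents($path, '<div><img id="J_ImgBooth" src="http://example.com/a.jpg"></div>');

$html = fetchPage($path);
// Look for the img tag carrying the id mentioned in the text.
$found = $html !== null && preg_match('/<img[^>]*id="J_ImgBooth"[^>]*>/', $html, $m) === 1;
echo $found ? $m[0] : 'no match', PHP_EOL;
unlink($path);
```

Checking the return value matters because file_get_contents returns false rather than a string when the request fails, and passing false on to preg_match would silently match nothing.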
Implementation (fetching the 500 picture, name, price, attributes and product description):
$text = file_get_contents("http://item.taobao.com/item.htm?id=2380347279"); // store the page source at this URL in $text
A. Get the 500 picture:
preg_match('/<img[^>]*id="J_ImgBooth"[^>]*src="([^"]*)"[^>]*>/', $text, $img);
// Use a regular expression to capture the img tag whose id is J_ImgBooth. $img[0] is the whole img tag of the 500 picture and $img[1] is its image address.
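A single pattern like the one above depends on id appearing before src inside the tag. A possible variation, sketched here under that concern, first isolates the whole img tag and then reads src from it, so attribute order no longer matters. imageSrcById is a hypothetical helper and the sample markup is made up for illustration.

```php
<?php
// Hypothetical helper: find the <img> tag carrying the given id, then read its src.
function imageSrcById(string $html, string $id): ?string
{
    // Step 1: capture the complete tag that carries the id.
    $tagPattern = '/<img\b[^>]*\bid="' . preg_quote($id, '/') . '"[^>]*>/i';
    if (preg_match($tagPattern, $html, $tag) !== 1) {
        return null;
    }
    // Step 2: read src from that tag only, wherever it sits among the attributes.
    if (preg_match('/\bsrc="([^"]*)"/i', $tag[0], $src) !== 1) {
        return null;
    }
    return $src[1];
}

// Sample markup (an assumption): src comes before id here.
$sample = '<img src="http://example.com/500.jpg" id="J_ImgBooth" alt="">';
echo imageSrcById($sample, 'J_ImgBooth'), PHP_EOL;
```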
B. Get name:
preg_match('/<title>([^<>]*)<\/title>/', $text, $title);
// The product name tag in the page body has no distinctive class or id, so it is hard to capture; the content of the title tag is captured instead. The title is generally the product name (with small differences). $title[0] is the entire title tag and $title[1] is its content.
$title = iconv('GBK', 'UTF-8', $title[1]);
// If your website is UTF-8 encoded, transcode the value (Taobao pages are GBK encoded).
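The title step can be wrapped so that a missing tag or a failed conversion is reported instead of producing a warning. This is only a sketch: pageTitleUtf8 is a hypothetical helper, the sample page is built in place by encoding a UTF-8 string to GBK, and it assumes the iconv extension supports GBK on your system.

```php
<?php
// Hypothetical helper: read <title> and convert it from $fromCharset to UTF-8.
function pageTitleUtf8(string $html, string $fromCharset = 'GBK'): ?string
{
    // The match runs on raw bytes; GBK never uses the bytes for < or > inside a character.
    if (preg_match('/<title[^>]*>([^<]*)<\/title>/i', $html, $m) !== 1) {
        return null;
    }
    // iconv() returns false when the input is not valid in the source charset.
    $converted = @iconv($fromCharset, 'UTF-8', trim($m[1]));
    return $converted === false ? null : $converted;
}

// Build a GBK-encoded sample page so the conversion has something to do.
$gbkPage = iconv('UTF-8', 'GBK', '<html><head><title> 商品名称 </title></head></html>');
echo pageTitleUtf8($gbkPage), PHP_EOL;
```

Converting only the captured title, rather than the whole page, keeps the regular expressions working on the same bytes that the page was served with.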
C. Get price:
preg_match('/<([a-z]+)[^>]*id="J_StrPrice"[^>]*>([^<]*)<\/\1>/is', $text, $price);
// In the same way, capture the tag whose id is J_StrPrice. $price[0] is the entire tag, $price[1] is the tag name (strong) and $price[2] is the tag content.
$price = floatval($price[2]); // convert to a number before storing it in the database
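A slightly more defensive take on the price step is sketched below. priceById is a hypothetical helper; taking the first number found in the tag content is an assumption about what the tag holds, since the text could also carry a currency sign or a price range.

```php
<?php
// Hypothetical helper: read the text of the element with the given id as a float.
function priceById(string $html, string $id = 'J_StrPrice'): ?float
{
    // \1 refers back to the captured tag name, so the closing tag must match it.
    $pattern = '/<([a-z]+)\b[^>]*\bid="' . preg_quote($id, '/') . '"[^>]*>([^<]*)<\/\1>/is';
    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Take the first number in the content (assumption: that is the price).
    if (preg_match('/\d+(?:\.\d+)?/', $m[2], $number) !== 1) {
        return null;
    }
    return (float) $number[0];
}

// Sample markup, made up for illustration.
$sample = '<strong id="J_StrPrice">128.50</strong>';
var_dump(priceById($sample));
```

Returning null when nothing is found lets the caller tell "no price on the page" apart from a real price of 0, which floatval on a missing match would not.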
D. Get attributes:
Everything obtained so far sits in a single tag, so one regular expression is enough. A block such as <div class="attributes">...</div> is different: the div contains an unknown number of nested tags, which makes it very hard to capture. The closest pattern I found online is "/<([a-z]+)[^>]*>([^<>]|(?R))*<\/\1>/", which uses recursion to match tag pairs, but it cannot target one specific tag, so it gives no easy way to capture the div with class="attributes". Taobao pages, however, have a particular property: their structure is basically fixed. The tag that follows <div class="attributes">...</div> is either <div id="description"> or <div class="box J_TBox">, so a workaround can be used to obtain the content of the attributes tag.
preg_match('/<(div)[^>]*class="attributes"[^>]*>.*<\/\1>/is', $text, $text0);
// This captures everything from the opening <div class="attributes"> tag to the last </div> of the entire page. The attributes tag we want is at the front of this match.
$text1 = preg_replace('/[^<]*<(div)[^>]*id="description"[^>]*>.*<\/\1>/is', '', $text0[0]);
// Match from <div id="description"> to the final </div> and replace it with "" (that is, delete the match). If the attributes div is followed by description, we have reached our goal.
$attributes = preg_replace('/[^<]*<(div)[^>]*class="box J_TBox"[^>]*>.*<\/\1>/is', '', $text1);
// If attributes is followed by the box J_TBox tag instead, this step removes it in the same way. If the attributes div was followed by description, this step matches nothing and changes nothing.
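As an alternative to the regular-expression workaround, a nested div can be selected with PHP's DOM extension. This is a different technique from the one the article uses, shown only as a sketch: divByClass is a hypothetical helper, $class is assumed to be a simple class token without quotes, and a real GBK page would need its encoding handled before parsing, which is not shown here.

```php
<?php
// Hypothetical helper: outer HTML of the first <div> whose class list contains $class.
function divByClass(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match $class as a whole word inside the space-separated class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Serialize only the selected node, including all of its nested tags.
    return $doc->saveHTML($nodes->item(0));
}

// Sample markup, made up for illustration: a nested div inside attributes.
$sample = '<div class="attributes"><ul><li>Brand: <b>Demo</b></li></ul><div>nested</div></div>'
        . '<div id="description">Description Loading</div>';
echo divByClass($sample, 'attributes'), PHP_EOL;
```

Because the parser tracks which closing tag belongs to which opening tag, this approach does not depend on knowing which sibling follows the attributes div.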
E. Get description:
After the steps above, you might think that any tag on a Taobao page can be obtained this easily (I thought so too). But when you use this method on the description, all you get is "Description Loading". The description is not in the page source at all; Taobao loads it separately, through JavaScript, after the page has opened.
So we can imitate that and include the page's JavaScript ourselves. Not sure which scripts are needed to load the description? No problem: include all of them. Not sure which divs have to be present? Take a copy of the page source and remove divs one at a time to find out. You will end up with a small set of divs that are necessary for the description to load, so the code is as follows:
preg_match_all('/<script[^>]*>.*?<\/script>/is', $text, $content); // the page's js scripts
$content = $content[0];
$description = ''; // start with the markup of the required divs identified above
foreach ($content as $v) { $description .= iconv('GBK', 'UTF-8', $v); }
// Put $description into your page and the description will load automatically. Note that if several product descriptions are placed on the same page, only one of them will load.