PHP code for crawling remote website data
Many programming enthusiasts run into the same question: how do you crawl the HTML of other people's websites, the way a search engine does, and then collect and organize it into useful data? Here are a few simple examples.
Ⅰ. Example of grabbing the title of a remote web page:
The following is a code snippet:
<?php
/*
+-------------------------------------------------------------
+ Fetches the title of a web page. Copy this snippet, save it as a .php file, and run it.
+-------------------------------------------------------------
*/
error_reporting(E_ERROR | E_WARNING | E_PARSE); // same as the original value 7
$file = fopen("http://www.php.cn/", "r"); // requires allow_url_fopen = On
if (!$file) {
    echo "<font color=red>Unable to open remote file.</font>\n";
    exit;
}
while (!feof($file)) {
    $line = fgets($file, 1024);
    // eregi() was removed in PHP 7; preg_match() with the i flag replaces it
    if (preg_match("#<title>(.*)</title>#i", $line, $out)) {
        $title = $out[1];
        echo $title;
        break;
    }
}
fclose($file);
// End
?>
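The line-by-line loop above cannot find a title that spans several lines or crosses a 1024-byte read boundary. As a minimal sketch, assuming the whole page is already held in a string, the helper below matches across line breaks. The function name extract_title() and the sample HTML are illustrative assumptions, not part of the original example.

```php
<?php
// Hypothetical helper: returns the text of the first <title> element, or null if none is found.
function extract_title(string $html): ?string
{
    // i = case-insensitive, s = let "." match newlines, .*? = stop at the first </title>
    if (preg_match('#<title[^>]*>(.*?)</title>#is', $html, $m)) {
        // Decode entities such as &amp; and strip surrounding whitespace
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;
}

// Made-up sample page used only for illustration.
$sample = "<html><head><TITLE>\n  PHP &amp; crawling\n</TITLE></head><body></body></html>";
echo extract_title($sample), "\n"; // prints: PHP & crawling
```

A regular expression is enough for a single well-known tag like this one; for anything more structural, a real HTML parser is usually the safer choice.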
Ⅱ. Example of grabbing the HTML code of a remote web page:
The following is the code snippet:
<?php
/*
+----------------
+DNSing Sprider
+----------------
*/
$fp = fsockopen("www.php.cn", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br/>\n";
} else {
    // Build a minimal HTTP/1.1 request; "Connection: Close" asks the server to close the socket when done
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.php.cn\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fputs($fp, $out);
    // Echo the raw response (headers and body) as it arrives
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
// End
?>
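The fsockopen() snippet prints the raw response, so the HTTP headers appear in front of the HTML. The sketch below shows one way to separate the two, assuming the complete response has been collected into a string. The helper name split_http_response() and the sample response are made up for illustration, and the sketch does not decode chunked transfer encoding, which HTTP/1.1 servers may use.

```php
<?php
// Hypothetical helper: splits a raw HTTP response into status line, headers, and body.
function split_http_response(string $raw): array
{
    // The header section ends at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $status = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip malformed lines
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so store them in lowercase
        $headers[strtolower(trim($name))] = trim($value);
    }

    return ['status' => $status, 'headers' => $headers, 'body' => $parts[1] ?? ''];
}

// Made-up response used only for illustration.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n<html>hi</html>";
$res = split_http_response($raw);
echo $res['status'], "\n";                  // HTTP/1.1 200 OK
echo $res['headers']['content-type'], "\n"; // text/html
echo $res['body'], "\n";                    // <html>hi</html>
```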
Copy either of the two snippets above and run it to see the result. These examples are only prototypes for grabbing web page data; adapt them to suit your own needs.
fopen() binds the named resource specified by filename to a stream.
filesize() returns the size of the file in bytes, or FALSE if an error occurs.
Note: Because PHP's integer type is signed and many platforms use 32-bit integers, filesize() may return unexpected results for files larger than 2GB. For files between 2GB and 4GB, you can usually work around this with sprintf("%u", filesize($file)).
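To see why the sprintf("%u", ...) workaround recovers the size, the sketch below simulates the problem rather than reproducing it. It assumes a 64-bit PHP build and a made-up file size of 3,000,000,000 bytes. It computes the negative value a 32-bit build would report, then reinterprets it as unsigned.

```php
<?php
// Assumption: running on a 64-bit PHP build, where a 3 GB size fits in an int.
$realSize = 3000000000;

// Simulate what a 32-bit build would report: the value wraps into the negative range.
$wrapped = $realSize - 4294967296; // subtract 2^32

// Reinterpret the low 32 bits as an unsigned number. On an actual 32-bit build,
// sprintf("%u", filesize($file)) performs this reinterpretation by itself.
$recovered = sprintf("%u", $wrapped & 0xFFFFFFFF);

echo $wrapped, "\n";   // -1294967296
echo $recovered, "\n"; // 3000000000
```

The result is a string, not an integer, which is why the workaround only helps for displaying or comparing sizes, and only below 4GB.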
fread() reads up to length bytes from the file pointer handle. Reading stops when length bytes have been read, when EOF is reached, or (for network streams) when a packet becomes available, whichever comes first.
Note: This is the approach for older PHP versions. On PHP 5 and later, file_get_contents() is recommended.
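Because fread() may return fewer bytes than requested on a network stream, code that still uses it has to loop until EOF. The following is a minimal sketch of such a loop. The helper name read_all() is an assumption, and an in-memory stream stands in for a real connection so that the example runs without network access.

```php
<?php
// Hypothetical helper: reads a stream to EOF, tolerating short reads.
function read_all($handle, int $chunkSize = 8192): string
{
    $data = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error
        }
        $data .= $chunk;
    }
    return $data;
}

// An in-memory stream stands in for a network connection here.
$stream = fopen('php://memory', 'w+');
fwrite($stream, str_repeat('x', 20000));
rewind($stream);
echo strlen(read_all($stream)), "\n"; // 20000
fclose($stream);
```

On a real socket it is worth setting a timeout as well (for example with stream_set_timeout()), since this loop relies on the remote side closing the connection.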
file_get_contents -- Reads the entire file into a string
string file_get_contents (string filename [, int use_include_path [, resource context]])
Same as file(), except that file_get_contents() returns the file as a string. file_get_contents() is the preferred way to read the contents of a file into a string. If the operating system supports it, memory mapping is also used to enhance performance.
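As a sketch of how file_get_contents() can replace the fopen() loop for remote pages, the helper below passes a stream context that sets a timeout and a user agent. The function name fetch_contents(), the agent string, and the 10-second default are assumptions chosen for illustration. The demonstration reads a temporary local file so that it runs without network access.

```php
<?php
// Hypothetical helper: fetches a URL or local path as a string, or null on failure.
function fetch_contents(string $location, int $timeoutSeconds = 10): ?string
{
    // The 'http' options apply to http:// and https:// URLs only;
    // they are ignored for local files.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleCrawler/0.1', // made-up agent string
        ],
    ]);

    // @ suppresses the warning so that failure is reported only through the return value.
    $content = @file_get_contents($location, false, $context);
    return $content === false ? null : $content;
}

// Remote usage (requires allow_url_fopen = On):
// $html = fetch_contents('http://www.php.cn/');

// Local demonstration that runs without network access.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, '<title>local test</title>');
echo fetch_contents($tmp), "\n"; // <title>local test</title>
unlink($tmp);
```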