Regular expression test for extracting standard HTML hyperlink parameters
I have recently been working on something similar to a specialized search engine, so I need to extract every hyperlink from a web page.
Please help me test whether the following code matches all standard hyperlinks.
The test code is as follows:
<?php
// ---------------------------------------------------------------------------
// File name: Noname1.php
// Description: Universal link parameter acquisition regular expression test
// Requirement: PHP4 (http://www.php.net)
// Copyright(C), HonestQiao, 2005, All Rights Reserved.
// Author: HonestQiao (honestqiao@hotmail.com)
// Parameter description:
// $strSource: HTML webpage containing standard links
// $strResult: Processing results
// Additional notes:
// A standard link is a link enclosed in <a></a> tags
// ---------------------------------------------------------------------------
$strSource = <<<HTML
t1 t2 t3 t4
HTML;
preg_match_all('/
( .+?)/sim', $strSource, $strResult, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($strResult[1]); $i++)
{
    printf("%d href=(%s) title=(%s)\n", $i, $strResult[1][$i], $strResult[2][$i]);
}
?>
If your test data contains standard links that this code fails to match, please send me the test data and a description of your test environment.
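
For anyone who wants something concrete to test against, here is a minimal sketch of one way to capture the href value and the link text of <a> tags with preg_match_all. It is not the pattern from Noname1.php: the pattern, the helper name extractLinks(), and the sample links are all assumptions made for illustration, and the code assumes PHP 7 or later rather than PHP4.

```php
<?php
// Illustrative sketch only; not the original pattern from Noname1.php.
// extractLinks() is a hypothetical helper and assumes PHP 7 or later.
function extractLinks(string $html): array
{
    // \shref\s*=\s*  : the href attribute, anywhere inside the <a ...> tag
    // (?| ... )      : branch reset, so every quoting style fills group 1
    // (.*?)<\/a>     : group 2, the link text up to the closing tag
    $pattern = '/<a\b[^>]*?\shref\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))[^>]*>(.*?)<\/a>/is';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = [$m[1], $m[2]]; // [href, link text]
        }
    }
    return $links;
}

// Made-up sample input covering double-quoted, single-quoted and unquoted hrefs.
$strSource = <<<HTML
<a href="http://www.php.net">t1</a>
<A HREF='page.html' class="x">t2</A>
<a title="plain" href=index.php>t3</a>
HTML;

foreach (extractLinks($strSource) as $i => $link) {
    printf("%d href=(%s) title=(%s)\n", $i, $link[0], $link[1]);
}
```

A pattern like this only covers reasonably well-formed markup. Links inside HTML comments or scripts, or attribute values that contain a ">" character, will probably need extra handling or a real HTML parser.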