Matching URLs With or Without Protocol and Domain Prefixes
When working with URLs, you often need to match them whether or not they include a scheme (http://, https://, or ftp://) and whether or not the host name starts with the "www" prefix. The PHP code below builds such a regular expression one component at a time:
<code class="php">// Single-quoted fragments stop PHP from interpolating "$_" as a variable.
$regex  = '((https?|ftp)://)?';                                            // Scheme (optional)
$regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?';       // User and password (optional)
$regex .= '([a-z0-9.-]*)\.([a-z]{2,}|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})'; // Host name (TLD of 2+ letters) or IPv4 address
$regex .= '(:[0-9]{2,5})?';                                                // Port (optional)
$regex .= '(/([a-z0-9+$_%-]\.?)+)*/?';                                     // Path (optional)
$regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?';                         // Query string after a literal "?" (optional)
$regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?';                                   // Fragment/anchor (optional)</code>
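Explanation: each fragment appends one URL component, named in its comment. Only the host is required; it is either a dotted name such as "www.example.com" or an IPv4 address such as "192.168.0.1". Because "www." is simply the first label of the host name, it needs no special handling: "example.com" and "www.example.com" both satisfy the host rule. Every other component (scheme, user and password, port, path, query string, fragment) is wrapped in (...)? or (...)*, so it may be left out.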
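The mechanism that makes the prefixes optional is easiest to see in a stripped-down pattern. The sketch below keeps only an optional "https?://" group and an optional "www." group in front of a deliberately simplified host rule (an assumption for brevity, not the full host rule above), and captures the host so you can see that all four spellings reduce to the same name. The helper name hostWithoutPrefixes() is hypothetical.

```php
<?php
// Minimal sketch: only the two optional prefixes.
// "(https?://)?" and "(www\.)?" may each appear once or not at all.
// The host part is a simplified placeholder, not a full host-name grammar.
$pattern = '~^(https?://)?(www\.)?([a-z0-9.-]+\.[a-z]{2,})$~i';

// Hypothetical helper: returns the host without scheme or "www.",
// or null when the input does not match the pattern.
function hostWithoutPrefixes(string $url, string $pattern): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[3] : null;
}

foreach (['example.com', 'www.example.com', 'http://example.com', 'https://www.example.com'] as $url) {
    // Each line ends with "=> example.com".
    echo $url, ' => ', hostWithoutPrefixes($url, $pattern), PHP_EOL;
}
```

Because "?" is greedy, "(www\.)?" consumes the prefix whenever it is present, so the captured host never includes it.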
To test a string against the pattern, add the "^" and "$" anchors, the "~" delimiters, and the "i" (case-insensitive) modifier:
<code class="php">preg_match("~^$regex$~i", $url, $m);</code>
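The "^" and "$" anchors force the entire string to match rather than just a substring, and the "i" modifier lets the lowercase character ranges match uppercase letters too. Because "~" is the delimiter, the slashes in the pattern need no escaping. preg_match() returns 1 on a match (filling $m with the captured components), 0 if there is no match, and false on error. Treat a match as "looks like a URL" rather than strict validation: the pattern accepts the common forms described above, but it does not implement the full URL syntax of RFC 3986 and does not recognize IPv6 addresses.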
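A quick way to check the behaviour is to run the pattern against a few sample strings. The sketch below rebuilds the pattern in a self-contained form (single-quoted fragments, a literal "\?" before the query string, and TLDs of two or more letters) and wraps preg_match() in a hypothetical isUrlLike() helper; the sample URLs are illustrative only.

```php
<?php
// Self-contained sketch of the URL pattern; urlPattern() and isUrlLike() are hypothetical names.
// Fragments are single-quoted so PHP does not interpolate "$_" as a variable.
function urlPattern(): string
{
    $regex  = '((https?|ftp)://)?';                                            // Scheme
    $regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?';       // User and password
    $regex .= '([a-z0-9.-]*)\.([a-z]{2,}|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})'; // Host or IPv4
    $regex .= '(:[0-9]{2,5})?';                                                // Port
    $regex .= '(/([a-z0-9+$_%-]\.?)+)*/?';                                     // Path
    $regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?';                         // Query string (literal "?")
    $regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?';                                   // Fragment
    return '~^' . $regex . '$~i'; // anchored, case-insensitive, "~" as delimiter
}

// True when the whole string matches the pattern.
function isUrlLike(string $url): bool
{
    return preg_match(urlPattern(), $url) === 1;
}

$samples = [
    'example.com',                                                // match
    'www.example.com',                                            // match
    'https://www.example.com/path/to/page.html?id=5&lang=en#top', // match
    'ftp://user:pass@files.example.org:21/pub',                   // match
    'http://192.168.0.1:8080/',                                   // match
    'example',                                                    // no match: no dot
    'http//example.com',                                          // no match: missing ":"
    'https://example.com/a b',                                    // no match: space
];

foreach ($samples as $sample) {
    printf("%-62s %s\n", $sample, isUrlLike($sample) ? 'match' : 'no match');
}
```

One consequence of the permissive user-and-password group: "mailto:someone@example.com" also matches, because "mailto" is read as a user name and "someone" as a password. Add extra checks if that distinction matters for your input.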