jQuery Source Code Analysis 02: Regular Expressions (RegExp), Commonly Used Regular Expressions

Author: nuysoft/JS Siege Master/Gao Yun QQ: 47214707 Email: nuysoft@gmail.com
Statement: This is an original article. If you reprint it, please credit the source and keep the original link.
Next article preview: Regular expression analysis in jQuery

2.4 Commonly used regular expressions
I found a widely circulated article on the Internet, "Commonly Used Regular Expressions", analyzed its expressions one by one, and added supplements and corrections where I found deficiencies.

Commonly used numeric regular expressions (strict matching)

Regular expression                              Meaning
^[1-9]\d*$                                      matches positive integers
^-[1-9]\d*$                                     matches negative integers
^-?[1-9]\d*$                                    matches integers
^[1-9]\d*|0$                                    matches non-negative integers (positive integers + 0)
^-[1-9]\d*|0$                                   matches non-positive integers (negative integers + 0)
^[1-9]\d*\.\d*|0\.\d*[1-9]\d*$                  matches positive floating-point numbers
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$               matches negative floating-point numbers
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$     matches floating-point numbers
^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$         matches non-negative floating-point numbers (positive floating-point numbers + 0)
^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$    matches non-positive floating-point numbers (negative floating-point numbers + 0)
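
As a quick check of the numeric patterns, the sketch below runs two of the integer expressions against sample input. One caveat: in a pattern such as ^[1-9]\d*|0$, the | binds more loosely than ^ and $, so the pattern reads as "^[1-9]\d*" or "0$". The grouped variant rNonNegativeIntStrict is an assumption added for illustration and is not part of the original list.

```javascript
// Patterns from the numeric list, with backslashes written out.
var rPositiveInt = /^[1-9]\d*$/;
var rNonNegativeInt = /^[1-9]\d*|0$/;            // as listed

// Assumption: a grouped variant so that ^ and $ apply to both alternatives.
var rNonNegativeIntStrict = /^(?:[1-9]\d*|0)$/;

console.info( rPositiveInt.test( "42" ) );               // true
console.info( rPositiveInt.test( "042" ) );              // false
console.info( rNonNegativeInt.test( "12abc" ) );         // true: "^[1-9]\d*" alone matches the prefix "12"
console.info( rNonNegativeIntStrict.test( "12abc" ) );   // false
console.info( rNonNegativeIntStrict.test( "0" ) );       // true
```

The same grouping question applies to the other listed patterns that contain a top-level |.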

Common string regular expressions

Regular expression    Meaning                                                                          Supplement
^[A-Za-z]+$           matches a string consisting of the 26 English letters                            or /^[a-z]+$/i
^[A-Z]+$              matches a string consisting of the 26 uppercase English letters
^[a-z]+$              matches a string consisting of the 26 lowercase English letters
^[A-Za-z0-9]+$        matches a string consisting of digits and the 26 English letters                 note that \w also includes the underscore _
^\w+$                 matches a string consisting of digits, the 26 English letters, or underscores

The common numeric and string regular expressions are the most basic applications of regular expressions. Readers can use them as introductory exercises and see whether they can quickly work out what each one means.
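
The difference between [A-Za-z0-9] and \w noted above can be checked directly. This is a minimal sketch; the variable names are chosen here for illustration.

```javascript
var rAlnum = /^[A-Za-z0-9]+$/;   // letters and digits only
var rWord = /^\w+$/;             // letters, digits, and the underscore

console.info( rAlnum.test( "user01" ) );    // true
console.info( rAlnum.test( "user_01" ) );   // false: the underscore is not in [A-Za-z0-9]
console.info( rWord.test( "user_01" ) );    // true
console.info( rWord.test( "user-01" ) );    // false: the hyphen is not part of \w
```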

Matching Chinese characters

The commonly used regular expression is [\u4e00-\u9fa5], but this range is incomplete. For example:
/[\u4e00-\u9fa5]/.test( '⻏' ) // tests the radical ⻏, returns false
According to the Unicode 5.0 encoding, accurately identifying a Chinese character requires these ranges:

Range        Meaning
2E80-2EFF    CJK Radicals Supplement
2F00-2FDF    Kangxi Radicals
3000-303F    CJK Symbols and Punctuation
31C0-31EF    CJK Strokes
3200-32FF    Enclosed CJK Letters and Months
3300-33FF    CJK Compatibility
3400-4DBF    CJK Unified Ideographs Extension A
4DC0-4DFF    Yijing Hexagram Symbols
4E00-9FBF    CJK Unified Ideographs
F900-FAFF    CJK Compatibility Ideographs
FE30-FE4F    CJK Compatibility Forms
FF00-FFEF    full-width ASCII and full-width punctuation

Therefore, the corrected regular expression for matching Chinese characters is:
var rcjk = /[\u2E80-\u2EFF\u2F00-\u2FDF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FBF\uF900-\uFAFF\uFE30-\uFE4F\uFF00-\uFFEF]+/g;
If you do not want to match punctuation and symbols, remove the corresponding ranges from the regular expression:
3000-303F    CJK Symbols and Punctuation
FF00-FFEF    full-width ASCII and full-width punctuation
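
The effect of the wider ranges can be seen by testing the radical from the example against both character classes. This is a sketch: rNarrow and rWide are names chosen here, and rWide deliberately omits the g flag, because test() on a global regular expression keeps state in lastIndex between calls.

```javascript
var rNarrow = /[\u4e00-\u9fa5]/;
// Same ranges as the corrected expression, without the g flag (see the note above).
var rWide = /[\u2E80-\u2EFF\u2F00-\u2FDF\u3000-\u303F\u31C0-\u31EF\u3200-\u32FF\u3300-\u33FF\u3400-\u4DBF\u4DC0-\u4DFF\u4E00-\u9FBF\uF900-\uFAFF\uFE30-\uFE4F\uFF00-\uFFEF]/;

console.info( rNarrow.test( "⻏" ) );   // false: the radical lies outside 4E00-9FA5
console.info( rWide.test( "⻏" ) );     // true: covered by 2E80-2EFF
console.info( rNarrow.test( "汉" ) );   // true
console.info( rWide.test( "abc" ) );   // false
```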

Matching double-byte characters (including Chinese characters)

[^\x00-\xff] can be used to calculate the length of a string (a double-byte character counts as 2, an ASCII character counts as 1). For example:
console.info( "abc".replace( /[^\x00-\xff]/g, "aa" ).length ) // 3
console.info( "汉字".replace( /[^\x00-\xff]/g, "aa" ).length ) // 4
console.info( "abc汉字".replace( /[^\x00-\xff]/g, "aa" ).length ) // 7

Regular expression for matching HTML tags

Let's start with the version circulating on the Internet:
<(\S*?)[^>]*>.*?</\1>|<.*? />
*?  The ? after * makes the quantifier non-greedy; it still means 0 or more, so it does not require any characters at all
(\S*?)  The tag name must be longer than 0 characters, so *? cannot be used here
|<.*? />  There is no group, so the name of a tag written in self-closing form (ending in />) cannot be captured
<.*? />  Some tags are never closed, so a closing form cannot be required
The corrected version is:
var rtag = /^<([a-z]+)\s*\/?>.*(?:<\/\1>)?$/i
rtag.exec( '<-div>' ) // null
rtag.exec( '<div>abc' ) // ["<div>abc", "div"]
This expression is also incomplete. The second test, for example, matches because the pattern is written to extract tags that contain text content. For strict matching, it can be modified again:
var rtag = /^<([a-z]+)\s*\/?>(?:<\/\1>)?$/i // remove the .* in the middle
This regular expression is only suitable for simple tag matching and extraction; it cannot match nested tags.
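
The two variants can be compared side by side. This is a sketch under assumptions: the closing-tag group is written as (?:<\/\1>), the names rtagLoose and rtagStrict are chosen here, and tagName is a hypothetical helper that returns the captured tag name.

```javascript
var rtagLoose = /^<([a-z]+)\s*\/?>.*(?:<\/\1>)?$/i;    // allows text after the opening tag
var rtagStrict = /^<([a-z]+)\s*\/?>(?:<\/\1>)?$/i;     // no content allowed between the tags

// Hypothetical helper: returns the tag name, or null if the string does not match.
function tagName( html, strict ) {
    var match = ( strict ? rtagStrict : rtagLoose ).exec( html );
    return match ? match[ 1 ] : null;
}

console.info( tagName( "<div>abc</div>", false ) );   // "div"
console.info( tagName( "<div>abc</div>", true ) );    // null: text content is rejected
console.info( tagName( "<div></div>", true ) );       // "div"
console.info( tagName( "<br />", true ) );            // "br"
console.info( tagName( "<-div>", false ) );           // null
```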

Regular expression for matching leading and trailing whitespace

Let's start with the version circulating on the Internet:
^\s*|\s*$
It can delete the whitespace at the beginning and end of a string, for example:
' \t \n\r abc \t \n\r '.replace( /^\s*|\s*$/g, '' ) // abc
But because \s* also matches the empty string, it cannot tell whether the string actually has whitespace at the beginning or end, for example:
/^\s*|\s*$/.test( 'abc' ) // true
The corrected version is:
^\s+|\s+$
' \t \n\r abc \t \n\r '.replace( /^\s+|\s+$/g, '' ) // abc
/^\s+|\s+$/.test( 'abc' ) // false
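
Wrapped in functions, the corrected pattern might be used as follows. This is a minimal sketch; trimBoth and hasOuterWhitespace are hypothetical names introduced here.

```javascript
// Hypothetical helper: removes leading and trailing whitespace.
function trimBoth( str ) {
    return str.replace( /^\s+|\s+$/g, "" );
}

// Hypothetical helper: reports whether there is anything to trim.
// No g flag here, so test() keeps no state between calls.
function hasOuterWhitespace( str ) {
    return /^\s+|\s+$/.test( str );
}

console.info( trimBoth( " \t \n\r abc \t \n\r " ) );   // "abc"
console.info( trimBoth( "a b" ) );                     // "a b": inner whitespace is kept
console.info( hasOuterWhitespace( "abc" ) );           // false
console.info( hasOuterWhitespace( "abc " ) );          // true
```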

Regular expression for matching email addresses

First, the rules for an email address: local-part@domain
 The maximum length of the local-part is 64, the maximum length of the domain is 253, and the maximum total length is 256
 The local-part may use these ASCII characters:
  Uppercase and lowercase English letters a-z, A-Z
  Digits 0-9
  The characters !#$%&'*+-/=?^_`{|}~
  The character . which cannot be the first or last character and cannot appear twice in a row
  However, some mail servers reject addresses that contain special characters
 The domain (domain name) is limited to the 26 English letters, the 10 digits, and the hyphen -
  The hyphen - cannot be the first character
  The top-level domain (com, cn, etc.) is 2 to 6 characters long
Let's start with the version circulating on the Internet:
\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
()  Pointless capturing groups; to group without capturing, use (?:)
@\w+  A domain name cannot contain the underscore _
\w+([-.]\w+)*  The top-level domain part does not follow the rules
The corrected version is:
var remail = /^([\w-_]+(?:\.[\w-_]+)*)@((?:[a-z0-9]+(?:-[a-zA-Z0-9]+)*)+\.[a-z]{2,6})$/i
remail.exec( 'nuysoft@gmail.com' ) // ["nuysoft@gmail.com", "nuysoft", "gmail.com"]
remail.exec( 'nuysoft@gmail.comcomcom' ) // null
remail.exec( 'nuysoft@_gmail.com' ) // null
The revised regular expression has the following limitations:
 It does not support Chinese mailboxes or Chinese domain names. This is a personal preference: I dislike such flashy things
 It does not support special symbols, to avoid rejection by mail servers; they can be added if needed
References:
http://en.wikipedia.org/wiki/Email_address
http://baike.baidu.com/view/119298.htm
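
The regular expression does not enforce the length limits listed in the rules, so they have to be checked separately. The sketch below combines the two; splitEmail is a hypothetical helper, and the limits of 64 and 253 are the ones quoted in the rules of this section.

```javascript
// The corrected pattern from this section, with backslashes and + written out.
var remail = /^([\w-_]+(?:\.[\w-_]+)*)@((?:[a-z0-9]+(?:-[a-zA-Z0-9]+)*)+\.[a-z]{2,6})$/i;

// Hypothetical helper: returns the two parts, or null if the address is rejected.
function splitEmail( address ) {
    var match = remail.exec( address );
    if ( !match ) return null;
    // Length limits quoted in the rules: local-part 64, domain 253.
    if ( match[ 1 ].length > 64 || match[ 2 ].length > 253 ) return null;
    return { local: match[ 1 ], domain: match[ 2 ] };
}

console.info( splitEmail( "nuysoft@gmail.com" ) );    // { local: "nuysoft", domain: "gmail.com" }
console.info( splitEmail( "nuysoft@_gmail.com" ) );   // null
```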

Regular expression for matching URLs

Let's start with the version circulating on the Internet:
[a-zA-z]+://[^\s]*
It is crude, and the parts of the URL are not grouped
The corrected version is as follows (another version circulating on the Internet):
var _url = "^((https|http|ftp|rtsp|mms)?://)?" //
    + "(([0-9a-z_!~*'().&=+$%-]+: )?[0-9a-z_!~*'().&=+$%-]+@)?" // ftp user@
    + "(([0-9]{1,3}\.){3}[0-9]{1,3}" // URL in IP form - 199.194.52.184
    + "|" // allow IP or DOMAIN (domain name)
    + "([0-9a-z_!~*'()-]+\.)*" // domain name - www.
    + "([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\." // second-level domain
    + "[a-z]{2,6})" // first level domain - .com or .museum
    + "(:[0-9]{1,4})?" // port - :80
    + "((/?)|" // a slash isn't required if there is no file name
    + "(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
var rurl = new RegExp( _url, 'i' );

Test:
rurl.exec( 'baidu.com' ) // ["baidu.com", undefined, undefined, undefined, undefined, "baidu.com", undefined, "baid", undefined, undefined, "", "", undefined]
rurl.exec( 'http://baidu.com' ) //
rurl.exec( 'http://www.baidu.com' ) // ["http://baidu.com", "http://", "http", undefined, undefined, "baidu.com", undefined, "baid", undefined, undefined, "", "", undefined]
rurl.test( 'baidu' ) // true
It seems that even a version that looks thorough is not very reliable. This needs further study. TODO
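
One likely contributor to the surprising result for 'baidu' is string escaping: when a pattern is built from string literals, a dot written as "." or "\." reaches RegExp as an unescaped dot, which matches any character. The sketch below shows the effect on a much smaller pattern; rLoose and rStrict are names chosen here, and the doubled backslash is a suggestion, not part of the circulated version.

```javascript
// In a string literal "\." is the same string as ".", so the dot stays a wildcard.
var rLoose = new RegExp( "^[a-z]+\.[a-z]{2,6}$", "i" );

// Assumption: doubling the backslash keeps the escape in the compiled pattern.
var rStrict = new RegExp( "^[a-z]+\\.[a-z]{2,6}$", "i" );

console.info( rLoose.source );                // ^[a-z]+.[a-z]{2,6}$
console.info( rLoose.test( "baidu" ) );       // true: "." matched the letter "i"
console.info( rStrict.test( "baidu" ) );      // false
console.info( rStrict.test( "baidu.com" ) );  // true
```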

Matching valid account names

Let's start with the version circulating on the Internet:
^[a-zA-Z][a-zA-Z0-9_]{4,15}$
(starts with a letter, 5-16 bytes long, letters, digits, and underscores allowed)
 The requirement to start with a letter seems inappropriate now; consider the QQ login platform, for example
 The rule that a name cannot start with an underscore is unnecessary; Baidu, for example, allows it. So keep it simple
The corrected version is:
var ruser = /\w{4,16}/
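
Note that ruser has no ^ or $, so it succeeds whenever 4 consecutive word characters appear anywhere in the input. The anchored variant below is an assumption added for comparison, not the author's version.

```javascript
var ruser = /\w{4,16}/;             // as given in this section
var ruserAnchored = /^\w{4,16}$/;   // assumption: the whole string must be 4-16 word characters

var seventeen = "abcdefghijklmnopq"; // 17 characters

console.info( ruser.test( "ab!cdef" ) );            // true: "cdef" is enough
console.info( ruserAnchored.test( "ab!cdef" ) );    // false
console.info( ruser.test( seventeen ) );            // true: the upper bound is not enforced
console.info( ruserAnchored.test( seventeen ) );    // false
console.info( ruserAnchored.test( "_nuysoft" ) );   // true: a leading underscore is allowed
```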

Matching domestic phone numbers

The version circulating on the Internet works well:
\d{3}-\d{8}|\d{4}-\d{7}
Comment: matches formats such as 0511-4405222 or 021-87888822

Matching Tencent QQ numbers

The version circulating on the Internet works well:
[1-9][0-9]{4,}
Comment: Tencent QQ numbers start from 10000

Matching Chinese postal codes

The version circulating on the Internet works well:
[1-9]\d{5}(?!\d)
Comment: China's postal code is a 6-digit number
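
The negative lookahead (?!\d) only guards the right-hand side of the match, so the pattern suits extracting a code from surrounding text more than validating a whole input. The sketch below illustrates this; rzipAnchored is an assumed variant for whole-string validation and is not part of the circulated version.

```javascript
var rzip = /[1-9]\d{5}(?!\d)/;      // as given in this section
var rzipAnchored = /^[1-9]\d{5}$/;  // assumption: the whole string must be the postal code

console.info( rzip.test( "100083" ) );              // true
console.info( "1234567".match( rzip )[ 0 ] );       // "234567": the last six of seven digits
console.info( rzipAnchored.test( "1234567" ) );     // false
console.info( rzipAnchored.test( "100083" ) );      // true
```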

Matching ID card numbers

Let's start with the version circulating on the Internet:
\d{15}|\d{18}
\d{15} and \d{18} can do the check, but it is a bit crude
The address, birthday, gender, and so on can be parsed from an ID card number, so here is a more detailed explanation:
 ID card rules
  China's ID card numbers have 15 digits (first generation) or 18 digits (second generation). The only difference is that the second generation inserts 19 before the seventh digit of the first-generation number and appends a check character at the end
  Upgrade a 15-digit number to 18 digits, then parse the components of the 18-digit number (address, birthday, gender)
The code is as follows:
// rid is used by parseID but is not defined in this listing. The pattern below is an
// assumed reconstruction from the fields parseID reads: area, year, month, day, sex digit.
var rid = /^(\d{6})(\d{4})(\d{2})(\d{2})\d{2}(\d)[\dX]$/i;

function parseID( ID ) {
    if ( ID.length == 15 ) {
        // Upgrade to 18 digits
        ID = ID.substr( 0, 6 ) + "19" + ID.substr( 6 );
        // Weights for the first 17 digits
        var rank = [
            "7", "9", "10", "5", "8", "4", "2", "1", "6", "3", "7", "9", "10", "5", "8", "4", "2"
        ];
        // Check characters, indexed by the weighted sum of the first 17 digits modulo 11
        var last = [
            "1", "0", "X", "9", "8", "7", "6", "5", "4", "3", "2"
        ];
        // Weighted sum
        for ( var i = 0, sum = 0, len = ID.length; i < len; i++ )
            sum += ID[ i ] * rank[ i ];
        // Append the check character
        ID += last[ sum % 11 ];
    }
    if ( ID.length != 18 ) return null;

    var match = rid.exec( ID );
    return match ? {
        ID : ID,
        area : match[ 1 ],
        y : match[ 2 ],
        m : match[ 3 ],
        d : match[ 4 ],
        sex : match[ 5 ] % 2
    } : null;
}
Restrictions:
 Only the address code is parsed here. To convert the code into an actual address, ask Du Niang (Baidu)
 The sex in the returned object is 1 (male) or 0 (female), with no conversion. If it is needed for page display, it can be converted like this: sex ? "Male" : "Female"
Test:
console.info( parseID( "142327840821047" ) );
console.info( parseID( "142327198408210470" ) );
Reference:
http://baike.baidu.com/view/118340.htm#1
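
The same weights and check-character list can also be used to verify an existing 18-character number rather than to upgrade a 15-digit one. This is a sketch: hasValidCheckChar is a hypothetical helper, and accepting a lowercase x is an assumption made here.

```javascript
// Weights for the first 17 digits and check characters indexed by (sum % 11),
// both taken from the parseID listing in this section.
var rank = [ 7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2 ];
var last = [ "1", "0", "X", "9", "8", "7", "6", "5", "4", "3", "2" ];

// Hypothetical helper: true if the 18th character equals the computed check character.
function hasValidCheckChar( id ) {
    if ( !/^\d{17}[\dX]$/i.test( id ) ) return false;
    for ( var i = 0, sum = 0; i < 17; i++ )
        sum += Number( id.charAt( i ) ) * rank[ i ];
    return last[ sum % 11 ] === id.charAt( 17 ).toUpperCase();
}

console.info( hasValidCheckChar( "142327198408210470" ) );   // true: weighted sum 320, 320 % 11 = 1, last[ 1 ] is "0"
console.info( hasValidCheckChar( "142327198408210471" ) );   // false
```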

Matching IP addresses

Let's start with the version circulating on the Internet:
\d+\.\d+\.\d+\.\d+
\d+  places no limit on the value of each number
The corrected version is:
var rip = /^(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])$/;
rip.test( "192.168.1.1" ) // true
rip.test( "0.0.0.0" ) // true
rip.test( "255.255.255.255" ) // true
rip.test( "256.255.255.255" ) // false
Adding capturing groups:
var rip2 = /^([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])$/;
rip2.exec( "192.168.1.1" ) // ["192.168.1.1", "192", "168", "1", "1"]
rip2.exec( "0.0.0.0" ) // ["0.0.0.0", "0", "0", "0", "0"]
rip2.exec( "255.255.255.255" ) // ["255.255.255.255", "255", "255", "255", "255"]
rip2.exec( "256.255.255.255" ) // null
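
The captured groups make it easy to turn an address into numbers. This is a minimal sketch with a hypothetical helper, ipToOctets; note that the pattern also accepts leading zeros such as 001.

```javascript
// The grouped pattern from this section, with backslashes written out.
var rip2 = /^([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])$/;

// Hypothetical helper: returns the four octets as numbers, or null if the address is invalid.
function ipToOctets( ip ) {
    var match = rip2.exec( ip );
    if ( !match ) return null;
    var octets = [];
    for ( var i = 1; i <= 4; i++ )
        octets.push( parseInt( match[ i ], 10 ) );
    return octets;
}

console.info( ipToOctets( "192.168.1.1" ) );       // [192, 168, 1, 1]
console.info( ipToOctets( "192.168.001.1" ) );     // [192, 168, 1, 1]: leading zeros pass the pattern
console.info( ipToOctets( "256.255.255.255" ) );   // null
```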
