Regular expressions are useful for searching, matching, replacing, and converting strings, and for validating input and output. The elements below are the ones used most often.
1.\
: Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
2.^
: Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
3.$
: Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
4.*
: Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
5.+
: Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
6.?
: Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
7.{n}
: n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in "Bob", but matches the two o's in "food".
8.{n,}
: n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
9.{n,m}
: m and n are both non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there cannot be a space between the comma and the two numbers.
10.?
: When this character follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much of it as possible. For example, for the string "oooo", 'o+?' matches a single 'o', while 'o+' matches all the o's.
11..
: Matches any single character except line terminators such as '\n'. To match any character including '\n', use a pattern such as '[\s\S]'.
12.(pattern): Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
13.(?:pattern)
: Matches pattern but does not capture the match, that is, the match is non-capturing and is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a shorter expression than 'industry|industries'.
14.(?=pattern)
: Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches the "Windows " in "Windows 2000" but not the "Windows " in "Windows 3.1". A lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
15.(?!pattern)
: Negative lookahead. Matches the search string at any position where a string that does not match pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches the "Windows " in "Windows 3.1" but not the "Windows " in "Windows 2000". A lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
16.x|y
: matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
17.[xyz]
: Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
18.[^xyz]
: Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".
19.[a-z]
: character range. Matches any character within the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
20.[^a-z]
: Negated character range. Matches any character not within the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' through 'z'.
21.\b
: Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
22.\B
: Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
23.\cx
: Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x must be one of A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
24.\d
: Matches a digit character. Equivalent to [0-9].
25.\D
: Matches a non-digit character. Equivalent to [^0-9].
26.\f
: Matches a form feed. Equivalent to \x0c and \cL.
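27.\n
: Matches a newline character. Equivalent to \x0a and \cJ.
28.\r
: Matches a carriage return. Equivalent to \x0d and \cM.
29.\s
: Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
30.\S
: Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
31.\t
: Matches a tab. Equivalent to \x09 and \cI.
32.\v
: Matches a vertical tab. Equivalent to \x0b and \cK.
33.\w
: Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
34.\W
: Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.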
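35.\xn
: Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". This allows ASCII codes to be used in regular expressions.
36.\num
: Matches num, where num is a positive integer. This is a reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
37.\n
: Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
38.\nm
: Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
39.\nml
: If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.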
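Several of the elements listed above can be tried directly in JavaScript. The following is a minimal sketch rather than an exhaustive test; the sample strings and variable names are chosen only for illustration.

```javascript
// Greedy vs. non-greedy quantifiers on the string "oooo"
var greedy = "oooo".match(/o+/)[0];   // "oooo"
var lazy = "oooo".match(/o+?/)[0];    // "o"

// Lookahead: the text inside (?=...) or (?!...) is checked but not consumed
var positive = /Windows (?=95|98|NT|2000)/.test("Windows 2000"); // true
var negative = /Windows (?!95|98|NT|2000)/.test("Windows 2000"); // false

// Word boundary: 'er\b' matches in "never" but not in "verb"
var atBoundary = /er\b/.test("never");    // true
var notAtBoundary = /er\b/.test("verb");  // false

// Backreference: '(.)\1' matches two consecutive identical characters
var doubled = "hello".match(/(.)\1/)[0];  // "ll"

console.log(greedy, lazy, positive, negative, atBoundary, notAtBoundary, doubled);
```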
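ECMAScript supports regular expressions through the RegExp type, as follows:
var expression = /pattern/flags;
The pattern part can be any simple or complex regular expression and may include character classes, quantifiers, grouping, lookaheads, and backreferences. Each regular expression can carry one or more flags that indicate how the expression behaves. The following three flags are available:
g
: Global mode. The pattern is applied to the whole string rather than stopping as soon as the first match is found.
i
: Case-insensitive mode.
m
: Multiline mode. On reaching the end of one line of text, matching continues on to check whether the next line contains a match for the pattern.
For example, to match the first "bat" or "cat", case-insensitively:
var pattern = /[bc]at/i;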
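The effect of the g, i, and m flags can be checked with a short sketch. The sample text is made up for illustration, and match() is the standard string method that returns matches as an array.

```javascript
var text = "Cat\nbat";

// i only: match() stops at the first match
var first = text.match(/[bc]at/i)[0];   // "Cat"

// g and i together: match() returns every match
var all = text.match(/[bc]at/gi);       // ["Cat", "bat"]

// Without m, ^ matches only at the start of the whole string
var withoutM = /^bat/.test(text);       // false

// With m, ^ also matches right after a line break
var withM = /^bat/m.test(text);         // true

console.log(first, all, withoutM, withM);
```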
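Regular expressions can also be created with the RegExp constructor, which takes two arguments: the pattern string to match and an optional flags string. Any expression that can be defined with a literal can also be defined with the constructor. Using the example above again:
var pattern = new RegExp("[bc]at","i");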
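Note: the pattern argument of the RegExp constructor is a string, so in some cases characters must be double-escaped. Every backslash that escapes a metacharacter has to be doubled. For example, the literal pattern /\[bc\]at/ corresponds to the string "\\[bc\\]at" (without the surrounding slashes).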
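The double-escaping rule can be verified by comparing the source property of a literal with that of a constructor-built expression. This is a small sketch; the variable names are illustrative.

```javascript
// Literal form: \[ and \] match literal square brackets
var literal = /\[bc\]at/;

// Constructor form: each backslash is doubled inside the string
var constructed = new RegExp("\\[bc\\]at");

console.log(literal.source === constructed.source); // true
console.log(constructed.test("[bc]at"));            // true
console.log(constructed.test("bat"));               // false

// The same applies to character class escapes: \d in a literal is "\\d" in a string
var digits = new RegExp("\\d+");
console.log(digits.test("abc123"));                 // true
```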
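Example:
var re = null, i;
for (i = 0; i < 10; i++) {
    re = /cat/g;
    console.log(re.test("catastrophe"));
}
for (i = 0; i < 10; i++) {
    re = new RegExp("cat", "g");
    console.log(re.test("catastrophe"));
}
Both loops print true 10 times. In ECMAScript 5 and later, a regular expression literal creates a new RegExp instance each time it is evaluated, just as the constructor does, so every call to test() starts with lastIndex at 0.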
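The exec() method of a RegExp object is designed specifically for capturing groups. It takes one argument, the string to apply the pattern to, and returns an array containing information about the first match, or null if there is no match. The returned array is an instance of Array but has two additional properties: index and input. index is the position of the match within the string, and input is the string the expression was applied to.
Example:
var text = "mom and dad and baby";
var pattern = /mom( and dad( and baby)?)?/gi;
var matches = pattern.exec(text);
console.log(matches.index);  // 0
console.log(matches.input);  // "mom and dad and baby"
console.log(matches[0]);     // "mom and dad and baby"
console.log(matches[1]);     // " and dad and baby"
console.log(matches[2]);     // " and baby"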
With exec(), only one match is returned per call, even if the global flag g is set on the pattern. Without the global flag, calling exec() repeatedly on the same string always returns information about the first match. With the global flag, each call to exec() continues searching the string for the next match, as in the following example:
var text = "cat, bat, sat, fat";
var pattern1 = /.at/;
var matches = pattern1.exec(text);
console.log(matches.index);       // 0
console.log(matches[0]);          // cat
console.log(pattern1.lastIndex);  // 0
matches = pattern1.exec(text);
console.log(matches.index);       // 0
console.log(matches[0]);          // cat
console.log(pattern1.lastIndex);  // 0
var pattern2 = /.at/g;
var matches = pattern2.exec(text);
console.log(matches.index);       // 0
console.log(matches[0]);          // cat
console.log(pattern2.lastIndex);  // 3
var matches = pattern2.exec(text);
console.log(matches.index);       // 5
console.log(matches[0]);          // bat
console.log(pattern2.lastIndex);  // 8
Note: IE's JavaScript implementation deviates in its handling of the lastIndex property: lastIndex changes on every call even in non-global mode.
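Because each call with the g flag continues from lastIndex, exec() is often called in a loop until it returns null. The following sketch collects every match together with its position; the found array and its "text@index" format are only for illustration.

```javascript
var text = "cat, bat, sat, fat";
var pattern = /.at/g;
var found = [];
var m;

// exec() returns null once there are no more matches,
// and at that point lastIndex is reset to 0
while ((m = pattern.exec(text)) !== null) {
    found.push(m[0] + "@" + m.index);
}

console.log(found);             // ["cat@0", "bat@5", "sat@10", "fat@15"]
console.log(pattern.lastIndex); // 0
```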
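Another commonly used regular expression method is test(), which takes a string argument. It returns true if the pattern matches the argument and false otherwise.
For example:
var text = "000-00-0000";
var pattern = /\d{3}-\d{2}-\d{4}/;
if (pattern.test(text)) {
    console.log('the pattern was matched.');
}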
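The string method match() retrieves the results matched by a regular expression and returns them as an array. For example:
"186a619b28".match(/\d+/g); // ["186","619","28"]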
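replace is a method of the JavaScript String object. It accepts two arguments:
replace([RegExp|String],[String|Function])
The first argument can be a plain string or a regular expression.
The second argument can be a plain string or a callback function.
If the second argument is a callback function, it is called once for every match, and each call receives the following arguments:
result: the text matched this time
$1,...$9: one argument for each () in the regular expression; $1 to $9 hold the text captured by each () in this match, up to 9
offset: the start position of this match
source: the original string being matched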
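The callback form and the string form of the second argument can be compared in a short sketch. toCamelCase is a hypothetical helper name used only for this illustration, and the sample strings are made up.

```javascript
// toCamelCase is a hypothetical helper, not part of any library
function toCamelCase(name) {
    return name.replace(/-([a-z])/g, function (result, $1, offset, source) {
        // result is the whole match (for example "-t"), $1 is the captured letter
        return $1.toUpperCase();
    });
}

var camel = toCamelCase("border-top-width");
console.log(camel); // "borderTopWidth"

// With a string as the second argument, $1 and $2 refer to the captured groups
var swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped); // "Smith, John"
```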
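The following are several classic cases that combine replace with JavaScript regular expressions:
(1) Implement a trim function for strings that removes whitespace from both ends
String.prototype.trim = function(){
    // Approach 1: replace every match with "" through a callback
    return this.replace(/(^\s+)|(\s+$)/g, function(){
        return "";
    });
    // Approach 2 works on the same principle; use it in place of the return above:
    // return this.replace(/(^\s+)|(\s+$)/g, '');
};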
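^\s+ matches consecutive whitespace characters at the start of the string, and \s+$ matches consecutive whitespace characters at the end. Wrapping each in () captures the matched text. Because the two are joined with |, the expression produces at most two matches, and the replacement then runs twice:
String.prototype.trim = function(){
    /**
     * @param rs: the matched text
     * @param $1: text captured by the 1st ()
     * @param $2: text captured by the 2nd ()
     * @param offset: start position of the match
     * @param source: the original string
     */
    this.replace(/(^\s+)|(\s+$)/g, function(rs, $1, $2, offset, source){
        // each element of arguments corresponds to one parameter
        console.log(arguments);
    });
};
" abcd ".trim();
Output:
[" ", " ", undefined, 0, " abcd "] // 1st match
[" ", undefined, " ", 5, " abcd "] // 2nd match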
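(2) Extract the parameter names and values from the browser URL and build a key/value object
function getUrlParamObj(){
    var obj = {};
    // Get the parameter part of the URL, without the leading "?"
    var params = window.location.search.substring(1);
    // [^&=]+ matches a run of characters containing no & or =; wrapping it in () captures that text
    params.replace(/([^&=]+)=([^&=]*)/gi, function(rs, $1, $2){
        obj[$1] = $2;
    });
    return obj;
}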
Each match of /([^&=]+)=([^&=]*)/gi is one complete key/value pair of the form xxxx=xxx. Whenever such a match is found, the callback runs and receives the matched key and value as $1 and $2.
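The same technique can be exercised outside a browser by passing the query string in as an argument. This is a sketch under that assumption: parseQuery is a hypothetical name, and the sample query string is invented. Values are returned as-is, without URL decoding.

```javascript
// parseQuery is a hypothetical helper: it takes the query string as an
// argument instead of reading window.location.search
function parseQuery(search) {
    var obj = {};
    // Drop a leading "?" if present
    var params = search.replace(/^\?/, "");
    // The callback runs once per key=value pair; replace() is used only to iterate
    params.replace(/([^&=]+)=([^&=]*)/g, function (rs, key, value) {
        obj[key] = value;
        return rs;
    });
    return obj;
}

var query = parseQuery("?id=42&name=tom&empty=");
console.log(query.id, query.name, query.empty === ""); // "42" "tom" true
```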
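(3) Insert a new string at a specified position in a string
String.prototype.insetAt = function(str, offset){
    // Build the regular expression with the RegExp() constructor
    var regx = new RegExp("(^.{" + offset + "})");
    return this.replace(regx, "$1" + str);
};
"abcd".insetAt('xyz', 2); // insert xyz between b and c
// result: "abxyzcd"
When offset=2, the regular expression is (^.{2}). The . stands for any character other than \n, and the {2} after it makes the expression match the first two characters of the string, whatever they are. Adding () captures the matched text, and replace then substitutes the match with a new string of the form: result = match + str.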
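(4) Convert the mobile number 12988886666 to 129 8888 6666
function telFormat(tel){
    tel = String(tel);
    // Approach 1: build the result in a callback
    return tel.replace(/(\d{3})(\d{4})(\d{4})/, function (rs, $1, $2, $3){
        return $1 + " " + $2 + " " + $3;
    });
    // Approach 2 is equivalent; use it in place of the return above:
    // return tel.replace(/(\d{3})(\d{4})(\d{4})/, "$1 $2 $3");
}
(\d{3})(\d{4})(\d{4}) matches a complete mobile number and captures digits 1-3, 4-7, and 8-11 separately. "$1 $2 $3" joins the three captured groups with spaces to form a new string, which then replaces the complete number.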
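Match the first "bat" or "cat", case-insensitively: /[bc]at/i or new RegExp("[bc]at","i")
Match every three-character combination ending in "at", case-insensitively: /.at/gi
Digits only: ^[0-9]*$
Exactly n digits: ^\d{n}$
At least n digits: ^\d{n,}$
Between m and n digits: ^\d{m,n}$
Zero, or a number that does not start with zero: ^(0|[1-9][0-9]*)$
A non-negative number with either no decimal part or exactly two decimal places: ^[0-9]+(\.[0-9]{2})?$
A non-negative number with either no decimal part or 1 to 3 decimal places: ^[0-9]+(\.[0-9]{1,3})?$
Non-zero positive integers only: ^\+?[1-9][0-9]*$
Exactly 3 characters: ^.{3}$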
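Strings made up of the 26 English letters only: ^[A-Za-z]+$
Strings made up of digits and the 26 English letters only: ^[A-Za-z0-9]+$
Strings made up of digits, the 26 English letters, or underscores only: ^\w+$
Validate a user password (starts with a letter, 6 to 18 characters long, only letters, digits, and underscores): ^[a-zA-Z]\w{5,17}$
Check for characters such as ^%&',;=?$" (the class is negated, so it matches a run of characters other than %&',;=?$"): [^%&',;=?$\x22]+
Chinese characters only: ^[\u4e00-\u9fa5]{0,}$
Validate an email address: ^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
Validate an Internet URL: ^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$
Validate an ID card number (15 or 18 digits): ^(\d{15}|\d{18})$
Validate an IP address: ^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$
Match a character that appears twice in a row; for example, "aabbc11asd" gives three matches, aa, bb, and 11: (\w)\1
Match paired HTML tags: <(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>
Match numbers from 1 to 58: /^([1-9]|[1-4][0-9]|5[0-8])$/
Match integers from -90 to 90 (including -90 and 90): ^(-?[1-8][0-9]|-?[1-9]|-?90|0)$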
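Validation patterns like these are usually applied with test(). The sketch below wraps a few of them in a hypothetical validators object; the property names and sample inputs are invented for illustration. In the decimal pattern the point is written as \. so that it matches only a literal dot, and the password pattern is anchored with ^.

```javascript
// validators is a hypothetical lookup object, used only for this illustration
var validators = {
    digits: /^[0-9]*$/,
    twoDecimals: /^[0-9]+(\.[0-9]{2})?$/,
    password: /^[a-zA-Z]\w{5,17}$/,
    ipv4: /^((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)$/
};

console.log(validators.digits.test("12345"));       // true
console.log(validators.twoDecimals.test("3.14"));   // true
console.log(validators.twoDecimals.test("3.1"));    // false
console.log(validators.password.test("abc_123"));   // true
console.log(validators.password.test("1abcdef"));   // false
console.log(validators.ipv4.test("192.168.0.1"));   // true
console.log(validators.ipv4.test("256.1.1.1"));     // false
```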