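Since Java 1.4, the core Java API has included the java.util.regex package. It is a valuable foundational tool for many kinds of text processing, such as matching, searching, extracting and parsing structured content.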
java.util.regex is a library for matching strings against patterns defined by regular expressions. Its two main classes are Pattern and Matcher.
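Pattern: a Pattern object is the compiled representation of a regular expression. In Java, the aptly named Pattern class makes it easy to determine whether a String matches a given pattern. A pattern can be as simple as one specific String, or it can be complex, using groups and character classes such as whitespace, digits, letters or control characters. Because Java strings are based on Unicode, regular expressions also work in internationalized applications.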
Summary of Pattern class methods
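Method | Description
static Pattern compile(String regex, int flags) | Compiles regex into a pattern; flags selects match options (for example, Pattern.CASE_INSENSITIVE for case-insensitive matching)
Matcher matcher(CharSequence input) | Creates a matcher that matches input against this pattern
static boolean matches(String regex, CharSequence input) | Convenience method: compiles regex and matches the whole input against it in one call
String[] split(CharSequence input, int limit) | Splits input around matches of this pattern; limit controls how many times the pattern is applied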
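The four Pattern methods above can be tried together in one small program. This is a sketch; the class PatternMethodsDemo and its helper methods are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    // compile(regex, flags): CASE_INSENSITIVE lets "java" match "JAVA"
    static boolean matchesIgnoringCase(String input) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input); // matcher(input) binds the pattern to this input
        return m.matches();
    }

    // static matches(regex, input): compile and match the whole input in one call
    static boolean quickMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // split(input, limit): a limit of 2 applies the pattern at most once
    static String[] splitWithLimit(String input, int limit) {
        return Pattern.compile(",").split(input, limit);
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("JAVA"));                  // true
        System.out.println(quickMatch("java", "javascript"));            // false: the whole input must match
        System.out.println(Arrays.toString(splitWithLimit("a,b,c", 2))); // [a, b,c]
    }
}
```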
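Matcher: a Matcher object is a state machine that performs match operations on a character sequence, using a Pattern object as the pattern to match. First, a Pattern instance holds a compiled regular expression whose syntax is similar to Perl's; then a Matcher instance matches strings under the control of that Pattern.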
Summary of Matcher class methods
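Method | Description
boolean matches() | Attempts to match the entire input against the pattern
boolean lookingAt() | Attempts to match the pattern starting at the beginning of the input; unlike matches(), the match need not cover the whole input
boolean find(int start) | Resets the matcher and searches for the next match starting at index start
int groupCount() | Returns the number of capturing groups in the pattern
String replaceAll(String replacement) | Replaces every match with replacement
String replaceFirst(String replacement) | Replaces the first match with replacement
Matcher appendReplacement(StringBuffer sb, String replacement) | Appends to sb the input text between the previous match and the current one, followed by replacement
StringBuffer appendTail(StringBuffer sb) | Appends to sb the rest of the input after the last match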
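The difference between matches(), lookingAt() and find(int start), along with groupCount() and the two replace methods, is easiest to see on a single input. A sketch; MatcherMethodsDemo, RANGE, INPUT and findFrom are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)");
    static final String INPUT = "10-20 and 30-40";

    // find(start) resets the matcher, then searches from index start
    static String findFrom(int start) {
        Matcher m = RANGE.matcher(INPUT);
        return m.find(start) ? m.group() : null;
    }

    public static void main(String[] args) {
        Matcher m = RANGE.matcher(INPUT);
        System.out.println(m.matches());         // false: the input is more than one range
        System.out.println(m.lookingAt());       // true: the input starts with "10-20"
        System.out.println(m.groupCount());      // 2: two capturing groups in the pattern
        System.out.println(findFrom(6));         // 30-40
        System.out.println(m.replaceFirst("X")); // X and 30-40
        System.out.println(m.replaceAll("X"));   // X and X
    }
}
```

Both replace methods reset the matcher before they start, so calling them after matches() or lookingAt() is safe.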
Common regular expression constructs:
For comparing against a single literal string, regular expressions offer no advantage. Their real power lies in character classes and quantifiers (*, +, ?) used in more complex patterns.
Character classes include:
\d a digit
\D a non-digit
\w a word character ([a-zA-Z_0-9])
\W a non-word character
\s a whitespace character (space, tab, newline, carriage return, form feed, vertical tab)
\S a non-whitespace character
[] a custom character class made of the characters listed inside the square brackets
. any single character (by default, line terminators excluded)
The following quantifiers control how many times the preceding subpattern is applied:
? zero times or once
* zero or more times
+ one or more times
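Each construct listed above can be checked with Pattern.matches, which must match the whole input. In Java source each backslash is doubled inside a string literal, so \d is written "\\d". CharacterClassDemo and its helper m are illustrative names.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    static boolean m(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(m("\\d", "7"));      // true: one digit
        System.out.println(m("\\D", "7"));      // false: 7 is a digit
        System.out.println(m("\\w+", "a_1"));   // true: letters, digits and _ are word characters
        System.out.println(m("\\s", "\t"));     // true: a tab is whitespace
        System.out.println(m("[xyz]", "y"));    // true: one character from the list
        System.out.println(m(".", "?"));        // true: any single character
        System.out.println(m("ab?c", "ac"));    // true: b zero times
        System.out.println(m("ab*c", "abbbc")); // true: b zero or more times
        System.out.println(m("ab+c", "ac"));    // false: b must appear at least once
    }
}
```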
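Examples. The snippets below go inside a method body and assume the imports java.util.regex.Pattern and java.util.regex.Matcher. Inside a Java string literal each backslash of a regular expression must be doubled, so \d is written as "\\d".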
Example 1: The simplest pattern matches a given String exactly: the pattern is identical to the text to be matched. The static Pattern.matches method tests whether a String matches a given pattern:
String data = "java";
boolean result = Pattern.matches("java", data); // true
Example 2:
String[] dataArr = { "moon", "mon", "moon", "mono" };
for (String str : dataArr) {
    String patternStr = "m(o+)n";
    boolean result = Pattern.matches(patternStr, str);
    if (result) {
        System.out.println("Matching " + str + " against " + patternStr + " succeeded");
    } else {
        System.out.println("Matching " + str + " against " + patternStr + " failed");
    }
}
The pattern "m(o+)n" means that the o between m and n may repeat one or more times, so moon, mon and moon match, while mono, which has an extra o after the n, does not.
Note: + means one or more times; ? means zero times or once; * means zero or more times.
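Applying the three quantifiers to the same o shows how they differ. A sketch; QuantifierDemo and row are illustrative names.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Builds one line such as "mo?n: mn=true mon=true moon=false"
    static String row(String regex, String[] inputs) {
        StringBuilder line = new StringBuilder(regex).append(':');
        for (String s : inputs) {
            line.append(' ').append(s).append('=').append(Pattern.matches(regex, s));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] inputs = { "mn", "mon", "moon" };
        for (String regex : new String[] { "mo?n", "mo*n", "mo+n" }) {
            System.out.println(row(regex, inputs));
        }
        // mo?n: mn=true mon=true moon=false
        // mo*n: mn=true mon=true moon=true
        // mo+n: mn=false mon=true moon=true
    }
}
```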
Example 3:
String[] dataArr = { "ban", "ben", "bin", "bon", "bun", "byn", "baen" };
for (String str : dataArr) {
    String patternStr = "b[aeiou]n";
    boolean result = Pattern.matches(patternStr, str);
    if (result) {
        System.out.println("Matching " + str + " against " + patternStr + " succeeded");
    } else {
        System.out.println("Matching " + str + " against " + patternStr + " failed");
    }
}
Note: square brackets match exactly one of the characters listed inside them. The pattern "b[aeiou]n" matches only strings that start with b, end with n and have exactly one of a, e, i, o, u in between, so the first five elements of the array match and the last two do not.
Example 4:
String[] dataArr = { "been", "bean", "boon", "buin", "bynn" };
for (String str : dataArr) {
    String patternStr = "b(ee|ea|oo)n";
    boolean result = Pattern.matches(patternStr, str);
    if (result) {
        System.out.println("Matching " + str + " against " + patternStr + " succeeded");
    } else {
        System.out.println("Matching " + str + " against " + patternStr + " failed");
    }
}
To match a sequence of several characters, [] cannot be used; use () together with | instead. () defines a group and | means "or", so the pattern b(ee|ea|oo)n matches been, bean and boon. The first three elements therefore match and the last two do not.
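Putting a character class next to a group of alternatives makes the difference concrete: the class consumes one character, each alternative a whole sequence. A sketch; AlternationDemo and its methods are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // [eao] accepts exactly one character from the set
    static boolean oneChar(String s) {
        return Pattern.matches("b[eao]n", s);
    }

    // (ee|ea|oo) accepts exactly one of the listed two-character sequences
    static boolean twoChars(String s) {
        return Pattern.matches("b(ee|ea|oo)n", s);
    }

    // The parentheses also capture which alternative matched (group 1), or null if none did
    static String whichAlternative(String s) {
        Matcher m = Pattern.compile("b(ee|ea|oo)n").matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(oneChar("ben") + " " + oneChar("been"));   // true false
        System.out.println(twoChars("ben") + " " + twoChars("been")); // false true
        System.out.println(whichAlternative("boon"));                 // oo
    }
}
```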
Example 5:
String[] dataArr = { "1", "10", "101", "1010", "100+" };
for (String str : dataArr) {
    String patternStr = "\\d+";
    boolean result = Pattern.matches(patternStr, str);
    if (result) {
        System.out.println("Matching " + str + " against " + patternStr + " succeeded");
    } else {
        System.out.println("Matching " + str + " against " + patternStr + " failed");
    }
}
Note: \d stands for a digit and + means one or more times, so the pattern \d+ matches one or more digits. The first four elements therefore match; the last one does not, because it ends with a non-digit character.
Example 6:
String[] dataArr = { "a100", "b20", "c30", "df10000", "gh0t" };
for (String str : dataArr) {
    String patternStr = "\\w+\\d+";
    boolean result = Pattern.matches(patternStr, str);
    if (result) {
        System.out.println("Matching " + str + " against " + patternStr + " succeeded");
    } else {
        System.out.println("Matching " + str + " against " + patternStr + " failed");
    }
}
The pattern \w+\d+ matches a string that starts with one or more word characters and ends with one or more digits. The first four elements match; the last one does not, because it ends with the word character t rather than a digit.
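Because \w also matches digits, the greedy \w+ first consumes the whole input and then gives characters back one at a time until \d+ can match, so only the last digit ends up in \d+. Wrapping each part in a group makes this visible. A sketch; BacktrackingDemo and groups are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackingDemo {
    // Returns "group1|group2" when (\w+)(\d+) matches the whole input, otherwise null
    static String groups(String s) {
        Matcher m = Pattern.compile("(\\w+)(\\d+)").matcher(s);
        return m.matches() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(groups("a100"));    // a10|0
        System.out.println(groups("df10000")); // df1000|0
        System.out.println(groups("gh0t"));    // null
    }
}
```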
Example 7:
String str = "salary,position name;age gender";
String[] dataArr = str.split("[,\\s;]");
for (String strTmp : dataArr) {
    System.out.println(strTmp);
}
The split method of the String class accepts a regular expression. In the example above, the character class [,\s;] matches a comma, a single whitespace character or a semicolon, and split uses each of them as a separator to split the string into an array of strings.
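Because the class [,\s;] matches exactly one character, two separators in a row (such as a comma followed by a space) leave an empty string between them; adding + treats a run of separators as a single one. A sketch; StringSplitDemo and its methods are illustrative names.

```java
import java.util.Arrays;

class StringSplitDemo {
    // One separator character per match: adjacent separators leave an empty string
    static String[] singleSeparator(String s) {
        return s.split("[,\\s;]");
    }

    // "+" makes a run of separators count as one
    static String[] separatorRuns(String s) {
        return s.split("[,\\s;]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(singleSeparator("a, b"))); // [a, , b]
        System.out.println(Arrays.toString(separatorRuns("a, b")));   // [a, b]
    }
}
```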
Example 8:
String str = "2007年12月11日";
// 年, 月 and 日 are the Chinese characters for year, month and day
Pattern p = Pattern.compile("[年月日]");
String[] dataArr = p.split(str);
for (String strTmp : dataArr) {
    System.out.println(strTmp); // prints 2007, 12 and 11 on separate lines
}
Pattern is the compiled form of a regular expression, and its split method splits a string around matches of that expression.
Note that its usage differs from String.split(): the Pattern is compiled first, and the string to split is passed as the argument.
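The Java documentation specifies that str.split(regex) gives the same result as Pattern.compile(regex).split(str), so a compiled Pattern mainly pays off when the same expression is reused on many inputs. A sketch; SplitComparisonDemo and the date separators [-/] are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitComparisonDemo {
    // Compiled once and reused for every input
    static final Pattern DATE_SEPARATOR = Pattern.compile("[-/]");

    static String[] withPattern(String s) {
        return DATE_SEPARATOR.split(s);
    }

    // String.split takes the regular expression itself as its argument
    static String[] withString(String s) {
        return s.split("[-/]");
    }

    public static void main(String[] args) {
        for (String date : new String[] { "2007-12-11", "2007/12/11" }) {
            System.out.println(Arrays.toString(withPattern(date)));                 // [2007, 12, 11]
            System.out.println(Arrays.equals(withPattern(date), withString(date))); // true
        }
    }
}
```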
Example 9:
String str = "10yuan 1000renminbi 10000yuan 100000RMB";
// $1 in the replacement refers to the digits captured by the first group
str = str.replaceAll("(\\d+)(yuan|renminbi|RMB)", "¥$1");
System.out.println(str);
In the example above, the parentheses divide the pattern "(\d+)(yuan|renminbi|RMB)" into two groups. The first group, \d+, matches one or more digits; the second matches any one of yuan, renminbi or RMB. In the replacement, the text matched by the first group is kept (it is referenced as $1), the unit matched by the second group is dropped, and ¥ is placed in front of the number.
The resulting str is ¥10 ¥1000 ¥10000 ¥100000
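Group references are not limited to keeping a group unchanged: $n inserts whatever group n matched, so groups can also be reordered. Because $ and \ have this special meaning in a replacement string, a literal dollar sign must be escaped, for example with Matcher.quoteReplacement. A sketch; GroupReplacementDemo and its methods are illustrative names.

```java
import java.util.regex.Matcher;

class GroupReplacementDemo {
    // $2 and $1 refer to the second and first groups, so their order can be swapped
    static String swap(String s) {
        return s.replaceAll("(\\d+)(yuan|RMB)", "$2 $1");
    }

    // quoteReplacement("$") yields an escaped dollar sign that is inserted literally
    static String dollars(String s) {
        return s.replaceAll("(\\d+)yuan", Matcher.quoteReplacement("$") + "$1");
    }

    public static void main(String[] args) {
        System.out.println(swap("10yuan 20RMB")); // yuan 10 RMB 20
        System.out.println(dollars("10yuan"));    // $10
    }
}
```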
Example 10:
Pattern p = Pattern.compile("m(o+)n", Pattern.CASE_INSENSITIVE);
// Use the matcher() method of the Pattern class to create a Matcher object
Matcher m = p.matcher("moon mooon Mon mooooon Mooon");
StringBuffer sb = new StringBuffer();
// Use find() to look for the first match
boolean result = m.find();
// Loop over all matches: replace each one with "moon" and append the result to sb
while (result) {
    m.appendReplacement(sb, "moon");
    result = m.find();
}
// Finally, call appendTail() to append the rest of the input after the last match to sb
m.appendTail(sb);
System.out.println("The content after replacement is " + sb.toString());
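On Java 9 and later, Matcher also offers replaceAll(Function<MatchResult, String>), which performs the find-and-append loop for you and lets each replacement depend on the match. The sketch below (ReplaceLoopDemo and its methods are illustrative names) compares it with the loop above; the counting replacement only demonstrates that the function sees each match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceLoopDemo {
    static final Pattern MOON = Pattern.compile("m(o+)n", Pattern.CASE_INSENSITIVE);

    // The find/appendReplacement/appendTail loop from Example 10
    static String withLoop(String input) {
        Matcher m = MOON.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, "moon");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Java 9+: the replacement is computed from each match (here, the number of o's)
    static String withFunction(String input) {
        return MOON.matcher(input).replaceAll(r -> "m" + r.group(1).length() + "n");
    }

    public static void main(String[] args) {
        System.out.println(withLoop("moon mooon Mon mooooon Mooon")); // moon moon moon moon moon
        System.out.println(withFunction("moon Mon mooon"));           // m2n m1n m3n
    }
}
```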
Example 11: Besides + (one or more times), * (zero or more times) and ? (zero times or once), {} can specify an exact number of occurrences. X{2,5} means X occurs at least 2 and at most 5 times; X{2,} means at least 2 times; X{5} means exactly 5 times.
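String[] dataArr = { "google", "gooogle", "gooooogle", "gooooogle", "ggle" };
for (String str : dataArr) {
    // o{2,5}: the o between g and gle must occur 2 to 5 times
    boolean result = Pattern.matches("g(o{2,5})gle", str);
    System.out.println(str + (result ? " matches" : " does not match") + " g(o{2,5})gle");
}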
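The three {} forms can be compared directly; in "go{n,m}gle" the quantifier applies only to the single o before it. A sketch; BoundedQuantifierDemo and m are illustrative names.

```java
import java.util.regex.Pattern;

class BoundedQuantifierDemo {
    static boolean m(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        System.out.println(m("go{2}gle", "google"));       // true: exactly 2 o's
        System.out.println(m("go{2}gle", "gooogle"));      // false: 3 o's
        System.out.println(m("go{2,}gle", "goooooogle"));  // true: 6 o's is at least 2
        System.out.println(m("go{2,5}gle", "gooooogle"));  // true: 5 o's
        System.out.println(m("go{2,5}gle", "goooooogle")); // false: 6 o's is more than 5
        System.out.println(m("go{2,5}gle", "ggle"));       // false: no o at all
    }
}
```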
Example 12: - inside square brackets denotes a range ("from ... to ..."); for example, [a-e] is equivalent to [abcde].
String[] dataArr = { "Tan", "Tbn", "Tcn", "Ton", "Twn" };
for (String str : dataArr) {
    String regex = "T[a-c]n";
    boolean result = Pattern.matches(regex, str);
    if (result) {
        System.out.println("Matching " + str + " against " + regex + " succeeded");
    } else {
        System.out.println("Matching " + str + " against " + regex + " failed");
    }
}
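The equivalence of [a-e] and [abcde] can be checked character by character, and a single class can hold several ranges. A sketch; RangeDemo and sameOnLowercase are illustrative names.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // True if both patterns agree on every single character from 'a' to 'z'
    static boolean sameOnLowercase(String regex1, String regex2) {
        for (char c = 'a'; c <= 'z'; c++) {
            String s = String.valueOf(c);
            if (Pattern.matches(regex1, s) != Pattern.matches(regex2, s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameOnLowercase("[a-e]", "[abcde]"));        // true
        System.out.println(Pattern.matches("[a-zA-Z0-9]+", "Tan2007")); // true: three ranges in one class
    }
}
```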
Example 13: Case-insensitive matching. Regular expressions are case-sensitive by default; compiling with Pattern.CASE_INSENSITIVE makes matching ignore case.
String patternStr = "ab";
Pattern pattern = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
String[] dataArr = { "ab", "Ab", "AB" };
for (String str : dataArr) {
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        System.out.println("Matching " + str + " against " + patternStr + " succeeded");
    }
}
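Example 13 uses find(), which succeeds if the pattern occurs anywhere in the input, whereas matches() requires the whole input to match. The same flag can also be written inline as (?i) at the start of the expression. Per the Pattern documentation, CASE_INSENSITIVE alone affects only US-ASCII characters; combining it with Pattern.UNICODE_CASE extends it to Unicode. A sketch; CaseInsensitiveDemo and its methods are illustrative names.

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // (?i) at the start of the regex turns on case-insensitive matching inline
    static boolean inlineFlag(String s) {
        return Pattern.matches("(?i)ab", s);
    }

    // find(): "ab" may occur anywhere in the input
    static boolean found(String s) {
        return Pattern.compile("ab", Pattern.CASE_INSENSITIVE).matcher(s).find();
    }

    // matches(): the whole input must be "ab" (in any case)
    static boolean matchedWhole(String s) {
        return Pattern.compile("ab", Pattern.CASE_INSENSITIVE).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inlineFlag("AB"));     // true
        System.out.println(found("xABy"));        // true
        System.out.println(matchedWhole("xABy")); // false
    }
}
```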
Example 14: Splitting a string with a regular expression.
Note that alternatives separated by | are tried from left to right, so the more complex alternatives should be written first; otherwise a simpler alternative may match first.
String input = "Position=GM Salary=50000, Name=ProfessionalManager; Gender=Male Age=45 ";
String patternStr = "(\\s*,\\s*)|(\\s*;\\s*)|(\\s+)";
Pattern pattern = Pattern.compile(patternStr);
String[] dataArr = pattern.split(input);
for (String str : dataArr) {
    System.out.println(str);
}
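Reversing the alternatives shows why the order matters: when \s+ comes first, it wins at the space before each comma or semicolon, and the separator after it becomes a second match, leaving empty strings in the result. A sketch; AlternationOrderDemo is an illustrative name.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    static String[] split(String regex, String input) {
        return Pattern.compile(regex).split(input);
    }

    public static void main(String[] args) {
        String input = "a , b ; c";
        // Complex alternatives first: " , " and " ; " are each consumed as one separator
        System.out.println(Arrays.toString(split("(\\s*,\\s*)|(\\s*;\\s*)|(\\s+)", input))); // [a, b, c]
        // \s+ first: the space and the following separator match separately
        System.out.println(Arrays.toString(split("(\\s+)|(\\s*,\\s*)|(\\s*;\\s*)", input))); // [a, , b, , c]
    }
}
```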
Example 15: Extracting text with a back-reference. In a regular expression, \1 refers to the text captured by the first pair of parentheses (group 1).
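// \1 is a back-reference: the closing tag must repeat the tag name captured by group 1
String regex = "<(\\w+)>(\\w+)</\\1>";
Pattern pattern = Pattern.compile(regex);
// the tag names are illustrative; the element values are Bill, 50000 and GM
String input = "<name>Bill</name><salary>50000</salary><title>GM</title>";
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group(2)); // prints Bill, 50000 and GM
}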
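The back-reference \1 is what rejects a closing tag that differs from the opening one. A sketch; BackReferenceDemo, isWellPaired and the tag names are illustrative.

```java
import java.util.regex.Pattern;

class BackReferenceDemo {
    // \1 must repeat exactly the text captured by group 1
    static final Pattern ELEMENT = Pattern.compile("<(\\w+)>(\\w+)</\\1>");

    static boolean isWellPaired(String s) {
        return ELEMENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellPaired("<name>Bill</name>"));  // true
        System.out.println(isWellPaired("<name>Bill</title>")); // false: the closing tag differs
    }
}
```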
Example 16: Converting to upper case each token that consists of letters followed by digits.
String regex = "([a-zA-Z]+[0-9]+)";
Pattern pattern = Pattern.compile(regex);
String input = "age45 salary500000 50000 title";
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    // Upper-case the whole letters-plus-digits token; the digits are unaffected
    String replacement = matcher.group(1).toUpperCase();
    matcher.appendReplacement(sb, replacement);
}
matcher.appendTail(sb);
System.out.println("The replaced string is " + sb.toString()); // AGE45 SALARY500000 50000 title