
Reading files and extracting phone numbers in Java (based on regular expressions)

黄舟
Release: 2017-09-06 14:16:57

This article shows how to read a file in Java and extract the phone numbers it contains using regular expressions. It reviews the relevant regular-expression syntax, explains how mobile and landline numbers are matched, and then gives a complete example. The details are as follows:

1. Regular expression

A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept from computer science: a single string that describes and matches a set of strings following a given syntax rule. Many text editors use regular expressions to search for and replace text that matches a pattern.

The special constructs used in this article are explained below:

(?<!pattern)

Negative lookbehind: similar to negative lookahead, but looking in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" can match "Windows" in "3.1Windows", but cannot match "Windows" in "2000Windows".

Use of quantifiers

?

When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching becomes non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, for the string "oooo", "o+?" matches a single "o", while "o+" matches all four "o"s.
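The difference between the two modes can be checked directly in Java. The following is a minimal sketch; the class name QuantifierDemo and the helper firstMatch are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: takes as many characters as possible.
        System.out.println(firstMatch("o+", "oooo"));   // oooo
        // Non-greedy (reluctant): takes as few characters as possible.
        System.out.println(firstMatch("o+?", "oooo"));  // o
    }
}
```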

. (dot)

Matches any single character except a line terminator such as "\r" or "\n". To match any character including line terminators, use a pattern like "[\s\S]".
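A short sketch of this behavior in Java follows. Besides the "[\s\S]" workaround, java.util.regex also offers the Pattern.DOTALL flag, which makes the dot match line terminators. The class name DotDemo and its helper are hypothetical.

```java
import java.util.regex.Pattern;

class DotDemo {
    // True if the whole input matches regex when compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println(matches("a.b", 0, input));              // false: dot skips "\n"
        System.out.println(matches("a[\\s\\S]b", 0, input));       // true
        System.out.println(matches("a.b", Pattern.DOTALL, input)); // true
    }
}
```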

(pattern)

Matches pattern and captures the match. The captured match can be retrieved afterwards: from the SubMatches collection in VBScript, the $0...$9 properties in JScript, or Matcher.group(n) in Java. To match literal parenthesis characters, use "\(" or "\)".

(?:pattern)

Matches pattern but does not capture the result; the group is non-capturing, so the match is not stored for later use. This is useful when combining parts of a pattern with the "or" character "|". For example, "industr(?:y|ies)" is more compact than "industry|industries".
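The difference between a capturing and a non-capturing group shows up in Matcher.groupCount() and Matcher.group(int). The sketch below uses the "industr" example; the class GroupDemo and its helpers are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Number of capturing groups declared in the pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text captured by group 1 of the first match, or null if there is no such group or match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return (m.find() && m.groupCount() >= 1) ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("industr(y|ies)"));               // 1
        System.out.println(groupCount("industr(?:y|ies)"));             // 0
        System.out.println(firstGroup("industr(y|ies)", "industries")); // ies
    }
}
```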

(?=pattern)

Positive lookahead: matches at any position where the text that follows matches pattern. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" can match "Windows" in "Windows2000", but cannot match "Windows" in "Windows3.1". Lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters examined by the lookahead.

(?!pattern)

Negative lookahead: matches at any position where the text that follows does not match pattern. This is a non-capturing match, that is, the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" can match "Windows" in "Windows3.1", but cannot match "Windows" in "Windows2000".

(?<=pattern)

Positive lookbehind: similar to positive lookahead, but looking in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" can match "Windows" in "2000Windows", but cannot match "Windows" in "3.1Windows".
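The lookahead and lookbehind constructs described above can be tried out with the "Windows" examples. This is a minimal sketch; LookaroundDemo and its helpers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // True if regex matches somewhere inside input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // The first matched substring, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(found("Windows(?=95|98|NT|2000)", "Windows2000"));  // true
        System.out.println(found("Windows(?=95|98|NT|2000)", "Windows3.1"));   // false
        System.out.println(found("Windows(?!95|98|NT|2000)", "Windows3.1"));   // true
        System.out.println(found("(?<=95|98|NT|2000)Windows", "2000Windows")); // true
        System.out.println(found("(?<!95|98|NT|2000)Windows", "2000Windows")); // false
        // The lookahead text is checked but is not part of the match.
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows2000")); // Windows
    }
}
```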


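2. Mobile phone numbers

Composition

Country code - mobile number

Mobile numbers have a fairly fixed format: 13x xxxx xxxx, 15x xxxx xxxx, or 18x xxxx xxxx. Landlines are more troublesome: the long-distance area code varies in length (3 or 4 digits), the phone number varies in length (7 or 8 digits), and some numbers also require an extension.

The usual solution to this problem is to handle mobile and landline numbers separately, splitting a landline number into three parts: area code, phone number, and extension. To keep the form clean, however, the design here provides a single "universal" input box in which the user can enter either a landline or a mobile number.

Given that requirement, validating the input with a more complex regular expression is a quick solution.

Start with the easiest case, mobile numbers.

The number ranges currently open are 130-139, 150-159, 185-189, and 180.

If only mobile numbers need to be considered, the following method can be used: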


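import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is chosen for illustration; the original shows only the main method.
public class MobileNumberDemo {
    public static void main(String[] args) {
        String text = "13522158842;托尔斯泰;test2;13000002222;8613111113313";
        // The lookarounds keep a match from starting or ending inside a longer run of digits.
        Pattern pattern = Pattern.compile("(?<!\\d)(?:(?:1[358]\\d{9})|(?:861[358]\\d{9}))(?!\\d)");
        Matcher matcher = pattern.matcher(text);
        StringBuilder sb = new StringBuilder(64);
        while (matcher.find()) {
            sb.append(matcher.group()).append(",");
        }
        int len = sb.length();
        if (len > 0) {
            sb.deleteCharAt(len - 1); // drop the trailing comma
        }
        System.out.println(sb.toString());
    }
}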

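If only mobile numbers need to be matched, the following regular expression can be used:

(?:((13[0-9]{1})|(15[0-9]{1})|(18[05-9]{1}))+\\d{8})

When the country code is added, written as (86), (+86), 86-, or simply 86, the following regular expression can be used:

"(?:(\\(\\+?86\\))((13[0-9]{1})|(15[0-9]{1})|(18[05-9]{1}))+\\d{8})|" +
"(?:86-?((13[0-9]{1})|(15[0-9]{1})|(18[05-9]{1}))+\\d{8})|" +
"(?:((13[0-9]{1})|(15[0-9]{1})|(18[05-9]{1}))+\\d{8})"

Note: to get the longest possible match, the expression is written as three alternatives with the longer forms first. Java tries alternatives from left to right and stops at the first one that succeeds, so when two alternatives can match at the same position, a shorter one listed earlier wins and the longer ones after it are never tried there.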
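The effect of alternative order can be seen with a deliberately simplified pair of alternatives. These are not the article's expressions: both alternatives here can start at the same position, which is the case where order matters. The class name AlternationOrderDemo and its helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "8613111113313";
        // Short alternative first: it succeeds at position 0 and cuts the number off.
        System.out.println(firstMatch("\\d{11}|86\\d{11}", input)); // 86131111133
        // Long alternative first: the full number with country code is matched.
        System.out.println(firstMatch("86\\d{11}|\\d{11}", input)); // 8613111113313
    }
}
```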

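3. Landline numbers

Composition:

Country code (+86, etc.) - area code - landline number - extension

Three-digit area codes

010, 021-029, 852 (Hong Kong)

Because places with a three-digit area code all use 8-digit phone numbers, this can be written as

(010|021|022|023|024|025|026|027|028|029|852)\d{8}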
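As a quick check, the three-digit area code expression can be compiled in Java (with the backslash doubled inside the string literal) and applied to whole strings. The class AreaCodeDemo and its method name are assumptions made for this sketch, and the sample numbers are invented.

```java
import java.util.regex.Pattern;

class AreaCodeDemo {
    // Three-digit area code followed by exactly eight digits.
    private static final Pattern THREE_DIGIT_AREA =
            Pattern.compile("(010|021|022|023|024|025|026|027|028|029|852)\\d{8}");

    // True only if the entire string is an area code plus an 8-digit number.
    static boolean isThreeDigitAreaNumber(String s) {
        return THREE_DIGIT_AREA.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isThreeDigitAreaNumber("01062345678")); // true
        System.out.println(isThreeDigitAreaNumber("0106234567"));  // false: only 7 digits
        System.out.println(isThreeDigitAreaNumber("03162345678")); // false: 031 is not listed
    }
}
```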

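Of course it is not that simple. Some people are used to writing (010) xxxxxxxx, and that format should be supported as well, so the expression above has to be extended to accept it.

Next, cities with four-digit area codes.

Here the check simply assumes that area codes such as 0111 or 0222 cannot exist, and that the phone number has 7 or 8 digits.

Finally, the extension (a number of 1 to 4 digits):

(?<extension>\D?\d{1,4})?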
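A named group like this can be read back in Java with Matcher.group(String). Java group names must consist of ASCII letters and digits, so this sketch uses the name extension. The surrounding pattern (a bare 7- or 8-digit number), the class ExtensionDemo, and the sample inputs are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtensionDemo {
    // Hypothetical pattern: a 7- or 8-digit number followed by an optional extension.
    private static final Pattern NUMBER_WITH_EXT =
            Pattern.compile("(?<number>[2-9]\\d{6,7})(?<extension>\\D?\\d{1,4})?");

    // Returns the extension including its separator, or null if absent or if input does not match.
    static String extensionOf(String input) {
        Matcher m = NUMBER_WITH_EXT.matcher(input);
        return m.matches() ? m.group("extension") : null;
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("62345678-123")); // -123
        System.out.println(extensionOf("62345678"));     // null
    }
}
```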

Putting the pieces together gives:

"(?:(\\(\\+?86\\))(0[0-9]{2,3}\\-?)?([2-9][0-9]{6,7})+(\\-[0-9]{1,4})?)|" +
"(?:(86-?)?(0[0-9]{2,3}\\-?)?([2-9][0-9]{6,7})+(\\-[0-9]{1,4})?)"

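4. Implementation

Goal: read a file and return the phone numbers it contains in a Set.

Methods used:

find(): attempts to find the next subsequence of the input sequence that matches the pattern.
group(): returns the input subsequence matched by the previous match operation.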
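The find()/group() loop can be sketched on its own before the full class. The example below reuses the assembled landline expression from section 3; the class FindGroupDemo, the method findAll, and the sample text are invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindGroupDemo {
    // Landline expression as assembled in section 3 of the article.
    private static final Pattern LANDLINE = Pattern.compile(
            "(?:(\\(\\+?86\\))(0[0-9]{2,3}\\-?)?([2-9][0-9]{6,7})+(\\-[0-9]{1,4})?)|"
          + "(?:(86-?)?(0[0-9]{2,3}\\-?)?([2-9][0-9]{6,7})+(\\-[0-9]{1,4})?)");

    // Collects every match in order of appearance.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = LANDLINE.matcher(text);
        while (m.find()) {            // advance to the next match
            matches.add(m.group());   // text of the match just found
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("office 010-62345678-123, home (+86)0311-6234567"));
    }
}
```

Each call to find() resumes where the previous match ended, so the loop visits the matches from left to right without overlap.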

① Extract the phone numbers from a string


import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Extracts phone numbers from a string.
 *
 * @author zcr
 */
public class CheckIfIsPhoneNumber
{
    // Compiled once and reused for every input string.
    private static final Pattern PHONE_PATTERN = Pattern.compile(isPhoneRegexp());

    /**
     * Builds the regular expression for phone numbers, covering landline and mobile numbers.
     * Accepted forms:
     * 1. Mobile numbers
     *    86 + '-' + 11-digit number
     *    86 + 11-digit number
     *    11-digit number
     *    (+86) + 11-digit number
     *    (86) + 11-digit number
     * 2. Landline numbers
     *    area code + '-' + landline number + '-' + extension
     *    area code + '-' + landline number
     *    area code + landline number
     *
     * @return the regular expression for phone numbers
     */
    public static String isPhoneRegexp()
    {
        // Gives the longest match, but cannot handle a space between the country code and the number.
        String mobilePhoneRegexp = "(?:(\\(\\+?86\\))((13[0-9]{1})|(15[0-9]{1})|(18[05-9]{1}))+\\d{8})|" +
                "(?:86-?((13[0-9]{1})|(15[0-9]{1})|(18[05-9]{1}))+\\d{8})|" +
                "(?:((13[0-9]{1})|(15[0-9]{1})|(18[05-9]{1}))+\\d{8})";
        // Regular expression for landline numbers.
        String landlinePhoneRegexp = "(?:(\\(\\+?86\\))(0[0-9]{2,3}\\-?)?([2-9][0-9]{6,7})+(\\-[0-9]{1,4})?)|" +
                "(?:(86-?)?(0[0-9]{2,3}\\-?)?([2-9][0-9]{6,7})+(\\-[0-9]{1,4})?)";
        return "(?:" + mobilePhoneRegexp + "|" + landlinePhoneRegexp + ")";
    }

    /**
     * Finds all phone numbers (landline and mobile) in dataStr and adds them to the Set.
     *
     * @param dataStr  the string to search
     * @param phoneSet receives the phone numbers found in dataStr
     */
    public static void getPhoneNumFromStrIntoSet(String dataStr, Set<String> phoneSet)
    {
        Matcher matcher = PHONE_PATTERN.matcher(dataStr);
        // Find the next subsequence of the input that matches the pattern.
        while (matcher.find())
        {
            // Add the text that was just matched to the set.
            phoneSet.add(matcher.group());
        }
    }
}

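② Read the file and extract the phone numbers

Approach: open the file at the given path, read it line by line, and extract the phone numbers from each line.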


import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/**
 * File reading operations.
 *
 * @author zcr
 */
public class ImportFile
{
    /**
     * Reads a file and collects the phone numbers it contains into a Set.
     *
     * @param filePath absolute path of the file
     * @return the phone numbers contained in the file
     */
    public static Set<String> getPhoneNumFromFile(String filePath)
    {
        Set<String> phoneSet = new HashSet<String>();
        File file = new File(filePath);
        if (!file.isFile())
        { // the path does not exist or is not a regular file
            System.out.println("The specified file was not found");
            return phoneSet;
        }
        // Decode as UTF-8; try-with-resources closes the reader even if reading fails.
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)))
        {
            String lineTxt;
            while ((lineTxt = bufferedReader.readLine()) != null)
            {
                // Add the phone numbers found on this line to phoneSet.
                CheckIfIsPhoneNumber.getPhoneNumFromStrIntoSet(lineTxt, phoneSet);
            }
        }
        catch (IOException e)
        {
            System.out.println("Error while reading the file contents");
            e.printStackTrace();
        }
        return phoneSet;
    }
}
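As an alternative sketch, the same read-and-collect step can be written with java.nio.file.Files. Everything specific here is an assumption for illustration rather than part of the original code: the class name PhoneFileScanner, its two methods, and the simplified mobile-only pattern, which stands in for the full expression to keep the example short.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFileScanner {
    // Simplified mobile-only pattern, used here only for illustration.
    private static final Pattern MOBILE =
            Pattern.compile("(?<!\\d)(?:86)?1[358]\\d{9}(?!\\d)");

    // Collects every match from the given lines, keeping first-seen order.
    static Set<String> extract(List<String> lines) {
        Set<String> found = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = MOBILE.matcher(line);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    // Reads the whole file as UTF-8 and delegates to extract().
    static Set<String> extractFromFile(Path file) throws IOException {
        return extract(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the example runs without external data.
        Path tmp = Files.createTempFile("phones", ".txt");
        Files.write(tmp, Arrays.asList("call 13522158842 or 8613111113313", "no number here"));
        System.out.println(extractFromFile(tmp));
        Files.delete(tmp);
    }
}
```

Files.readAllLines loads the whole file into memory, which is fine for small inputs; for large files the line-by-line BufferedReader approach is the safer choice.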

③ Test


// Add this method to ImportFile to run the example.
public static void main(String[] args)
{
    String filePath = "F:\\three.txt";
    Set<String> phoneSet = getPhoneNumFromFile(filePath);
    System.out.println("Phone numbers: " + phoneSet);
}

Data in the file: [not preserved]

Result:

Phone numbers: [86132221, (86)13222144332, 86-13222144332, 32434343, (+86)13222144332, 13888888888]

X{n}?    X, exactly n times
X{n,}?   X, at least n times
X{n,m}?  X, at least n times, but no more than m times

source:php.cn