
How to use Java to crawl email addresses from the Internet

黄舟
Release: 2017-10-10 10:18:38
Original

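This article shows how to crawl email addresses from the Internet with Java. It consists mainly of a code example, which I found quite useful and am sharing here for anyone who needs it.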

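Web crawler: simply a program that retrieves data matching specified rules from the Internet. In the example below, the page is a single URL and the rule is a regular expression that recognizes email addresses.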
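The definition has two parts: the text a crawler obtains and the rule it applies to that text. As a minimal sketch, assuming the rule is a java.util.regex Pattern and the text is already in memory, the matching step can be isolated from the download like this (the class name RuleExtractor and the sample HTML are hypothetical; the email rule is the same one the article uses):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies one "rule" (a regular expression) to a piece of text.
final class RuleExtractor {
  // The email rule from the article's SpiderDemo.
  static final Pattern EMAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

  // Returns every substring of text that matches the rule, in order of appearance.
  static List<String> extract(String text, Pattern rule) {
    List<String> results = new ArrayList<>();
    Matcher m = rule.matcher(text);
    while (m.find()) {
      results.add(m.group());
    }
    return results;
  }

  public static void main(String[] args) {
    // Hypothetical page fragment; a real crawler would download this text first.
    String html = "<p>Sales: sales@example.com</p><p>Support: help@mail.example.org</p>";
    System.out.println(extract(html, EMAIL));
  }
}
```

Swapping in a different Pattern changes what the crawler collects without touching the matching loop.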


```java
package day05;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpiderDemo {
  public static void main(String[] args) throws IOException {
    List<String> list = getMailByWeb();
    for (String mail : list) {
      System.out.println(mail);
    }
  }

  // Downloads one web page and collects every substring that looks like an email address.
  public static List<String> getMailByWeb() throws IOException {
    URL url = new URL("http://www.itheima.com/aboutt/1376.html");
    // Rule: word characters, '@', a domain label, then one or more ".label" parts.
    String regex = "\\w+@\\w+(\\.\\w+)+";
    Pattern p = Pattern.compile(regex);
    List<String> list = new ArrayList<>();
    // try-with-resources closes the network stream even if reading fails.
    try (BufferedReader input = new BufferedReader(new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = input.readLine()) != null) {
        Matcher m = p.matcher(line);
        // One line can contain several addresses, so keep calling find().
        while (m.find()) {
          list.add(m.group());
        }
      }
    }
    return list;
  }
}
```
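The rule `\w+@\w+(\.\w+)+` is deliberately simple. `\w` covers only letters, digits and underscore, so an address such as first.last@example.com is matched only as last@example.com, an address with a hyphen in the domain (help@my-site.org) is missed entirely, and an address that appears twice on a page is reported twice. A sketch of one possible refinement, assuming a broader character class is acceptable (the BROADER pattern and the class name EmailExtractor are hypothetical illustrations, not a complete RFC 5322 validator), with duplicates dropped through a LinkedHashSet:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the matching step, separated from the download so it can be
// tried on an in-memory string.
final class EmailExtractor {
  // Original rule from SpiderDemo.
  static final Pattern SIMPLE = Pattern.compile("\\w+@\\w+(\\.\\w+)+");
  // Assumed broader rule: allows '.', '+', '-' in the local part and '-' in domain labels.
  static final Pattern BROADER = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

  // Collects distinct matches, keeping the order in which they first appear.
  static List<String> distinctMatches(String text, Pattern rule) {
    Set<String> seen = new LinkedHashSet<>();
    Matcher m = rule.matcher(text);
    while (m.find()) {
      seen.add(m.group());
    }
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    String text = "Write to first.last@example.com or help@my-site.org; "
        + "again: first.last@example.com";
    System.out.println(distinctMatches(text, SIMPLE));
    System.out.println(distinctMatches(text, BROADER));
  }
}
```

Running main prints [last@example.com] for the original rule and [first.last@example.com, help@my-site.org] for the broader one. Inside getMailByWeb, the same effect could be had by replacing the ArrayList with a LinkedHashSet.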

Summary

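Parsing HTML with Jsoup is what people commonly call crawler technology. (Personally, I think a likely drawback is that only a small part of the returned data is actually needed, which means redundant data and added network latency.) Note that the example above does not use Jsoup: it reads the raw page text line by line and applies a regular expression to each line.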

Source: php.cn