


Write Java with zero foundation and practice Zhihu crawler first with Baidu homepage (2)
Ah, wrong, small example.
// 定义一个样式模板,此中使用正则表达式,括号中是要抓的内容 // 相当于埋好了陷阱匹配的地方就会掉下去 Pattern pattern = Pattern.compile("href=\"(.+?)\""); // 定义一个matcher用来做匹配 Matcher matcher = pattern.matcher("<a href=\"index.html\">我的主页</a>"); // 如果找到了 if (matcher.find()) { // 打印出结果 System.out.println(matcher.group(1)); }
Run result:
index.html
Yes, this is our first regular code.
The link for grabbing pictures from such an application must be at your fingertips.
We encapsulate the regular matching into a function, and then modify the code as follows:
import java.io.*; import java.net.*; import java.util.regex.*; public class Main { static String SendGet(String url) { // 定义一个字符串用来存储网页内容 String result = ""; // 定义一个缓冲字符输入流 BufferedReader in = null; try { // 将string转成url对象 URL realUrl = new URL(url); // 初始化一个链接到那个url的连接 URLConnection connection = realUrl.openConnection(); // 开始实际的连接 connection.connect(); // 初始化 BufferedReader输入流来读取URL的响应 in = new BufferedReader(new InputStreamReader( connection.getInputStream())); // 用来临时存储抓取到的每一行的数据 String line; while ((line = in.readLine()) != null) { // 遍历抓取到的每一行并将其存储到result里面 result += line; } } catch (Exception e) { System.out.println("发送GET请求出现异常!" + e); e.printStackTrace(); } // 使用finally来关闭输入流 finally { try { if (in != null) { in.close(); } } catch (Exception e2) { e2.printStackTrace(); } } return result; } static String RegexString(String targetStr, String patternStr) { // 定义一个样式模板,此中使用正则表达式,括号中是要抓的内容 // 相当于埋好了陷阱匹配的地方就会掉下去 Pattern pattern = Pattern.compile(patternStr); // 定义一个matcher用来做匹配 Matcher matcher = pattern.matcher(targetStr); // 如果找到了 if (matcher.find()) { // 打印出结果 return matcher.group(1); } return ""; } public static void main(String[] args) { // 定义即将访问的链接 String url = "http://www.baidu.com"; // 访问链接并获取页面内容 String result = SendGet(url); // 使用正则匹配图片的src内容 String imgSrc = RegexString(result, "即将的正则语法"); // 打印结果 System.out.println(imgSrc); } }
Okay, now everything is ready, only a regular grammar is left!
So what regular statement is more appropriate?
We found that as long as we grab the string src="xxxxxx", we can grab the entire src link,
So a simple regular statement: src="(.+?)"
The above is zero To learn the basics of writing Java crawlers, first practice with the Baidu homepage (2). For more related content, please pay attention to the PHP Chinese website (www.php.cn)!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

Guide to Square Root in Java. Here we discuss how Square Root works in Java with example and its code implementation respectively.

Guide to Perfect Number in Java. Here we discuss the Definition, How to check Perfect number in Java?, examples with code implementation.

Guide to Random Number Generator in Java. Here we discuss Functions in Java with examples and two different Generators with ther examples.

Guide to Weka in Java. Here we discuss the Introduction, how to use weka java, the type of platform, and advantages with examples.

Guide to the Armstrong Number in Java. Here we discuss an introduction to Armstrong's number in java along with some of the code.

Guide to Smith Number in Java. Here we discuss the Definition, How to check smith number in Java? example with code implementation.

In this article, we have kept the most asked Java Spring Interview Questions with their detailed answers. So that you can crack the interview.

Java 8 introduces the Stream API, providing a powerful and expressive way to process data collections. However, a common question when using Stream is: How to break or return from a forEach operation? Traditional loops allow for early interruption or return, but Stream's forEach method does not directly support this method. This article will explain the reasons and explore alternative methods for implementing premature termination in Stream processing systems. Further reading: Java Stream API improvements Understand Stream forEach The forEach method is a terminal operation that performs one operation on each element in the Stream. Its design intention is
