How does jsoup save images from crawled websites locally?
This article shows how to use jsoup to crawl a website and save its images locally, and what to watch out for along the way. The following is a practical case.
My project needed vehicle brand and car series information, so I spent a day yesterday studying how to crawl website data with jsoup. The project is built with Maven, Spring, Spring MVC, and MyBatis. For reference, see the jsoup development guide.
This is the address of the website that needs to be crawled:
https://car.autohome.com.cn/zhaoche/pinpai/
1. First add dependencies in pom.xml
Because I need to save the images locally, I also added the commons-net package.
<!-- https://mvnrepository.com/artifact/org.jsoup/jsoup -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-net/commons-net -->
<dependency>
    <groupId>commons-net</groupId>
    <artifactId>commons-net</artifactId>
    <version>3.3</version>
</dependency>
2. Implementation of crawler code
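import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
@RequestMapping("/car/")
public class CarController {

    // Directory where the images are saved (it must already exist)
    private static final String saveImgPath = "C://imgs";

    /**
     * Crawls the brand names and logo images, saves each logo locally,
     * and prints the brands, joint-venture manufacturers, and car series.
     * Date: 2018-01-29 16:42:57
     *
     * @throws IOException if the page or an image cannot be read or written
     */
    @RequestMapping("add")
    public void insert() throws IOException {
        // Address of the page to crawl
        String url = "https://car.autohome.com.cn/zhaoche/pinpai/";
        // Fetch and parse the page
        Document doc = Jsoup.connect(url).get();
        // Select the content blocks by class name
        Elements elementsByClass = doc.getElementsByClass("uibox-con");
        // Iterate over the matched blocks
        for (Element element : elementsByClass) {
            // Number of child nodes in this block
            int childNodeSize_1 = element.childNodeSize();
            // Loop over the children and read their content
            for (int i = 0; i < childNodeSize_1; i++) {
                // Address of the brand logo image
                String tupian = element.child(i).child(0).child(0).child(0).child(0).attr("src");
                // Brand name
                String pinpai = element.child(i).child(0).child(1).text();
                // Print the values to check that they are correct
                System.out.println("Logo image address-----------" + tupian);
                System.out.println("Brand-----------" + pinpai);
                System.out.println();

                // Save the logo image locally.
                // The src attribute is protocol-relative, so prepend the scheme.
                String tupian_1 = "http:" + tupian;
                // Open a connection to the image URL
                URL url1 = new URL(tupian_1);
                URLConnection uri = url1.openConnection();
                // File name: everything after the last "/" in the address
                String imageName = tupian.substring(tupian.lastIndexOf("/") + 1, tupian.length());
                // Copy the response stream to the file; try-with-resources
                // closes both streams, which the original version never did
                try (InputStream is = uri.getInputStream();
                     OutputStream os = new FileOutputStream(new File(saveImgPath, imageName))) {
                    byte[] buf = new byte[1024];
                    int p;
                    while ((p = is.read(buf)) != -1) {
                        os.write(buf, 0, p);
                    }
                }

                /*
                 * Each brand can have several joint-venture manufacturers,
                 * for example FAW-Volkswagen, Shanghai Volkswagen, and imported
                 * Volkswagen, so we loop to read each manufacturer's name and
                 * the car series under it.
                 */
                // Number of car series
                int childNodeSize_2 = element.child(i).child(1).child(0).childNodeSize();
                // Number of child nodes; if it is 1, there are no other
                // joint-venture manufacturers
                int childNodeSize_3 = element.child(i).child(1).childNodeSize();
                if (childNodeSize_3 == 1) {
                    // Loop over the car series
                    for (int j = 0; j < childNodeSize_2; j++) {
                        String chexi = element.child(i).child(1).child(0).child(j).child(0).child(0).text();
                        System.out.println("Car series-----------" + chexi);
                    }
                } else {
                    // More than one child: several joint-venture manufacturers.
                    // Read the car series of each manufacturer in turn.
                    for (int j = 0; j < childNodeSize_3; j++) {
                        int childNodeSize_4 = element.child(i).child(1).child(j).childNodeSize();
                        // An even j is a manufacturer name; an odd j is the
                        // list of car series that belongs to it
                        int k = j % 2;
                        if (k == 0) {
                            // Joint-venture manufacturer name
                            String hezipinpai = element.child(i).child(1).child(j).child(0).text();
                            System.out.println("Joint-venture manufacturer-----------" + hezipinpai);
                        } else {
                            // Loop over this manufacturer's car series
                            for (int l = 0; l < childNodeSize_4; l++) {
                                String chexi = element.child(i).child(1).child(j).child(l).child(0).child(0).text();
                                System.out.println("Car series-----------" + chexi);
                            }
                        }
                    }
                }
                System.out.println("************************");
                System.out.println("************************");
            }
        }
    }
}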
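The image-saving step in the controller above can also be pulled out into small helpers that use only JDK classes. The following is a minimal sketch, not the original author's code: the class name ImageSaver and the methods toAbsoluteUrl, fileNameOf, and save are hypothetical names introduced for illustration, and the example.com address is a placeholder. It assumes the src attribute is either protocol-relative (starting with "//") or already absolute.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; not part of the original controller.
final class ImageSaver {

    // Turns a protocol-relative src ("//host/a.png") into an absolute URL
    // by prepending the given scheme. Other values are returned unchanged.
    static String toAbsoluteUrl(String src, String scheme) {
        if (src.startsWith("//")) {
            return scheme + ":" + src;
        }
        return src;
    }

    // Uses the last path segment as the file name and drops any query string.
    static String fileNameOf(String src) {
        int q = src.indexOf('?');
        String path = q >= 0 ? src.substring(0, q) : src;
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Copies the stream to dir/fileName, creating the directory if needed.
    // The input stream is closed when the copy finishes or fails.
    static Path save(InputStream in, Path dir, String fileName) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(fileName);
        try (InputStream source = in) {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    // Demonstration with in-memory bytes, so no network access is needed.
    public static void main(String[] args) throws IOException {
        String src = "//example.com/logos/demo.png?v=2"; // placeholder address
        Path dir = Files.createTempDirectory("imgs");
        byte[] fakeImage = "not a real image".getBytes(StandardCharsets.UTF_8);
        Path saved = save(new ByteArrayInputStream(fakeImage), dir, fileNameOf(src));
        System.out.println(toAbsoluteUrl(src, "https"));
        System.out.println(saved.getFileName() + " (" + Files.size(saved) + " bytes)");
    }
}
```

In the crawler, the stream passed to save would come from something like new URL(ImageSaver.toAbsoluteUrl(tupian, "http")).openStream(). Creating the target directory up front avoids the failure that FileOutputStream raises when C://imgs does not exist yet.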