


Scraping ringtone data from the official Ringtone Duoduo website with Java multithreading
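I have long wanted to practice scraping data with Java multithreading.
One day I noticed that the official Ringtone Duoduo website (http://www.shoujiduoduo.com/main/) holds a large amount of data.
Looking at the AJAX request their front end uses to fetch ringtone data,
http://www.shoujiduoduo.com/ringweb/ringweb.php?type=getlist&listid={category ID}&page={page number}
it is easy to see that changing listid and page is enough to get ringtone data from the server as JSON.
Every response carries an indicator such as {"hasmore":1,"curpage":1}, and the value of hasmore decides whether to go on and fetch the next page.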
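The paging check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the repository: the class name `PagingSketch`, its method names, and the sample response string are hypothetical, and only the URL and the `hasmore`/`curpage` fields come from the article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original project.
final class PagingSketch {
    // List endpoint from the article, with listid and page as placeholders.
    static final String LIST_URL =
            "http://www.shoujiduoduo.com/ringweb/ringweb.php?type=getlist&listid=%d&page=%d";

    // Matches the paging indicator, e.g. "hasmore":1,"curpage":1
    private static final Pattern PAGE_INFO =
            Pattern.compile("\"hasmore\":(\\d+),\"curpage\":(\\d+)");

    /** Builds the list URL for one category and page. */
    static String listUrl(int listId, int page) {
        return String.format(LIST_URL, listId, page);
    }

    /** Returns true only when the indicator is present and hasmore equals 1. */
    static boolean hasMore(String json) {
        Matcher m = PAGE_INFO.matcher(json);
        return m.find() && Integer.parseInt(m.group(1)) == 1;
    }

    /** Returns the curpage value, or -1 when the indicator is missing. */
    static int curPage(String json) {
        Matcher m = PAGE_INFO.matcher(json);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        // Assumed response shape: an array whose first element is the indicator.
        String sample = "[{\"hasmore\":1,\"curpage\":1},{\"id\":\"123\"}]";
        System.out.println(listUrl(1, 1));
        System.out.println("more pages: " + hasMore(sample));
        System.out.println("current page: " + curPage(sample));
    }
}
```

A missing indicator is treated as "no more pages", so a malformed or empty response ends the crawl instead of looping forever.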
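However, the JSON returned by the link above does not include the ringtone's download address.
Clicking "Download" on the page quickly shows that
the request below returns the ringtone's download address:
http://www.shoujiduoduo.com/ringweb/ringweb.php?type=geturl&act=down&rid={ringtone ID}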
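As a rough sketch, the download-address request could be built and issued like this. The class name `DownUrlSketch`, the 5-second timeouts, and the UTF-8 charset are assumptions made for illustration; the article's own code simply treats the response body as the download address. Note that the ringtone ID is handled as a string here, so the placeholder is `%s`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original project.
final class DownUrlSketch {
    // Download endpoint from the article, with rid as a string placeholder.
    static final String DOWN_URL =
            "http://www.shoujiduoduo.com/ringweb/ringweb.php?type=geturl&act=down&rid=%s";

    /** Builds the request URL for one ringtone ID. */
    static String downUrl(String rid) {
        return String.format(DOWN_URL, rid);
    }

    /** Issues a GET request and returns the whole response body as one string. */
    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000); // assumed timeout, in milliseconds
        conn.setReadTimeout(5000);    // assumed timeout, in milliseconds
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only builds the URL; call fetch(downUrl(rid)) to actually hit the server.
        System.out.println(downUrl("12345"));
    }
}
```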
So their data is very easy to take. And so I got started...
The source code is on GitHub; have a look if you are interested.
GitHub: https://github.com/yongbo000/DuoduoAudioRobot
Here is the code:
<pre class="brush:java;">package me.yongbo.DuoduoRingRobot;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonParser;

/*
 * @author yongbo_
 * @created 2013/4/16
 *
 * Ring and DbHelper are other classes of the project and are not shown in this article.
 */
public class DuoduoRingRobotClient implements Runnable {
    public static String GET_RINGINFO_URL = "http://www.shoujiduoduo.com/ringweb/ringweb.php?type=getlist&listid=%1$d&page=%2$d";
    // rid is passed in as a String, so the placeholder has to be %s rather than %d
    public static String GET_DOWN_URL = "http://www.shoujiduoduo.com/ringweb/ringweb.php?type=geturl&act=down&rid=%1$s";
    public static String ERROR_MSG = "The robot for listId %1$d hit an error and stopped automatically. Current page: %2$d";
    public static String STATUS_MSG = "Fetching data, current listId: %1$d, current page: %2$d";
    public static String FILE_DIR = "E:/RingData/";
    public static String FILE_NAME = "listId=%1$d.txt";

    private boolean errorFlag = false;
    private int listId;
    private int page;
    private int endPage = -1;
    private int hasMore = 1;
    private DbHelper dbHelper;

    /**
     * Constructor
     * @param listId category (menu) ID
     * @param beginPage first page to fetch
     * @param endPage last page to fetch, or -1 for no limit
     */
    public DuoduoRingRobotClient(int listId, int beginPage, int endPage) {
        this.listId = listId;
        this.page = beginPage;
        this.endPage = endPage;
        this.dbHelper = new DbHelper();
    }

    /**
     * Constructor
     * @param listId category (menu) ID
     * @param page first page to fetch
     */
    public DuoduoRingRobotClient(int listId, int page) {
        this(listId, page, -1);
    }

    /**
     * Fetches one page of ringtones
     */
    public void getRings() {
        String url = String.format(GET_RINGINFO_URL, listId, page);
        String responseStr = httpGet(url);
        hasMore = getHasmore(responseStr);
        page = getNextPage(responseStr);
        // Strip the paging indicator so that only the array of ringtones is left
        ringParse(responseStr.replaceAll("\\{\"hasmore\":[0-9]*,\"curpage\":[0-9]*\\},", "").replaceAll(",]", "]"));
    }

    /**
     * Sends an HTTP request
     * @param webUrl URL to request
     */
    public String httpGet(String webUrl) {
        URL url;
        URLConnection conn;
        StringBuilder sb = new StringBuilder();
        String resultStr = "";
        try {
            url = new URL(webUrl);
            conn = url.openConnection();
            conn.connect();
            InputStream is = conn.getInputStream();
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader bufReader = new BufferedReader(isr);
            String lineText;
            while ((lineText = bufReader.readLine()) != null) {
                sb.append(lineText);
            }
            resultStr = sb.toString();
        } catch (Exception e) {
            errorFlag = true;
            // Write the error to the txt file
            writeToFile(String.format(ERROR_MSG, listId, page));
        }
        return resultStr;
    }

    /**
     * Converts the JSON string into Ring objects and stores them
     * @param json JSON string
     */
    public void ringParse(String json) {
        Ring ring = null;
        JsonElement element = new JsonParser().parse(json);
        JsonArray array = element.getAsJsonArray();
        // Iterate over the array
        Iterator<JsonElement> it = array.iterator();
        Gson gson = new Gson();
        while (it.hasNext() && !errorFlag) {
            JsonElement e = it.next();
            // Convert the JsonElement into a JavaBean object
            ring = gson.fromJson(e, Ring.class);
            ring.setDownUrl(getRingDownUrl(ring.getId()));
            if (isAvailableRing(ring)) {
                System.out.println(ring.toString());
                // Write either to the database or to a text file
                //writeToFile(ring.toString());
                writeToDatabase(ring);
            }
        }
    }

    /**
     * Appends a line to the txt file
     * @param data string to write
     */
    public void writeToFile(String data) {
        String path = FILE_DIR + String.format(FILE_NAME, listId);
        File dir = new File(FILE_DIR);
        File file = new File(path);
        FileWriter fw = null;
        if (!dir.exists()) {
            dir.mkdirs();
        }
        try {
            if (!file.exists()) {
                file.createNewFile();
            }
            fw = new FileWriter(file, true);
            fw.write(data);
            fw.write("\r\n");
            fw.flush();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (fw != null) {
                    fw.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    /**
     * Writes a ringtone to the database
     * @param ring a Ring instance
     */
    public void writeToDatabase(Ring ring) {
        dbHelper.execute("addRing", ring);
    }

    @Override
    public void run() {
        while (hasMore == 1 && !errorFlag) {
            if (endPage != -1) {
                if (page > endPage) {
                    break;
                }
            }
            System.out.println(String.format(STATUS_MSG, listId, page));
            getRings();
            System.out.println("Finished writing this page's data");
        }
        System.out.println("ending...");
    }

    private int getHasmore(String resultStr) {
        Pattern p = Pattern.compile("\"hasmore\":([0-9]*),\"curpage\":([0-9]*)");
        Matcher match = p.matcher(resultStr);
        if (match.find()) {
            return Integer.parseInt(match.group(1));
        }
        return 0;
    }

    private int getNextPage(String resultStr) {
        Pattern p = Pattern.compile("\"hasmore\":([0-9]*),\"curpage\":([0-9]*)");
        Matcher match = p.matcher(resultStr);
        if (match.find()) {
            return Integer.parseInt(match.group(2));
        }
        return 0;
    }

    /**
     * Checks whether a Ring is usable. A Ring is rejected when its duration is not
     * a whole number, when its name or artist is longer than 50 characters, or when
     * its download URL is empty.
     * @param ring the Ring instance to check
     */
    private boolean isAvailableRing(Ring ring) {
        Pattern p = Pattern.compile("^[1-9][0-9]*$");
        Matcher match = p.matcher(ring.getDuration());
        if (!match.find()) {
            return false;
        }
        if (ring.getName().length() > 50 || ring.getArtist().length() > 50 || ring.getDownUrl().length() == 0) {
            return false;
        }
        return true;
    }

    /**
     * Gets the download address of a ringtone
     * @param rid ringtone ID
     */
    public String getRingDownUrl(String rid) {
        String url = String.format(GET_DOWN_URL, rid);
        String responseStr = httpGet(url);
        return responseStr;
    }
}
</pre>
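The class above is a `Runnable`, but the listing does not show how the robots are started. One possible way to run one robot per category in parallel is a fixed-size thread pool, sketched below. The launcher class `RobotLauncherSketch`, the list IDs, the pool size, and the one-hour wait are all assumptions for illustration; the repository may start its threads differently. A stand-in task is used so the sketch runs without the network or a database.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

// Hypothetical launcher for illustration; not part of the original project.
final class RobotLauncherSketch {
    /**
     * Runs one task per list ID on a fixed-size pool and waits for completion.
     * Returns true when every task finished before the (assumed) one-hour limit.
     */
    static boolean runAll(int[] listIds, int threads, IntFunction<Runnable> taskFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int listId : listIds) {
            pool.execute(taskFactory.apply(listId));
        }
        pool.shutdown(); // accept no new tasks; already queued ones still run
        try {
            return pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        int[] listIds = {1, 2, 3}; // assumed category IDs
        // Stand-in task; with the article's class on the classpath this could be
        // id -> new DuoduoRingRobotClient(id, 1), where the start page is a guess.
        boolean finished = runAll(listIds, 2,
                id -> () -> System.out.println("robot for listId " + id + " finished"));
        System.out.println("all finished: " + finished);
    }
}
```

Since each `DuoduoRingRobotClient` keeps its own `listId`, `page`, and `DbHelper`, giving every category its own instance avoids sharing mutable state between threads.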