java - How to use jsoup to download images from a website that requires login
高洛峰 2017-04-17 17:41:15

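I have the image URL, for example: http://i2.pixiv.net/img-original/img/201...
I also have the cookies obtained after logging in: Connection cookies(Map<String, String> cookies);
Logging in, obtaining the cookies, and parsing the pages are all done with jsoup.
However, the image can only be fetched when logged in. With the following code: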

private void downloadImg(String imgURL) throws MalformedURLException, IOException {
        URL url = new URL(imgURL);
        URLConnection uc = url.openConnection(); // no cookies are attached to this request
        File file = new File("D:\\pixiv");
        // try-with-resources closes both streams even if the copy fails
        try (InputStream is = uc.getInputStream();
             FileOutputStream out = new FileOutputStream(file)) {
            int i;
            while ((i = is.read()) != -1) {
                out.write(i);
            }
        }
    }

I can only download images from pages that do not require login.
How can I use jsoup with cookies to download images from the site?


All replies (4)
左手右手慢动作

Solved. I used Firebug to capture the request the browser sends when viewing an image, then built the same request with URLConnection, including the cookies, following the captured format. One remaining issue: downloads are very slow for images larger than 1 MB.

/**
     * Downloads an image from a URL.
     *
     * @param img the image object
     * @param imgFile the file to write to
     * @throws MalformedURLException if the image URL is invalid
     * @throws IOException if the connection or the file write fails
     */
    public void downloadImg(Img img, File imgFile) throws MalformedURLException, IOException {
        URL url = new URL(img.getUrl());
        URLConnection uc = url.openConnection();
        uc.setConnectTimeout(Setting._Download_Img_TimeOut); // connect timeout for the image download
        uc.setRequestProperty("accept", "image/png,image/*;q=0.8,*/*;q=0.5");
        // accept-encoding is left unset: URLConnection does not decompress gzip/deflate bodies itself
        uc.setRequestProperty("accept-language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
        uc.setRequestProperty("connection", "keep-alive");
        // The cookie part. The value must have the form "name1=value1; name2=value2".
        uc.setRequestProperty("cookie", PixivLogin.userCookies.toString());
        uc.setRequestProperty("dnt", "1");
        uc.setRequestProperty("host", "i2.pixiv.net");
        uc.setRequestProperty("user-agent", Setting._DownLoadImgClient_UserAgent);

        // A GET download sends no request body, so setDoOutput(true) is not needed.
        try (InputStream is = uc.getInputStream();
             FileOutputStream out = new FileOutputStream(imgFile)) {
            System.out.println("Image request succeeded");
            System.out.println("Writing to disk");
            // Copy in 8 KB blocks; reading one byte at a time is very slow for large images.
            byte[] buffer = new byte[8192];
            int n;
            while ((n = is.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        System.out.println(img.getName() + " written, " + imgFile.length() + " bytes");
    }
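
A note on the cookie line above: the type of PixivLogin.userCookies is not shown. If it is the Map<String, String> from the question, toString() on a Map produces text like {a=1, b=2}, which is not the format of a Cookie request header. A minimal sketch of joining the map into that format follows; the class name CookieHeader and the cookie names and values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CookieHeader {

    /** Joins cookies as "name1=value1; name2=value2", the format of the Cookie request header. */
    static String format(Map<String, String> cookies) {
        StringJoiner joined = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joined.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Hypothetical cookie names and values, for illustration only.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("PHPSESSID", "abc123");
        cookies.put("device_token", "xyz");

        System.out.println(format(cookies)); // PHPSESSID=abc123; device_token=xyz
        System.out.println(cookies);         // {PHPSESSID=abc123, device_token=xyz}
    }
}
```

The result of format(...) can then be passed as the value of the "cookie" request property.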
左手右手慢动作

Whatever you use to parse the HTML, the only thing that marks a request as logged in is the cookie in the HTTP request. So send the login request first, read the cookies from the HTTP response, and set them on the next HTTP request. That is all a browser does when it keeps cookies, so resources that require login can be downloaded without one.
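
The flow described in this reply could be sketched with only JDK classes roughly as follows. This is an illustration under assumptions, not the poster's code: the class name CookieSession and its method names are invented here, and how the login request itself is built (URL, form fields) depends on the site and is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class CookieSession {

    /** Keeps only the name=value part of each Set-Cookie value, dropping attributes such as Path. */
    static Map<String, String> parseSetCookie(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (setCookieValues == null) {
            return cookies;
        }
        for (String value : setCookieValues) {
            String pair = value.split(";", 2)[0];
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    /** Collects the cookies set by a response, for example the response to the login request. */
    static Map<String, String> cookiesFrom(HttpURLConnection response) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> header : response.getHeaderFields().entrySet()) {
            // The status line is stored under a null key, so guard against it.
            if (header.getKey() != null && header.getKey().equalsIgnoreCase("Set-Cookie")) {
                cookies.putAll(parseSetCookie(header.getValue()));
            }
        }
        return cookies;
    }

    /** Sends the cookies with a GET request and streams the response body to target. */
    static void download(String url, Map<String, String> cookies, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        StringJoiner cookieHeader = new StringJoiner("; ");
        cookies.forEach((name, value) -> cookieHeader.add(name + "=" + value));
        conn.setRequestProperty("Cookie", cookieHeader.toString());
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            conn.disconnect();
        }
    }
}
```

Since the question already logs in with jsoup, the first half may be unnecessary there: jsoup's Connection.Response.cookies() returns the cookies as a Map<String, String>, which can be handed straight to download(...), or passed back to jsoup with Connection.cookies(Map) on the image request.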

Ty80

After extracting the img src, does a plain URLConnection download actually have permission to fetch the picture? Check whether the site uses something like a session_id. In short, find out what identifies the logged-in user and include that ID with the img src request.

刘奇

Use Apache HttpClient to simulate the login.
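
Apache HttpClient is a separate dependency and is not shown here. As a stand-in for the same idea (one client object that logs in and remembers cookies for later requests), here is a rough sketch with the JDK's own java.net.http.HttpClient, which requires Java 11 or later. The class name LoginDownloader, its method names, and the form-encoded login body are assumptions; a real site may expect different fields or extra tokens.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

final class LoginDownloader {

    /** Builds a client whose CookieManager stores cookies from responses and resends them. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Posts a form-encoded login body (hypothetical format) and returns the HTTP status code. */
    static int login(HttpClient client, String loginUrl, String formBody)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    /** Downloads imageUrl to target, reusing whatever cookies the client has collected. */
    static Path download(HttpClient client, String imageUrl, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(imageUrl)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofFile(target)).body();
    }
}
```

The same client instance has to be used for both login(...) and download(...), since the cookies live in that client's CookieManager.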
