java - How to use jsoup to download images from a website that requires login

Question

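I have the image URL, for example: http://i2.pixiv.net/img-original/img/201...
I also have the cookies obtained after logging in: Connection cookies(Map<String, String> cookies);
Logging in, getting the cookies, and parsing the page are all done with jsoup.
However, the image can only be fetched after logging in. I am using the following code: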

ringa_lee · Answer

The problem has been solved. I used Firebug to capture the request the browser sends when viewing an image, then built the same request with URLConnection, including the cookies. One remaining issue: downloads are very slow for images larger than 1 MB.

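import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Map;

    /**
     * Download an image from a URL.
     *
     * @param img image object
     * @param imgFile file object to write to
     * @throws MalformedURLException if the image URL is malformed
     * @throws IOException if opening the connection or copying the data fails
     */
    public void downloadImg(Img img, File imgFile) throws MalformedURLException, IOException {
        URL url = new URL(img.getUrl());
        URLConnection uc = url.openConnection();
        uc.setConnectTimeout(Setting._Download_Img_TimeOut); // connection timeout for the image download
        uc.setRequestProperty("accept", "image/png,image/*;q=0.8,*/*;q=0.5");
        // "accept-encoding: gzip, deflate" is left out: URLConnection does not decompress the body itself.
        uc.setRequestProperty("accept-language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
        // The Cookie header must look like "name=value; name2=value2"; Map.toString() yields "{name=value, ...}".
        // Assumes PixivLogin.userCookies is the Map<String, String> returned by jsoup.
        StringBuilder cookieHeader = new StringBuilder();
        for (Map.Entry<String, String> e : PixivLogin.userCookies.entrySet()) {
            if (cookieHeader.length() > 0) {
                cookieHeader.append("; ");
            }
            cookieHeader.append(e.getKey()).append('=').append(e.getValue());
        }
        uc.setRequestProperty("cookie", cookieHeader.toString()); // this is the cookie part
        uc.setRequestProperty("dnt", "1");
        uc.setRequestProperty("user-agent", Setting._DownLoadImgClient_UserAgent);
        // "host" and "connection" are filled in by the HTTP implementation; doOutput is not needed for a GET.

        System.out.println("Start writing to disk");
        // Copy through a buffer; reading one byte at a time is a likely cause of slow downloads on large images.
        try (InputStream is = uc.getInputStream();
             FileOutputStream out = new FileOutputStream(imgFile)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = is.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        System.out.println(img.getName() + " written, " + imgFile.length() + " bytes");
    }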

ringa_lee · Answer

No matter what you use to parse the HTML, the only thing that identifies you as logged in is the cookie in the HTTP request. So send the login request first, read the cookie from the HTTP response, and set that cookie on the next HTTP request. That is all it takes: resources that require login can be downloaded without a browser holding the cookies for you.
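The flow described here comes down to two small steps on the cookie data: collect the name=value pairs from the login response's Set-Cookie headers, then join them into a single Cookie header for the next request. Below is a minimal sketch using only the JDK. The class name CookieHeaders and both method names are made up for illustration, and cookie attributes (Path, Expires, Domain) are deliberately ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of jsoup or the JDK.
public class CookieHeaders {

    /** Extracts name=value pairs from raw Set-Cookie header values, dropping attributes such as Path. */
    public static Map<String, String> fromSetCookie(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieValues) {
            // Only the part before the first ';' is the cookie itself.
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    /** Joins cookies into the "name=value; name2=value2" form used by the Cookie request header. */
    public static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }
}
```

With a URLConnection, the login response's values can typically be read from getHeaderFields().get("Set-Cookie"), and the joined string goes into setRequestProperty("Cookie", ...) on the image request. A Map<String, String> returned by jsoup can be passed to toCookieHeader directly.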

PHP中文网 · Answer

After extracting the img src, does a plain URLConnection download actually have permission to fetch it? Check whether the site uses something like a session_id. In short, find the ID that marks the logged-in user and attach it to the img src request.

怪我咯 · Answer

Use Apache HttpClient to simulate the login.
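The point of an HTTP client library here is that it keeps a cookie store across requests, so the login and the download share one session. As a rough sketch of that idea without an external dependency, the example below swaps in the JDK's own java.net.http.HttpClient (Java 11+) with a CookieManager instead of Apache HttpClient. The class name SessionDownloader, the method names, and the shape of the login form are assumptions; the real login URL and form fields depend on the site.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper; uses the JDK client, not Apache HttpClient.
public class SessionDownloader {

    /** Builds a client whose CookieManager stores response cookies and replays them on later requests. */
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Posts a URL-encoded login form (site-specific, supplied by the caller) and returns the status code. */
    public static int login(HttpClient client, URI loginUri, String formBody)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(loginUri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    /**
     * Downloads imageUri to target with the same client, so the session cookies are sent along.
     * The body is written whatever the status is, so the caller should check the returned code.
     */
    public static int download(HttpClient client, URI imageUri, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(imageUri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofFile(target)).statusCode();
    }
}
```

The same client instance has to be used for both calls. A fresh client starts with an empty cookie store and would be treated as logged out.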