Home Backend Development Python Tutorial Example of implementing Youku video batch download function using Python

Example of implementing Youku video batch download function using Python

Mar 16, 2017 am 09:21 AM

前段时间由于收集视频数据的需要,自己捣鼓了一个YouKu视频批量下载的程序。东西虽然简单,但还挺实用的,拿出来分享给大家。

  版本:Python2.7+BeautifulSoup3.2.1

import urllib,urllib2,sys,os
from BeautifulSoup import BeautifulSoup
import itertools,re
url_i =1
pic_num = 1
#自己定义的引号格式转换函数
def _en_to_cn(str):
  obj = itertools.cycle(['“','”'])
  _obj = lambda x: obj.next()
  return re.sub(r"['\"]",_obj,str)
if name == 'main':
  #下载连续3个网页的视频
  while url_i <= 3:
    webContent = urllib2.urlopen("http://news.youku.com/focus/index/_page26716_" + str(url_i) + ".html")
    data = webContent.read()
    #利用BeautifulSoup读取视频列表网页数据
    soup = BeautifulSoup(data)
    print "-------------------------Page " + str(url_i) + "-------------------------"
    #获得相应页面的视频thumbnail和title的list
    tag_list_thumb = soup.findAll(&#39;li&#39;,&#39;v_thumb&#39;)
    tag_list = soup.findAll(&#39;li&#39;, "v_title")
    for item in tag_list:
      #通过每个thumbnail中的herf导向视频播放页面
      web_video_play = urllib2.urlopen(item.a[&#39;href&#39;])
      data_vp = web_video_play.read()
      #利用BeautifulSoup读取视频播放网页数据
      soup_vp = BeautifulSoup(data_vp)
      #找到“下载”对应的链接
      tag_vp_list = soup_vp.findAll(&#39;a&#39;, id = &#39;fn_download&#39;)
      for item_vp in tag_vp_list:
        #将下载链接保存到url_dw中
        url_dw = &#39;"&#39; + item_vp[&#39;_href&#39;] + &#39;"&#39;
        print item.a[&#39;title&#39;] + ": " + url_dw
        #调用命令行运行iku下载视频,需将iku加入环境变量
        os.system("iku " + url_dw)
    #保存每个视频的thumbnail
    for item_thumb in tag_list_thumb:
      urllib.urlretrieve(item_thumb.img[&#39;src&#39;], "E:\\下载视频\\thumbnails\\" + str(pic_num) + "." +
                _en_to_cn(item_thumb.img[&#39;title&#39;]) + ".jpg")
      pic_num += 1
    print "--------------------------------------------------------------"
    print "--------Page " + str(url_i) + "&#39;s video thumbnails have been saved!"
    url_i += 1
Copy after login

  程序思想很简单,就是通过解析网页数据找到相应的视频播放网页链接,然后根据播放页面找到下载的链接,如下图所示:

Example of implementing Youku video batch download function using Python

  由于从网页数据中获得的下载链接是必须通过youku自己的iku才能下载的。这一点费了我一番周折,侥幸发现iku这个软件的命令行非常简单(直接iku download_link即可),所以最简单的办法就是利用Python中的命令行接口os.system来调用iku来下载视频。另外注意程序运行之前需要先启动iku,否则下载完一个视频就要再启动一次。

PS:下载视频的时候就会发现,国内这些视频网页做的真的不够精细,含有太多的重复链接和坏死链接,小小鄙视一下。


The above is the detailed content of Example of implementing Youku video batch download function using Python. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to solve the permissions problem encountered when viewing Python version in Linux terminal? How to solve the permissions problem encountered when viewing Python version in Linux terminal? Apr 01, 2025 pm 05:09 PM

Solution to permission issues when viewing Python version in Linux terminal When you try to view Python version in Linux terminal, enter python...

How to efficiently copy the entire column of one DataFrame into another DataFrame with different structures in Python? How to efficiently copy the entire column of one DataFrame into another DataFrame with different structures in Python? Apr 01, 2025 pm 11:15 PM

When using Python's pandas library, how to copy whole columns between two DataFrames with different structures is a common problem. Suppose we have two Dats...

How to teach computer novice programming basics in project and problem-driven methods within 10 hours? How to teach computer novice programming basics in project and problem-driven methods within 10 hours? Apr 02, 2025 am 07:18 AM

How to teach computer novice programming basics within 10 hours? If you only have 10 hours to teach computer novice some programming knowledge, what would you choose to teach...

What are regular expressions? What are regular expressions? Mar 20, 2025 pm 06:25 PM

Regular expressions are powerful tools for pattern matching and text manipulation in programming, enhancing efficiency in text processing across various applications.

How to avoid being detected by the browser when using Fiddler Everywhere for man-in-the-middle reading? How to avoid being detected by the browser when using Fiddler Everywhere for man-in-the-middle reading? Apr 02, 2025 am 07:15 AM

How to avoid being detected when using FiddlerEverywhere for man-in-the-middle readings When you use FiddlerEverywhere...

How does Uvicorn continuously listen for HTTP requests without serving_forever()? How does Uvicorn continuously listen for HTTP requests without serving_forever()? Apr 01, 2025 pm 10:51 PM

How does Uvicorn continuously listen for HTTP requests? Uvicorn is a lightweight web server based on ASGI. One of its core functions is to listen for HTTP requests and proceed...

What are some popular Python libraries and their uses? What are some popular Python libraries and their uses? Mar 21, 2025 pm 06:46 PM

The article discusses popular Python libraries like NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow, Django, Flask, and Requests, detailing their uses in scientific computing, data analysis, visualization, machine learning, web development, and H

How to dynamically create an object through a string and call its methods in Python? How to dynamically create an object through a string and call its methods in Python? Apr 01, 2025 pm 11:18 PM

In Python, how to dynamically create an object through a string and call its methods? This is a common programming requirement, especially if it needs to be configured or run...

See all articles