How to crawl Baidu Cloud links in Python

巴扎黑

Released: 2017-08-07 17:34:30

This article presents example code that uses Python's urllib to crawl Baidu Cloud links. It may be a useful reference for anyone interested.

Looking through programs I wrote a while ago, I found one that crawls Baidu Cloud resource links from Panduoduo. I wrote it because I wanted to watch Transformers. It was my first contact with Python, and it took me about two days to put together while I was learning the language. Looking at it now, the code I wrote back then was really crude. I'm not much better today, haha, and I'm still learning, so I won't explain too much. The code is shown below. I have forgotten what some of the variables were for, I didn't even know how to write to a file at the time, and I didn't know that a class can be initialized through __init__. I have learned so much through Python. Thank you, Python.

from bs4 import BeautifulSoup
from urllib.parse import quote, unquote
from urllib.request import urlopen
import re

BASE_URL = 'http://www.panduoduo.net'

adr = []

# URL-encode the name of the resource to search for
search_text = input('Enter the resource name to search for: ')
search_text = quote(search_text)

# Fetch the search results page
home = urlopen(BASE_URL + '/s/name/' + search_text)


# Get the Baidu Cloud address from each detail page
def getbaidu(adr):
    # Matches a percent-encoded link such as http%3A%2F%2F...
    href = re.compile(r'http%[%\w/.]*')
    for i in adr:
        url = urlopen(BASE_URL + i)
        bs = BeautifulSoup(url, 'html.parser')
        # Resource title, taken from <h1 class="center">
        title = bs.select_one('h1.center')
        if title:
            print(title.get_text(strip=True))
        # The .dbutton2 element carries the encoded Baidu Cloud link
        b = href.search(str(bs.select('.dbutton2')))
        if b:
            print(unquote(b.group()))


# Collect the detail-page paths (/r/<id>) from the search results
def init(adr):
    soup = BeautifulSoup(home, 'html.parser')
    pattern = re.compile(r'/r/\d+')
    for row in soup.select('.row'):
        address = pattern.search(str(row))
        # Skip rows that contain no detail-page link
        if address:
            adr.append(address.group())


print('running---------')
init(adr)
getbaidu(adr)
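
The introduction mentions two things the author had not yet learned when writing this script: how to write to a file, and that a class can be initialized through __init__. Below is a minimal sketch of how the same steps might look with both, assuming Python 3. The class name LinkCollector, its method names and the sample HTML strings are hypothetical and only for illustration. The URL layout comes from the script above, and the link pattern is a simplified form of the one used there. The sketch makes no network requests, so it only shows the encoding, matching and saving steps.

```python
import re
from urllib.parse import quote, unquote


class LinkCollector:
    """Hypothetical helper, not part of the original script."""

    # URL layout and patterns follow the script above
    BASE_URL = 'http://www.panduoduo.net'
    DETAIL_PATH = re.compile(r'/r/\d+')
    ENCODED_LINK = re.compile(r'http%[%\w/.]*')

    def __init__(self, keyword):
        # __init__ runs when the object is created and sets up its state
        self.keyword = keyword
        self.links = []

    def search_url(self):
        # Percent-encode the keyword (UTF-8 by default) for use in the URL
        return self.BASE_URL + '/s/name/' + quote(self.keyword)

    def detail_paths(self, html):
        # All /r/<id> paths in the page, in order, without duplicates
        return list(dict.fromkeys(self.DETAIL_PATH.findall(html)))

    def add_link(self, html):
        # Decode the first percent-encoded link found, if there is one
        match = self.ENCODED_LINK.search(html)
        if match:
            self.links.append(unquote(match.group()))

    def save(self, path):
        # Write one link per line
        with open(path, 'w', encoding='utf-8') as f:
            for link in self.links:
                f.write(link + '\n')


# Made-up HTML fragments standing in for real pages
sample_results = '<div class="row"><a href="/r/123">A</a></div>'
sample_detail = '<a class="dbutton2" href="/go?url=http%3A%2F%2Fpan.baidu.com%2Fs%2F1abcDEF">'

collector = LinkCollector('Transformers 4')
print(collector.search_url())
print(collector.detail_paths(sample_results))
collector.add_link(sample_detail)
print(collector.links)
```

Calling collector.save('links.txt') would then write the collected links to a file, one per line. Keeping the keyword and the results on the object means nothing has to be passed around as a global list like adr.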
