
Python code for scraping Anjuke residential community data

Jun 08, 2016, 05:20 PM

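Data scraping can be done in almost any programming language. Here we need to collect residential community (xiaoqu) data from Anjuke, so let's look at a Python program that does it. Hopefully it will be useful to you.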


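A feature I was working on needed location data for every residential community in a city. At first I used keyword search through the Baidu Maps API. It barely worked: it returned very little data, and a large number of communities could not be found at all.
While browsing at home over the weekend, I found that Anjuke lists all the communities for each city. I was thrilled and immediately wrote a crawler to try it out.
The code is below: Python 2.7, using the lxml and requests libraries.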

#coding=utf-8
#author : zx
#date   : 2015/07/27
import requests
import MySQLdb
import time
import string
import random
from lxml import etree
# User-Agent headers; one can be picked at random for each GET request
headers = [
    { "User-Agent":"Mozilla/5.0 (Linux; U; Android 4.1; en-us; GT-N7100 Build/JRO03C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"},
    { "User-Agent":"Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)"},
    { "User-Agent":"Mozilla/5.0 (BB10; Touch) AppleWebKit/537.10+ (KHTML, like Gecko) Version/10.0.9.2372 Mobile Safari/537.10+"},
    { "User-Agent":"Mozilla/5.0 (Linux; Android 4.4.2; GT-I9505 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Version/1.5 Chrome/28.0.1500.94 Mobile Safari/537.36"}
]
# City entry page
# I only scraped Qingdao
# For other cities, or all cities nationwide, the city list can be scraped from http://m.anjuke.com/cityList
url = 'http://m.anjuke.com/qd/xiaoqu/'
req = requests.get(url)
cookie = req.cookies.get_dict()
# Connect to the database
conn = MySQLdb.connect('localhost', '*****', '******', '***', charset='utf8')
cursor = conn.cursor()
sql = "insert into xiaoqu (name, lat, lng, address, district) values (%s, %s, %s, %s, %s)"
sql_v = [] 
page = etree.HTML(req.text)
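districtHTML = page.xpath(u"//div[@class='listcont cont_hei']")[0]
# Collect the URL of each administrative district in the target city
# If you don't need to separate by district, you can scrape "全部" (All) directly, i.e. every community and page under the url above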
districtUrl = {}
i = 0
for a in districtHTML:
    if i==0:
        i = 1
        continue
    districtUrl[a.text] = a.get('href')
# Start scraping
total_all = 0
for k,u in districtUrl.items():
    p = 1 # page number
    while True:
        header_i = random.randint(0, len(headers)-1)
        url_p = u.rstrip('/') + '-p' + str(p)
        r = requests.get(url_p, cookies=cookie, headers=headers[header_i])
        page = etree.HTML(r.text) # whether to convert case here depends on the situation...
        communitysUrlDiv = page.xpath(u"//div[@class='items']")[0]
        total = len(communitysUrlDiv)
        i = 0
        for a in communitysUrlDiv:
            i+=1
            r = requests.get(a.get('href'), cookies=cookie, headers=headers[header_i])
            # While scraping, a few 404 pages made the program crash and exit
            # which shows the code was not robust enough
            # Added an if check and a try, so errors can be skipped or handled and debugged simply
            if r.status_code == 404:
                continue
            page = etree.HTML(r.text)
            try:
                name = page.xpath(u"//h1[@class='f1']")[0].text
            except Exception:
                print a.get('href')
                print r.text
                raw_input()
                continue # name could not be read, so skip this community
            # A few communities have no latitude/longitude set
            # so only the address is available
            try:
                latlng = page.xpath(u"//a[@class='comm_map']")[0]
                lat = latlng.get('lat')
                lng = latlng.get('lng')
                address = latlng.get('address')
            except Exception:
                lat = ''
                lng = ''
                address = page.xpath(u"//span[@class='rightArea']/em")[0].text
            sql_v.append((name, lat, lng, address, k))
            print "\r\r\r",
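            # Progress line: district, page number, items on this page, current item
            print u"正在下载 %s 的数据,第 %d 页,共 %d 条,当前:".encode('gbk') %(k.encode('gbk'),p, total) + string.rjust(str(i),3).encode('gbk'),
            time.sleep(0.5) # pause after each request
        # Insert this page's rows into the database
        cursor.executemany(sql, sql_v)
        conn.commit() # MySQLdb does not autocommit by default
        sql_v = []
        time.sleep(5)  # pause after each page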
        total_all += total
        print ''
        print u"成功入库 %d 条数据,总数 %d".encode('gbk') % (total, total_all)
        if total < 500:
            break
        else:
            p += 1
# Close the database connection promptly, like a good citizen. Done~
cursor.close()
conn.close()
print u'所有数据采集完成! 共 %d 条数据'.encode('gbk') % (total_all)
raw_input()


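I think the comments are detailed enough. Since the output is displayed in cmd, the strings of course have to be encoded (to GBK) before printing.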
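
The encoding step can be illustrated with a small sketch. It is written for Python 3 rather than the Python 2.7 used in the script, and `console_safe` is a hypothetical helper name that does not appear in the original code. It round-trips text through a console code page such as GBK and replaces characters the code page cannot represent, instead of raising an error.

```python
import sys


def console_safe(text, encoding=None):
    """Return text restricted to what the given console encoding can display."""
    # Fall back to UTF-8 when stdout reports no encoding (for example when piped).
    enc = encoding or sys.stdout.encoding or "utf-8"
    # Round-trip through the target encoding; unencodable characters become "?".
    return text.encode(enc, errors="replace").decode(enc)
```

Calling `print(console_safe(message))` with no explicit encoding uses whatever encoding the current console reports, so the same script should behave on a GBK cmd window and on a UTF-8 terminal.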
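Below are screenshots of the program running and of the resulting data.

[Screenshot: Python scraping Anjuke community data]

[Screenshot: Anjuke community database]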
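
The per-page batch insert behind this data can be tried without a MySQL server. The sketch below is a stand-in under stated assumptions: it uses Python 3 and the standard-library `sqlite3` module instead of MySQLdb, so the placeholders are `?` rather than `%s`. The column types, the sample rows, and the `save_page` and `demo` helpers are made up for illustration. Only the table and column names come from the `insert` statement in the script.

```python
import sqlite3

# Same columns as the xiaoqu table in the script; the column types are assumed.
SCHEMA = "create table xiaoqu (name text, lat text, lng text, address text, district text)"
INSERT = "insert into xiaoqu (name, lat, lng, address, district) values (?, ?, ?, ?, ?)"


def save_page(conn, rows):
    """Insert one page of (name, lat, lng, address, district) tuples and commit."""
    cur = conn.cursor()
    cur.executemany(INSERT, rows)
    conn.commit()
    cur.close()
    return len(rows)


def demo():
    """Insert two made-up rows into an in-memory database and count them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Hypothetical sample rows; the second has no coordinates, only an address.
    rows = [
        (u"示例小区A", "36.06", "120.38", u"示例路1号", u"市南"),
        (u"示例小区B", "", "", u"示例路2号", u"市北"),
    ]
    save_page(conn, rows)
    count = conn.execute("select count(*) from xiaoqu").fetchone()[0]
    missing = conn.execute("select count(*) from xiaoqu where lat = ''").fetchone()[0]
    conn.close()
    return count, missing
```

Committing once per page, as here, keeps each page's rows together: if a later page fails, the pages already committed remain in the table.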

 
