
Use Python to display the distribution of colleges and universities across the country

王林
Release: 2023-04-11 20:04:25


Data acquisition

To show the distribution of colleges and universities, you must first obtain the location data of colleges and universities across the country. The data for this article comes from the Palm College Entrance Examination Network (https://www.gaokao.cn/school/search).


When this article was written in June 2022, information on a total of 2,822 colleges and universities was obtained. After checking, apart from a few null values, the data is very complete and usable as is. The data has 44 fields in total; this article uses only a few of them, which need no preprocessing and can simply be read when needed.


Introduction to data acquisition methods (basic crawler knowledge):

1. Register and log in to the Palm College Entrance Examination Network. Select all schools on the page.

2. Press F12, open Network > Fetch/XHR, and then click the page-turning buttons on the page several times. The API that is called, along with other request information, will be shown in the XHR panel.

3. Copy the API request made on each page turn and compare them. Two parameters change from page to page: page and signsafe. page is the number of the page currently being visited. signsafe is an md5 value that cannot be decoded, but values captured earlier can be saved and reused at random later. With this information, all school data can be obtained by repeatedly changing the page number and the signsafe value.

The numFound value in the Response is the total number of schools. Divide it by the number of schools shown per page to get the total number of pages; the total number of pages can also be read directly from the page. This determines how many requests are needed.

4. Because the website requires login, you also need to capture the request headers used during access, such as the Request Method (POST in this case), User-Agent, etc.


5. With the above information, loop over the pages to build the URL for each one, use requests to send the requests and obtain the data for all universities, and then use pandas to write the data to Excel.
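The paging logic in steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the site's actual interface: fetch_page is a hypothetical callable that would wrap the real POST request (with the copied headers) and return the records of one page, and the parameter names other than page and signsafe are assumptions.

```python
import math
import random
import time


def total_pages(num_found, page_size):
    """Number of pages needed to cover num_found schools."""
    return math.ceil(num_found / page_size)


def build_page_params(page, saved_signsafe):
    """Parameters that change from page to page.

    saved_signsafe holds md5 strings copied earlier from the browser's
    Network panel; one of them is reused at random.
    """
    return {"page": page, "signsafe": random.choice(saved_signsafe)}


def collect_schools(fetch_page, num_found, page_size, saved_signsafe, delay_seconds=1.0):
    """Call fetch_page(params) once per page and gather all school records.

    fetch_page is hypothetical: in practice it would send the POST request
    and return that page's list of school records.
    """
    schools = []
    for page in range(1, total_pages(num_found, page_size) + 1):
        params = build_page_params(page, saved_signsafe)
        schools.extend(fetch_page(params))
        time.sleep(delay_seconds)  # pause between requests to go easy on the site
    return schools
```

Passing the fetch step in as a function keeps the paging logic testable without touching the website; the collected list can then be handed to pandas and written to Excel.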


Friendly reminder: when obtaining data, comply with the website's terms. Set a time interval between requests in the crawler code, and do not run the crawler during peak access hours.

Additional explanation:

The latest figure published by People's Daily Online puts the number of regular colleges and universities in the country at 2,759. The 2,822 schools obtained from the Palm College Entrance Examination Network for this article are 63 more, mainly because some schools' branch campuses are counted differently. This article is about showing the distribution, so the difference has little impact.



Get the latitude and longitude

The Palm College Entrance Examination Network is a website that provides college application services for college entrance examination candidates. Although the data obtained has 44 fields, it does not contain the longitude and latitude of each school. To display the location of colleges and universities on the map, the longitude and latitude must be obtained from each school's address.


This article uses the Baidu Maps Open Platform (https://lbsyun.baidu.com/apiconsole/center#/home), whose open interface can be used to obtain the longitude and latitude of a geographic location.


The steps are:

1. Register and log in to a Baidu account. One account works across the whole Baidu ecosystem (for example, the same account is used for Baidu's network disk and library services).

2. Log in to the Baidu Maps Open Platform, open the console, and click Create an application. Choose an application name, fill in the other information as prompted, and complete real-name verification to become an individual developer.



3. After creating the application, you will get an application AK. Use this AK value to call Baidu's API. The reference code is as follows.

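import requests


def baidu_api(addr):
    """Look up the longitude and latitude of an address with Baidu's geocoding API."""
    url = "http://api.map.baidu.com/geocoding/v3/?"
    params = {
        "address": addr,
        "output": "json",
        "ak": "paste the AK of your application here"
    }
    req = requests.get(url, params, timeout=10)
    res = req.json()
    # A failed lookup may carry no usable "result", so read it with .get()
    # instead of indexing, which would raise a KeyError.
    if res.get("result"):
        loc = res["result"]["location"]
        return loc
    else:
        print("Failed to get longitude and latitude for {}".format(addr))
        return {'lng': '', 'lat': ''}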

4. Once the Baidu Maps API call works, read the locations of all colleges and universities, call the function above for each one to obtain its longitude and latitude, and write the results to a new Excel file.

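import pandas as pd


def get_lng_lat():
    """Geocode every school in school.xlsx and save the result to a new file."""
    df = pd.read_excel('school.xlsx')
    lng_lat = []
    for row_index, row_data in df.iterrows():
        addr = row_data['address']
        # pd.isna catches a missing value however pandas represents it.
        if pd.isna(addr):
            # Fall back to city + county when the address is missing.
            addr = row_data['city_name'] + row_data['county_name']
        # Use only the part before the first comma of the address.
        loc = baidu_api(addr.split(',')[0])
        lng_lat.append(loc)
    df['经纬度'] = lng_lat  # dict with both longitude and latitude
    df['经度'] = df['经纬度'].apply(lambda x: x['lng'])  # longitude
    df['纬度'] = df['经纬度'].apply(lambda x: x['lat'])  # latitude
    df.to_excel('school_lng_lat.xlsx')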

The final data results are as follows:


Note for individual developers using the Baidu Maps Open Platform: there is a daily quota limit. When debugging the code, do not run it against the full dataset; test with a small sample first, otherwise you will have to wait a day or purchase additional quota.
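One way to keep debugging runs within the quota is to geocode only the first few addresses and to remember results that were already fetched. The sketch below is an illustration under assumptions: geocode_sample and its parameters are hypothetical names, and geocode can be any callable that maps an address to a {'lng': ..., 'lat': ...} dict, such as the baidu_api function above.

```python
def geocode_sample(addresses, geocode, cache, limit=10):
    """Geocode at most `limit` addresses, reusing results already in `cache`.

    cache is a plain dict mapping address -> location; it can be kept
    between debugging runs so repeated addresses cost no extra quota.
    """
    results = []
    for addr in addresses[:limit]:
        if addr not in cache:
            cache[addr] = geocode(addr)  # only this call spends API quota
        results.append(cache[addr])
    return results
```

Once the rest of the pipeline works on the sample, the limit can be raised to cover the full list.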


College location display

Now that the data is ready, it can be displayed on the map.


This article uses ECharts, the open-source data visualization tool that originated at Baidu. The pyecharts library makes ECharts available from Python and is very convenient to use.


Installation command:

pip install pyecharts

1. Mark the location of the university

from pyecharts.charts import Geo
from pyecharts import options as opts
from pyecharts.globals import GeoType
import pandas as pd


def multi_location_mark():
    """Mark the location of every school on the map."""
    geo = Geo(init_opts=opts.InitOpts(bg_color='black', width='1600px', height='900px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Register each school's coordinates under its name.
    for row_index, row_data in df.iterrows():
        geo.add_coordinate(row_data['name'], row_data['经度'], row_data['纬度'])
    # Geo expects (name, value) pairs; the value 2 is only a placeholder.
    data_pair = [(name, 2) for name in df['name']]
    geo.add_schema(
        maptype='china', is_roam=True, itemstyle_opts=opts.ItemStyleOpts(color='#323c48', border_color='#408080')
    ).add(
        '', data_pair=data_pair, type_=GeoType.SCATTER, symbol='pin', symbol_size=16, color='#CC3300'
    ).set_series_opts(
        label_opts=opts.LabelOpts(is_show=False)
    ).set_global_opts(
        # Chart title: "Map of college and university locations nationwide"
        title_opts=opts.TitleOpts(title='全国高校位置标注图', pos_left='650', pos_top='20',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16))
    ).render('high_school_mark.html')
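One caveat for the plotting function above: baidu_api returns empty strings when a lookup fails, so some rows in school_lng_lat.xlsx may have no usable coordinates. A small check like the following could be used to skip such rows before plotting. It is a sketch only; has_valid_coordinates is a hypothetical helper, not part of pyecharts or the original code.

```python
import math


def has_valid_coordinates(lng, lat):
    """Return True when lng/lat are real numbers within valid ranges."""
    try:
        lng, lat = float(lng), float(lat)
    except (TypeError, ValueError):
        return False  # '' or None: geocoding failed for this row
    if math.isnan(lng) or math.isnan(lat):
        return False  # empty Excel cells are read back by pandas as NaN
    return -180 <= lng <= 180 and -90 <= lat <= 90
```

If rows are skipped this way, the same check should be applied when building data_pair, so that every name in data_pair has registered coordinates.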


Judging from the marked map, colleges and universities are mainly distributed along the coast and in the central and eastern regions, with relatively few in the west, especially in high-altitude areas.

2. Draw a heat map of the distribution of colleges and universities

from pyecharts.charts import Geo
from pyecharts import options as opts
from pyecharts.globals import ChartType
import pandas as pd


def draw_location_heatmap():
    """Draw a heat map of school locations."""
    geo = Geo(init_opts=opts.InitOpts(bg_color='black', width='1600px', height='900px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Register each school's coordinates under its name.
    for row_index, row_data in df.iterrows():
        geo.add_coordinate(row_data['name'], row_data['经度'], row_data['纬度'])
    # Every school contributes the same value (2) to the heat map.
    data_pair = [(name, 2) for name in df['name']]
    geo.add_schema(
        maptype='china', is_roam=True, itemstyle_opts=opts.ItemStyleOpts(color='#323c48', border_color='#408080')
    ).add(
        '', data_pair=data_pair, type_=ChartType.HEATMAP
    ).set_series_opts(
        label_opts=opts.LabelOpts(is_show=False)
    ).set_global_opts(
        # Chart title: "Heat map of college and university distribution nationwide"
        title_opts=opts.TitleOpts(title='全国高校分布热力图', pos_left='650', pos_top='20',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16)),
        visualmap_opts=opts.VisualMapOpts()
    ).render('high_school_heatmap.html')


The heat map shows that universities are concentrated mainly along the coast, in Beijing, Shanghai, and Guangzhou, and in the Yangtze and Yellow River basins, while Sichuan and Chongqing are the only places in the west with many universities.


3. Draw distribution density map by province

from pyecharts.charts import Map
from pyecharts import options as opts
import pandas as pd


def draw_location_density_map():
    """Draw the number of schools in each province as a colored map."""
    # Named province_map so the built-in map() is not shadowed.
    province_map = Map(init_opts=opts.InitOpts(bg_color='black', width='1200px', height='700px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Count the schools in each province.
    s = df['province_name'].value_counts()
    data_pair = [[province, int(s[province])] for province in s.index]
    province_map.add(
        '', data_pair=data_pair, maptype="china"
    ).set_global_opts(
        # Chart title: "Density of colleges and universities by province"
        title_opts=opts.TitleOpts(title='全国高校按省分布密度图', pos_left='500', pos_top='70',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16)),
        visualmap_opts=opts.VisualMapOpts(max_=200, is_piecewise=True, pos_left='100', pos_bottom='100',
                                          textstyle_opts=opts.TextStyleOpts(color='white', font_size=16))
    ).render("high_school_density.html")
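To make the shape of data_pair concrete, the counting step can be reproduced with the standard library alone. This is only an illustration of what value_counts() contributes in the function above; province_data_pair is a hypothetical helper name.

```python
from collections import Counter


def province_data_pair(province_names):
    """Build [[province, count], ...] pairs, most common province first."""
    counts = Counter(province_names)
    return [[province, count] for province, count in counts.most_common()]
```

For example, a list containing 江苏 three times and 北京 twice yields the pairs for 江苏 (3) and 北京 (2), in that order, which is the form the province map consumes.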


It can be seen from the provincial distribution density map that provinces with a large number of universities are concentrated in the central and eastern parts of the country, especially in several provinces near Beijing and Shanghai.

4. Distribution of 211 and 985 colleges and universities

Filter the data down to the 211 and 985 colleges and universities and draw the charts again. (The code is not repeated here; only one line of filtering code needs to be added.)
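The article does not show the filtering line, so the following is only a guess at what it might look like. The column names f211 and f985 and the value 1 meaning "yes" are assumptions; check them against the fields actually present in school_lng_lat.xlsx.

```python
import pandas as pd


def filter_key_universities(df, col_211="f211", col_985="f985", yes_value=1):
    """Keep the rows flagged as 211 or 985 universities.

    The column names and yes_value are assumptions, not confirmed
    field names from the downloaded data.
    """
    mask = (df[col_211] == yes_value) | (df[col_985] == yes_value)
    return df[mask]
```

Applying this to the DataFrame right after pd.read_excel in any of the drawing functions would restrict the chart to those schools.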



The above is the entire article content.

source:51cto.com