How to calculate the distance (time and miles) between the geography of your portfolio and a comparator (Point A and Point B)

Calculating geographic distance using Python

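This code will be useful for anyone who wants to link a portfolio containing geographic data to other locations in order to calculate the driving time and distance between two points. It was inspired by a task I was assigned: helping a funder understand how close its approved projects were to one another, after it had queried the geographic spread of its applicants.

This article walks through how to use API calls, together with built-in and custom functions, to match a list of charities (Point A) to their nearest train station (Point B), and to calculate the distance in miles and the driving time in minutes.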

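Other use cases include:

Matching postcodes to the nearest school
Matching postcodes to the nearest charity
Matching postcodes to the nearest NHS provider
Matching postcodes to the nearest national park
Matching postcodes in list A to the nearest postcode in list B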

Requirements

Packages:

pandas
numpy
requests
json
haversine

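Resources used in this article:

Charity data (for this example, I extracted the charities with expenditure over 5 million from the Charity Commission register and selected the first 100)
UK train station data (this is not readily available, so I used a GitHub document containing UK train stations with their longitude, latitude and postcode)
Postcodes.io (an API for looking up and extracting UK postcode data)
Project OSRM (an API for calculating routes)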

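Why use Python?

The steps discussed here may look complex, but the end result is a template that you can reuse and reformat to suit your needs whenever you have to calculate the geographic distance between Point A and Point B across many rows of data.

For example, suppose you are working with 100 charities. As part of a wider analysis of where these charities are located, you want to know how close each one is to a nearby train station. You might want to map this data visually, or use it as a starting point for further analysis, such as examining how accessible the charities are to people travelling from far away.

Whatever the use case, if you were to do this manually, the steps would be:

Find the charity's postcode
Use an online tool to check which station is nearest to the charity
Use an online map tool to find the distance in miles and the driving time from the charity to its nearest station
Record the results in a spreadsheet
Repeat the steps above for the remaining 99 charities

This might work for a handful of charities, but before long the process becomes time-consuming, tedious and prone to human error.

By using Python for this task, we can automate the steps. With a few additions required from the user, all that remains at the end is to run the code.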

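What can Python do?

Let's break the task down into steps. The steps required here are:

Find the nearest station to a postcode
Calculate the distance between the two
Calculate the driving time for the journey
Create a dataset containing all the required information

To complete step 1, we will use Python to:

Import a dataset containing the charities' details, including their postcodes
Use the Postcodes.io API to extract the longitude and latitude for each postcode
Compile this information back into a dataframe containing the original information plus each charity's longitude and latitude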

Step 1 - Find the nearest station to a postcode

1 - import packages


# data manipulation
import numpy as np
import pandas as pd

# http requests
import requests

# handling json
import json

# calculating distances
import haversine as hs
from haversine import haversine, Unit

2 - import and clean the data


# import as a pandas dataframe, specifying which columns to import
charities = pd.read_excel('charity_list.xlsx', usecols='A, C, E')
stations = pd.read_csv('uk-train-stations.csv', usecols=[1,2,3])

# renaming stations columns for ease of use
stations = stations.rename(columns={'station_name':'Station Name','latitude':'Station Latitude', 'longitude':'Station Longitude'})

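The variable 'charities', which holds our charity dataset, will be the master dataframe that we merge our extracted data into.

The next step is to create a function that extracts the longitude and latitude for the charity postcodes.

3 - convert the postcodes to a list for the lookup function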
charities_pc = charities['Charity Postcode'].tolist()
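4 - create a function that takes our postcodes, sends a request to postcodes.io, records the latitude and longitude, and returns the data ready to be loaded into a new dataframe

please note: for further information, consult the postcodes.io documentation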
def bulk_pc_lookup(postcodes):

    # set up the api request
    url = "https://api.postcodes.io/postcodes"
    headers = {"Content-Type": "application/json"}

    # specify our input data and response, specifying that we are working with data in json format
    data = {"postcodes": postcodes}
    response = requests.post(url, headers=headers, data=json.dumps(data))

    # specify the information we want to extract from the api response

    if response.status_code == 200:
        results = response.json()["result"]
        postcode_data = []

        for result in results:
            postcode = result["query"]

            if result["result"] is not None:
                latitude = result["result"]["latitude"]
                longitude = result["result"]["longitude"]
                postcode_data.append({"Charity Postcode": postcode, "Latitude": latitude, "Longitude": longitude})

        return postcode_data

    # setting up a fail safe to capture any errors or results not found
    else:
        print(f"Error: {response.status_code}")
        return []
5 - pass our list of charity postcodes into the function to extract the desired results
# specify where the postcodes are
postcodes = charities_pc

# save the results of the function as output
output = bulk_pc_lookup(postcodes)

# convert the results to a pandas dataframe
output_df = pd.DataFrame(output)
output_df.head()
Note:

If your Point B data (in this case, the UK train stations) does not already contain latitude and longitude, you will also need to perform steps 3 to 5 on the Point B data.

Postcodes.io allows bulk lookup requests for up to 100 postcodes at a time. If your dataset contains more than 100 postcodes, you will need either to manually create new Excel sheets containing only 100 rows per sheet, or to write a function that breaks your dataset into chunks of the required length for the API call.
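One way to handle more than 100 postcodes is to split the list into batches before calling the API. The sketch below is a minimal illustration and not part of the original code: `chunk_list` and `lookup_in_batches` are hypothetical helper names, and `lookup` stands in for any function shaped like `bulk_pc_lookup` (it takes a list of postcodes and returns a list of dictionaries).

```python
def chunk_list(items, size=100):
    """Yield successive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_in_batches(postcodes, lookup, size=100):
    """Call `lookup` once per batch and combine the results into one list."""
    combined = []
    for batch in chunk_list(postcodes, size):
        combined.extend(lookup(batch))
    return combined
```

With the function from step 4, this might be called as `output = lookup_in_batches(charities_pc, bulk_pc_lookup)`, after which the rest of the steps stay the same.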

6 - we can now either merge our output_df with our original charity dataset or, to leave our original data untouched, create a new dataframe for our extracted results that we will use for the rest of the project
charities_output = pd.merge(charities, output_df, on="Charity Postcode")

charities_output.head()
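Because `bulk_pc_lookup` skips any postcode the API cannot find, and `pd.merge` keeps only matching rows by default, a charity with an unrecognised postcode drops out of `charities_output` without warning. A quick check along the following lines can show what was lost. `find_unmatched` is a hypothetical helper, and the sample postcodes and coordinates are approximate values made up for illustration.

```python
def find_unmatched(requested, matched_rows, key="Charity Postcode"):
    """Return the requested postcodes that are absent from the lookup results."""
    matched = {row[key] for row in matched_rows}

    # preserve the original order and report each missing postcode once
    seen = set()
    unmatched = []
    for postcode in requested:
        if postcode not in matched and postcode not in seen:
            seen.add(postcode)
            unmatched.append(postcode)
    return unmatched


# illustrative data only: the second postcode is missing from the results
requested = ["SW1A 1AA", "ZZ99 9ZZ", "EC1A 1BB"]
matched_rows = [
    {"Charity Postcode": "SW1A 1AA", "Latitude": 51.501, "Longitude": -0.142},
    {"Charity Postcode": "EC1A 1BB", "Latitude": 51.520, "Longitude": -0.097},
]
print(find_unmatched(requested, matched_rows))  # prints ['ZZ99 9ZZ']
```

In this project the equivalent call would be `find_unmatched(charities_pc, output)`. Any postcodes it returns can then be corrected by hand before re-running the lookup.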
Step 1 Complete

We now have two dataframes which we will use for the next steps:

Our original stations dataframe containing the latitude and longitude of the UK train stations

Our new charities_output dataframe containing the original charity information and the new latitude and longitude information extracted from our API call

Step 2 - Calculate the distance between Point A (charity) and Point B (train station), and record the nearest result for Point A

In this section, we will be using the haversine distance formula to:

check the distance between a charity and every UK train station

match the nearest result i.e. the UK train station with the minimum distance from our charity

loop over our charities dataset to find the nearest match for each row

record our results in a dataframe

Please note, for further information on using the haversine module, consult the documentation
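For intuition about what the module computes, the haversine formula gives the great-circle distance between two points on a sphere from their latitudes and longitudes. The sketch below is a plain-Python illustration and not the library's implementation. The function name `haversine_miles` and the mean Earth radius of 3958.8 miles are assumptions made for this example, so results may differ slightly from the package's.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # haversine of the central angle between the two points
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )

    # clamp to guard against rounding slightly above 1
    a = min(1.0, a)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

Note that this is a straight-line distance over the Earth's surface, not a road distance, which is why the driving time is calculated separately later on.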

1 - create a function for calculating the distance between Point A and Point B
def calc_distance(lat1, lon1, lat2, lon2):

    # specify data for location one, i.e. Point A
    loc1 = (lat1, lon1)

    # specify the data for location two, i.e. Point B
    loc2 = (lat2, lon2)

    # calculate the distance and specify the units as miles
    dist = haversine(loc1, loc2, unit=Unit.MILES)

    return dist
2 - create a loop that calculates the distance between Point A and every row in Point B, and match the result where Point B is nearest to Point A
# create an empty dictionary to store the results
results = {}

# begin with looping over the dataset containing the data for Point A
for index1, row1 in charities_output.iterrows():

    # specify the location of our data
    charity_name = row1['Charity Name']
    lat1 = row1['Latitude']
    lon1 = row1['Longitude']

    # track the minimum distance between Point A and every row of Point B
    min_dist = float('inf')
    # as the minimum distance i.e. nearest Point B is not yet known, create an empty string for storage
    min_station = ''

    # loop over the dataset containing the data for Point B
    for index2, row2 in stations.iterrows():

        # specify the location of our data
        lat2 = row2['Station Latitude']
        lon2 = row2['Station Longitude']

        # use our previously created distance function to calculate the distance
        dist = calc_distance(lat1, lon1, lat2, lon2)

        # check each distance - if it is lower than the last, this is the new low. this will repeat until the lowest distance is found
        if dist < min_dist:
            min_dist = dist
            min_station = row2['Station Name']

    results[charity_name] = {'Nearest Station': min_station, 'Distance (Miles)': min_dist}

# convert the results dictionary into a dataframe
res = pd.DataFrame.from_dict(results, orient="index")

res.head()
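The inner loop above is a linear search for the minimum. As an alternative sketch, the same search can be written with Python's built-in `min` and a key function. This is an illustration rather than a replacement for the code above: `nearest_point` is a hypothetical helper, `distance_fn` stands in for `calc_distance`, and the toy distance function is there only to keep the example self-contained.

```python
def nearest_point(origin, candidates, distance_fn):
    """Return (name, distance) for the candidate closest to `origin`.

    `origin` is a (lat, lon) tuple; `candidates` is a non-empty iterable of
    (name, lat, lon) tuples; `distance_fn` takes lat1, lon1, lat2, lon2.
    """
    best = min(
        candidates,
        key=lambda c: distance_fn(origin[0], origin[1], c[1], c[2]),
    )
    return best[0], distance_fn(origin[0], origin[1], best[1], best[2])


# toy straight-line distance in degrees, for demonstration only
def flat_distance(lat1, lon1, lat2, lon2):
    return ((lat1 - lat2) ** 2 + (lon1 - lon2) ** 2) ** 0.5


stations_demo = [("Station A", 51.0, 0.0), ("Station B", 52.0, 0.0)]
name, dist = nearest_point((51.2, 0.0), stations_demo, flat_distance)
print(name)  # prints Station A
```

With the dataframes from this article, the candidates could be built with `stations.itertuples(index=False, name=None)`, assuming the columns are ordered name, latitude, longitude. `min` raises a `ValueError` on an empty list, so the stations data must contain at least one row.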
3 - merge our new information with our charities_output dataframe
# as our dataframe output has used our charities as an index, we need to re-add it as a column
res['Charity Name'] = res.index

# merging with our existing output dataframe
charities_output = charities_output.merge(res, on="Charity Name")

charities_output.head()
Step 2 Complete

We now have all our information in one place, charities_output, containing:

Our charity information

The nearest station to each charity

The distance in miles

Step 3 - Calculate the driving time for travel

Our final step uses Project OSRM to find the driving time between each of our charities and its nearest station. This is helpful because distance in miles is not always a good indicator of how long a journey takes: in a city like London, for example, a 1 mile journey might take as long as a 5 mile journey in a rural area.

To prepare for this step, we must have one dataframe containing the following information:

charity information: name, longitude, latitude, nearest station, distance in miles

station information: name, longitude, latitude

1- create a data frame with the above information
drive_time_df = pd.merge(charities_output, stations, left_on='Nearest Station', right_on='Station Name')
drive_time_df = drive_time_df.drop(columns=['Station Name'])

drive_time_df.head()
2 - now that our dataframe is ready, we can set up our function for calculating drive time using Project OSRM

please note: for further information, consult the documentation
url = "http://router.project-osrm.org/route/v1/driving/{lon1},{lat1};{lon2},{lat2}"

# function 

def calc_driveTime(row):

    # extract lat and lon
    lat1, lon1 = row['Latitude'], row['Longitude']
    lat2, lon2 = row['Station Latitude'], row['Station Longitude']

    # request
    response = requests.get(url.format(lat1=lat1, lon1=lon1, lat2=lat2, lon2=lon2))

    # parse response
    data = json.loads(response.content)

    # drive time in seconds
    drive_time_sec = data["routes"][0]["duration"]

    # convert to minutes
    drive_time = round((drive_time_sec) / 60, 0)

    return drive_time
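The function above assumes that every request succeeds and returns at least one route. A more defensive variant might check the response before reading it. The sketch below covers only the parsing step, so it runs without a network connection. `parse_route` is a hypothetical helper, and it assumes that the OSRM route response carries a `code` field equal to `"Ok"` on success, with `duration` in seconds and `distance` in metres for each route; check the Project OSRM documentation before relying on these details.

```python
METRES_PER_MILE = 1609.344


def parse_route(payload):
    """Return (minutes, miles) for the first route, or None if no route was found."""
    if payload.get("code") != "Ok" or not payload.get("routes"):
        return None

    route = payload["routes"][0]
    minutes = round(route["duration"] / 60, 1)
    miles = round(route["distance"] / METRES_PER_MILE, 1)
    return minutes, miles


# hand-written sample payloads, not real API output
sample_ok = {"code": "Ok", "routes": [{"duration": 780.0, "distance": 8046.72}]}
sample_fail = {"code": "NoRoute", "routes": []}

print(parse_route(sample_ok))    # prints (13.0, 5.0)
print(parse_route(sample_fail))  # prints None
```

Inside `calc_driveTime`, this could be used as `parse_route(response.json())`, leaving `None` results as missing values in the dataframe instead of stopping the whole run on one failed request.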
3 - pass our data into our new function to calculate driving time in minutes
# apply the above function to our dataframe
driving_time_res = drive_time_df.apply(calc_driveTime, axis=1)

# add dataframe results as a new column
drive_time_df['Driving Time (Minutes)'] = driving_time_res

drive_time_df.head()
Step 3 Complete

We now have all our desired information in one compact dataframe. For layout purposes, and depending on what we want to do next with our data, we can create one final dataframe as output, containing the following information:

Charity Name

Nearest Station

Distance (Miles)

Driving Time (Minutes)
final_output = drive_time_df.drop(columns=['Charity Number', 'Charity Postcode', 'Latitude', 'Longitude', 'Station Latitude', 'Station Longitude'])

final_output.head()
Thank you for reading! I hope this was helpful. Please check out my website if you are interested in my work.