How to Read Data Directly from a URL Using Pandas?

DDD
Release: 2024-11-04 10:40:30

The Read-from-URL Conundrum

One common task in data analysis is loading data from a URL. Pandas, a popular Python library for data manipulation, provides a read_csv function that reads CSV data from a file path, a URL, or a file-like object. However, downloading a file yourself and handing the raw response content to read_csv results in an error, and a URL that points at a web page instead of at the raw file will not yield the data either.
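To make the three accepted input types concrete, here is a minimal offline sketch. It writes a small made-up CSV (the file name and contents are hypothetical, not the real countries.csv) to a temporary directory and reads it back by path, through an open file object, and through a file:// URL, which read_csv handles with the same URL support it uses for http(s) addresses.

<code class="python">import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for a real CSV file
SAMPLE_CSV = "Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "countries.csv"
    path.write_text(SAMPLE_CSV, encoding="utf-8")

    # 1. A file path (str or pathlib.Path)
    df_from_path = pd.read_csv(path)

    # 2. A file-like object: an open file with a read() method
    with open(path, encoding="utf-8") as fh:
        df_from_handle = pd.read_csv(fh)

    # 3. A URL string; file:// works offline, http(s) URLs use the same URL support
    df_from_url = pd.read_csv(path.as_uri())

print(df_from_path.shape)  # (2, 2)</code>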

Understanding the Error

To demonstrate this error, consider the following example:

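<code class="python">import pandas as pd
import requests

# Note: this github.com "blob" URL serves an HTML page, not the raw CSV file
url = "https://github.com/cs109/2014_data/blob/master/countries.csv"
s = requests.get(url).content  # s is a bytes object
c = pd.read_csv(s)  # fails: bytes is neither a file path nor a file-like object</code>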

This code downloads the page with the requests library and passes the downloaded content straight to read_csv. That content is a bytes object, not a file path or a file-like object, so read_csv raises an error such as the following (the exact wording may differ between pandas versions):

Expected file path name or file-like object, got <class 'bytes'> type
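The message is easier to understand once you check what pandas considers file-like. pandas exposes this check as pandas.api.types.is_file_like, which looks for a read() or write() method. A small sketch, with a made-up byte string standing in for the downloaded content:

<code class="python">import io

from pandas.api.types import is_file_like

# Hypothetical stand-in for requests.get(url).content: a plain bytes object
content = b"Country,Region\nAlgeria,AFRICA\n"

print(type(content).__name__, is_file_like(content))        # bytes False
print(is_file_like(io.BytesIO(content)))                     # True: binary stream
print(is_file_like(io.StringIO(content.decode("utf-8"))))    # True: text stream</code>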

Resolving the Issue

To resolve this error, we need to give read_csv something it can actually read. A bytes object is not file-like: it holds the data, but it has no read() method. Python's io module provides in-memory file-like objects of both kinds, io.StringIO for text and io.BytesIO for binary data. One fix is to decode the bytes into a string and wrap that string in io.StringIO. The URL must also point at the raw file on raw.githubusercontent.com, because the github.com "blob" address returns an HTML page rather than the CSV:

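<code class="python">import io

import pandas as pd
import requests

url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s = requests.get(url).content  # raw bytes of the CSV file
c = pd.read_csv(io.StringIO(s.decode("utf-8")))  # decode, then wrap in a text buffer</code>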

Decoding with "utf-8" turns the downloaded bytes into a str, and io.StringIO wraps that str in a text file-like object, which read_csv can then parse like any open file.
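Because read_csv also accepts binary file-like objects, wrapping the bytes in io.BytesIO and passing encoding to read_csv is an equivalent alternative to decoding first. A minimal offline sketch, with made-up bytes standing in for requests.get(url).content (the accented name is only there to show that the encoding matters):

<code class="python">import io

import pandas as pd

# Hypothetical stand-in for requests.get(url).content
content = "Country,Region\nCôte d'Ivoire,AFRICA\nAlgeria,AFRICA\n".encode("utf-8")

# Option 1: decode to str yourself, then wrap it in a text buffer
df_text = pd.read_csv(io.StringIO(content.decode("utf-8")))

# Option 2: wrap the raw bytes in a binary buffer and let pandas decode them
df_binary = pd.read_csv(io.BytesIO(content), encoding="utf-8")

print(df_text.equals(df_binary))  # True
print(df_text.loc[0, "Country"])  # Côte d'Ivoire</code>

Option 2 avoids holding a second, decoded copy of the data as a Python string, while option 1 makes the decoding step explicit.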

Improved Simplicity with Pandas 0.19.2

There is a simpler solution. In pandas 0.19.2 (the version current when this answer was originally written), and in current releases as well, read_csv can read directly from a URL:

<code class="python">import pandas as pd

url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c = pd.read_csv(url)</code>

This eliminates the need to retrieve and decode the content yourself: pandas downloads the file and decodes it using its encoding parameter, which defaults to UTF-8.
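Passing a URL directly only helps if that URL serves the raw file. The github.com "blob" address from the first example returns an HTML page, which is why the working examples use raw.githubusercontent.com. Below is a small hypothetical helper (to_raw_github_url is not part of pandas or GitHub's API) that rewrites the first form into the second, based only on the URL pattern visible in this article; it does not handle branch names containing slashes.

<code class="python">from urllib.parse import urlparse


def to_raw_github_url(url: str) -> str:
    """Hypothetical helper: turn a github.com 'blob' page URL into the
    raw.githubusercontent.com URL that serves the file contents."""
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url  # leave anything else unchanged
    owner, repo, _, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


print(to_raw_github_url("https://github.com/cs109/2014_data/blob/master/countries.csv"))
# https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv</code>

The result can then be passed straight to pd.read_csv.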

source:php.cn