How to Create a Pandas DataFrame from a Text File with a Specific Pattern

Mary-Kate Olsen

Released: 2024-11-03 09:20:02

Problem: You have a text file with a specific structure and need to build a pandas DataFrame from it. A sample of the file is shown below, followed by its general pattern:

Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Livingston (University of West Alabama)[2]
Montevallo (University of Montevallo)[2]
Troy (Troy University)[2]
Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]
Tuskegee (Tuskegee University)[5]
...

<State>[edit]
<Region Name 1>
<Region Name 2>
...

In the resulting DataFrame, the state name must be repeated on every row, once for each region name that belongs to that state.

Solution:

<code class="python">import pandas as pd

# Read every line into a single column; this assumes ';' never occurs in the file
df = pd.read_csv('filename.txt', sep=';', names=['Region Name'])

# Flag the header rows: state names are the lines containing '[edit]'
is_state = df['Region Name'].str.contains(r'\[edit\]')

# Build the State column: keep the text on header rows only, drop the '[edit]' suffix,
# then forward-fill so every region row carries the state above it
states = df['Region Name'].where(is_state).str.replace(r'\[edit\]', '', regex=True)
df.insert(0, 'State', states.ffill())

# Strip the parenthesised university list and any '[n]' footnote markers from region names
df['Region Name'] = df['Region Name'].str.replace(r'\s*[(\[].*$', '', regex=True)

# Drop the header rows, leaving only the State and Region Name columns
df = df[~is_state].reset_index(drop=True)

print(df)</code>

The resulting DataFrame looks like this:

      State   Region Name
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
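
As an alternative to the pandas string methods, the same pairing of states and regions can be done line by line before any DataFrame is built. The sketch below is only an illustration under stated assumptions: the helper name `parse_regions` is hypothetical, state headers are assumed to end in `[edit]`, and a region name is assumed to be everything before the first ` (` or `[`. The Alabama sample lines come from the file excerpt above; the Alaska lines are made up for the example.

```python
import re

def parse_regions(lines):
    """Return (state, region) pairs from lines in the '<State>[edit]' layout.

    Hypothetical helper: assumes headers end with '[edit]' and that a region
    name is the text before the first ' (' or '['.
    """
    rows = []
    state = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith('[edit]'):
            # State header: remember it for the region lines that follow
            state = line[:-len('[edit]')].strip()
        else:
            # Region line: keep only the text before the first '(' or '['
            region = re.split(r'\s*[(\[]', line, maxsplit=1)[0]
            rows.append((state, region))
    return rows

sample = [
    'Alabama[edit]',
    'Auburn (Auburn University)[1]',
    'Florence (University of North Alabama)',
    'Alaska[edit]',                            # made-up header for illustration
    'Fairbanks (placeholder annotation)[2]',   # made-up region line
]
rows = parse_regions(sample)
print(rows)
```

Passing the result to `pd.DataFrame(rows, columns=['State', 'Region Name'])` should give the same two-column layout as the table above. Reading the file with plain Python also avoids having to pick a separator character that never occurs in the data.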
