How to Extract Tuples from Pandas Dataframe Columns
Problem:
A Pandas dataframe column sometimes holds tuples rather than scalar values. Such a column is stored with object dtype, so the individual tuple elements cannot be sorted, filtered, or aggregated directly. Splitting the column into one column per tuple element makes the values easier to analyze.
Solution:
To convert a column of tuples into separate columns, follow these steps:
Convert the column to a list of tuples using the tolist() method:
<code class="python">column_list = column.tolist()</code>
Create a new dataframe from the list of tuples, passing the original index so that the rows stay aligned:
<code class="python">new_df = pd.DataFrame(column_list, index=dataframe.index)</code>
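Assign the columns of the new dataframe to new columns in the original dataframe. The new dataframe's columns are labeled with the integers 0 and 1, not the strings '0' and '1':
<code class="python">dataframe[['column_a', 'column_b']] = new_df[[0, 1]]</code>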
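The three steps above can be put together as a minimal runnable sketch. The sample data, the column name `column`, and the index labels `r1` to `r3` are assumptions made for illustration; only the calls themselves come from the steps.

```python
import pandas as pd

# Hypothetical sample data; the column name and index labels are illustrative only.
dataframe = pd.DataFrame(
    {"column": [(1, "x"), (2, "y"), (3, "z")]},
    index=["r1", "r2", "r3"],
)

# Step 1: turn the column into a plain Python list of tuples.
column_list = dataframe["column"].tolist()

# Step 2: build a dataframe from the list, reusing the original index.
new_df = pd.DataFrame(column_list, index=dataframe.index)

# Step 3: the new dataframe's columns are the integers 0 and 1.
dataframe[["column_a", "column_b"]] = new_df[[0, 1]]

print(dataframe)
```

Because this sample uses a non-default index, leaving out `index=dataframe.index` in step 2 would give `new_df` the labels 0, 1, 2. The assignment aligns on index labels, so the new columns would then be filled with NaN.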
Example:
Consider the following dataframe:
<code class="python">>>> df
   y norm test  y norm train  len(y_train)  len(y_test)  \
0    64.904368    116.151232          1645          549
1    70.852681    112.639876          1645          549

                                    SVR RBF  \
0   (35.652207342877873, 22.95533537448393)
1  (39.563683797747622, 27.382483096332511)

                                        LCV  \
0  (19.365430594452338, 13.880062435173587)
1  (19.099614489458364, 14.018867136617146)

                                   RIDGE CV  \
0  (4.2907610988480362, 12.416745648065584)
1    (4.18864306788194, 12.980833914392477)

                                         RF  \
0   (9.9484841581029428, 16.46902345373697)
1  (10.139848213735391, 16.282141345406522)

                                           GB  \
0  (0.012816232716538605, 15.950164822266007)
1  (0.012814519804493328, 15.305745202851712)

                                             ET  DATA
0  (0.00034337162272515505, 16.284800366214057)   j2m
1  (0.00024811554516431878, 15.556506191784194)   j2m</code>
To split the LCV column into individual columns LCV-a and LCV-b, you can use the following code:
<code class="python">df[['LCV-a', 'LCV-b']] = pd.DataFrame(df['LCV'].tolist(), index=df.index)</code>
The original LCV column is kept, and LCV-a and LCV-b are appended as the last two columns of the dataframe. Selecting the three related columns shows the result:
<code class="python">>>> df[['LCV', 'LCV-a', 'LCV-b']]
                                        LCV      LCV-a      LCV-b
0  (19.365430594452338, 13.880062435173587)  19.365431  13.880062
1  (19.099614489458364, 14.018867136617146)  19.099614  14.018867</code>
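To try the split without the full dataframe, the sketch below rebuilds only the LCV column from the two rows in the example and applies the same one-line assignment. The cut-down dataframe and the name `df_split` are assumptions introduced for illustration. The final `drop` call is an optional extra step, not part of the original solution.

```python
import pandas as pd

# Only the LCV column is rebuilt here, using the two rows from the example.
df = pd.DataFrame(
    {
        "LCV": [
            (19.365430594452338, 13.880062435173587),
            (19.099614489458364, 14.018867136617146),
        ]
    }
)

# Split the tuples into two new columns, aligned on the original index.
df[["LCV-a", "LCV-b"]] = pd.DataFrame(df["LCV"].tolist(), index=df.index)

# Optional: drop the tuple column once it is no longer needed.
# df_split is a hypothetical name used only in this sketch.
df_split = df.drop(columns="LCV")

print(df_split)
```

The assignment itself leaves the LCV column in place and appends the new columns after it, so dropping LCV is only needed when the tuples are no longer wanted.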