Creating a new column from a per-date sum with pandas' groupby() can produce NaN values. The goal is to add a column that shows, on every row, the total of a specific value across all rows sharing that row's date, regardless of how many rows that date has.
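To achieve this, use the transform() function. An aggregation such as sum(), or an apply() that returns one value per group, produces a result indexed by the group key (here, the date). Assigning that result to the dataframe aligns on index labels that do not match, which is where the NaN values come from. transform() instead performs the computation per group and returns a Series with the same index as the original dataframe, so every row receives its group's total.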
df['Data4'] = df['Data3'].groupby(df['Date']).transform('sum')
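The difference between aggregating and transforming can be seen in a small sketch. The four-row frame and the column name "misaligned" are made up for illustration; only "Date", "Data3" and "Data4" come from the example in this article.

```python
import pandas as pd

# Hypothetical miniature frame; column names follow the article.
df = pd.DataFrame({
    "Date": ["2015-05-08", "2015-05-07", "2015-05-08", "2015-05-07"],
    "Data3": [5, 8, 50, 100],
})

# Aggregating returns one value per date, indexed by the date itself.
per_date = df.groupby("Date")["Data3"].sum()

# Assignment aligns on index labels (dates vs. 0..3), so nothing matches
# and the new column is filled with NaN.
df["misaligned"] = per_date

# transform('sum') broadcasts each group's total back onto the original index.
df["Data4"] = df.groupby("Date")["Data3"].transform("sum")

print(df)
```

Here df.groupby("Date")["Data3"] is used instead of df["Data3"].groupby(df["Date"]); both spellings group the same values by the same key.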
Here's a step-by-step breakdown:
Consider the following dataframe:
         Date   Sym  Data2  Data3
0  2015-05-08  aapl     11      5
1  2015-05-07  aapl      8      8
2  2015-05-06  aapl     10      6
3  2015-05-05  aapl     15      1
4  2015-05-08  aaww    110     50
5  2015-05-07  aaww     60    100
6  2015-05-06  aaww    100     60
7  2015-05-05  aaww     40    120
Applying the transform() function:
df['Data4'] = df['Data3'].groupby(df['Date']).transform('sum')
Results in:
         Date   Sym  Data2  Data3  Data4
0  2015-05-08  aapl     11      5     55
1  2015-05-07  aapl      8      8    108
2  2015-05-06  aapl     10      6     66
3  2015-05-05  aapl     15      1    121
4  2015-05-08  aaww    110     50     55
5  2015-05-07  aaww     60    100    108
6  2015-05-06  aaww    100     60     66
7  2015-05-05  aaww     40    120    121
As the output shows, the 'Data4' column now holds, on every row, the sum of 'Data3' across all rows with the same 'Date' value.
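For readers who want to reproduce the result, the walkthrough can be put together as one runnable sketch. The dataframe is rebuilt from the values shown above, and dates are kept as plain strings for simplicity, which is an assumption since the article does not state the column's dtype.

```python
import pandas as pd

# Rebuild the example dataframe from the article's values.
df = pd.DataFrame({
    "Date": ["2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"] * 2,
    "Sym": ["aapl"] * 4 + ["aaww"] * 4,
    "Data2": [11, 8, 10, 15, 110, 60, 100, 40],
    "Data3": [5, 8, 6, 1, 50, 100, 60, 120],
})

# Sum Data3 within each date and broadcast the total to every row of that date.
df["Data4"] = df["Data3"].groupby(df["Date"]).transform("sum")

print(df)
```

The new column has exactly one value per original row, so no merge or reindexing step is needed afterwards.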