How to Fill a New Column with the Output of pandas groupby().sum()
When working with data, it is often useful to add a column to a DataFrame that holds the result of a per-group calculation. A common example is combining the pandas groupby() and sum() functions to total the values of one column for each group defined by another column. However, assigning that sum directly to a new column can leave the column full of NaN values.
Consider the following code:
import pandas as pd
df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
    'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
})
# Sum Data3 for each Date
group = df['Data3'].groupby(df['Date']).sum()
# Attempt to store the per-date sums in a new column
df['Data4'] = group
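When you run this code, you might expect Data4 to hold the per-date totals stored in group, but every value in the new column is NaN. The cause is index alignment: group is indexed by the unique Date values, while df keeps its default integer index (0 through 7). When a Series is assigned as a column, pandas matches rows by index label, and because no date label matches an integer label, every row receives NaN.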
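The sketch below reuses the same sample data and prints both indexes so the mismatch is visible. Its last step, Series.map(), is an alternative that the original code does not use: it looks up each row's Date in group's index, which shows that the sums themselves are correct and only the alignment went wrong.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
    'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
})

group = df['Data3'].groupby(df['Date']).sum()

# The aggregated Series has one row per unique date (sorted by default)...
print(group.index.tolist())  # ['2015-05-05', '2015-05-06', '2015-05-07', '2015-05-08']

# ...while df keeps its default integer RangeIndex.
print(df.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]

# Column assignment aligns on index labels; none match, so every row is NaN.
df['Data4'] = group
print(df['Data4'].isna().all())  # True

# Looking up each row's Date in group's index lines the values up correctly.
mapped = df['Date'].map(group)
print(mapped.tolist())  # [55, 108, 66, 121, 55, 108, 66, 121]
```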
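To fix this, use the transform() method instead of sum(). transform('sum') computes the same per-group totals but returns a Series with the same index and length as the original column, with each row holding the total of its own group, so it can be assigned directly as a new column. Here's the corrected code: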
df['Data4'] = df['Data3'].groupby(df['Date']).transform('sum')
With this change, Data4 holds the Data3 total for each row's date; for example, both 2015-05-08 rows get 5 + 50 = 55.
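A self-contained version of the fix is sketched below. It also shows the equivalent df.groupby('Date')['Data3'].transform('sum') spelling, which groups the DataFrame by column name; both forms should produce the same values.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
    'Data2': [11, 8, 10, 15, 110, 60, 100, 40],
    'Data3': [5, 8, 6, 1, 50, 100, 60, 120],
})

# transform('sum') computes each Date group's total and repeats it on every
# row of that group, so the result keeps df's original index.
result = df['Data3'].groupby(df['Date']).transform('sum')
print(result.index.equals(df.index))  # True

df['Data4'] = result
print(df['Data4'].tolist())  # [55, 108, 66, 121, 55, 108, 66, 121]

# Equivalent spelling: group the DataFrame by column name, then select Data3.
alt = df.groupby('Date')['Data3'].transform('sum')
print(alt.tolist() == df['Data4'].tolist())  # True
```

As a rule of thumb, sum() fits when you want one row per group (a summary table), and transform('sum') fits when you want the group total repeated on every original row.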