
Aggregation in Pandas
With Pandas, you can use aggregation operations to reduce many rows to a smaller set of summary values, typically one result per group.
Question 1: How can I perform aggregation with Pandas?
Pandas provides many aggregating functions, including mean(), sum(), count(), min(), and max(). You can use these functions to calculate summary statistics for each group. For example:
# Calculate the mean of each group defined by the 'A' and 'B' columns
df1 = df.groupby(['A', 'B']).mean()
# Print the results
print(df1)
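The snippet above assumes an existing df. As a minimal, self-contained sketch, the following builds a small hypothetical DataFrame (the values are invented for illustration; only the column names 'A', 'B', and 'C' follow the article) and applies the same aggregation functions to the numeric column:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": ["p", "p", "q", "r"],
    "C": [1.0, 3.0, 5.0, 7.0],
})

# Mean of 'C' for every (A, B) combination.
means = df.groupby(["A", "B"])["C"].mean()

# Several summary statistics at once by passing a list of names to agg().
stats = df.groupby(["A", "B"])["C"].agg(["mean", "sum", "count", "min", "max"])

print(means)
print(stats)
```

Selecting 'C' before aggregating keeps the calculation restricted to a numeric column, which avoids errors when the frame also contains string columns.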
Question 2: No DataFrame after aggregation! What happened?
The type of the result depends on how many columns you select for aggregation, not on how many columns you group by. In both cases the grouping keys end up in the index (a MultiIndex when you group by several columns) rather than in regular columns.
- Series: If you select a single column after groupby, for example df.groupby(['A', 'B'])['C'].sum(), the result is a Series whose index holds the groups.
- DataFrame: If you select a list of columns, or no columns at all, the result is a DataFrame with one column per aggregated column.
To get a DataFrame with the grouping keys as regular columns, pass as_index=False to groupby or call reset_index() on the result.
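A short sketch may make the difference easier to see. It assumes a small hypothetical DataFrame (the data and the extra column 'D' are invented for illustration) and compares the result types:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "B": ["p", "q", "p"],
    "C": [1, 2, 3],
    "D": [10, 20, 30],
})

# One selected column: a Series with the group keys in a MultiIndex.
s = df.groupby(["A", "B"])["C"].sum()
print(type(s).__name__)  # Series

# A list of selected columns: a DataFrame, keys still in the index.
d = df.groupby(["A", "B"])[["C", "D"]].sum()
print(type(d).__name__)  # DataFrame

# Two ways to turn the group keys into regular columns.
flat1 = df.groupby(["A", "B"], as_index=False)["C"].sum()
flat2 = s.reset_index()
print(flat1)
```

Both flat1 and flat2 are DataFrames with the columns 'A', 'B', and 'C'.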
Question 3: How can I aggregate string columns (into lists, tuples, or strings with a separator)?
To aggregate string columns, collect each group's values into a list or tuple, or join them into a single string.
- List: Pass list to GroupBy.agg() or GroupBy.apply().
- Tuple: Pass tuple in the same way, for example GroupBy.apply(tuple).
- String with separator: Pass the join method of the separator string, such as ','.join, to GroupBy.agg(). This works only if every value in the group is a string.
For example:
# Convert the 'B' column values to a list for each group
df1 = df.groupby('A')['B'].agg(list).reset_index()
# Combine the 'B' column values into a separator-joined string for each group
df2 = df.groupby('A')['B'].agg(','.join).reset_index()
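As a runnable sketch of the three variants, the following uses a small hypothetical DataFrame (values invented for illustration). The last line shows one possible way to join a non-string column by casting it to strings first; that step is an assumption added here, not part of the original examples.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "B": ["a", "b", "c"],
    "C": [1, 2, 3],
})

# Collect the strings of each group into a list.
lists = df.groupby("A")["B"].agg(list).reset_index()

# Collect the strings of each group into a tuple.
tuples = df.groupby("A")["B"].apply(tuple).reset_index()

# Join the strings of each group with a comma.
joined = df.groupby("A")["B"].agg(",".join).reset_index()

# str.join fails on non-string values, so cast numeric values first.
joined_nums = (
    df.groupby("A")["C"]
    .agg(lambda s: ",".join(s.astype(str)))
    .reset_index()
)

print(lists)
print(joined)
```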
Question 4: How can I aggregate counts?
To count non-missing values in each group, use GroupBy.count(). To count all rows in each group, including those with missing values, use GroupBy.size().
For example:
# Count non-missing values in the 'C' column for each group
df1 = df.groupby('A')['C'].count().reset_index(name='COUNT')
# Count all rows in each group, including rows with missing values
df2 = df.groupby('A').size().reset_index(name='COUNT')
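The difference between count() and size() only shows up when values are missing. A minimal sketch, assuming a hypothetical DataFrame with two NaN entries in 'C' (data invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values, invented for this illustration.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "y"],
    "C": [1.0, np.nan, 2.0, 3.0, np.nan],
})

# count() skips NaN: group x has 1 non-missing value, group y has 2.
non_missing = df.groupby("A")["C"].count().reset_index(name="COUNT")

# size() counts rows: group x has 2 rows, group y has 3.
all_rows = df.groupby("A").size().reset_index(name="COUNT")

print(non_missing)
print(all_rows)
```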
Question 5: How can I create a new column filled by aggregated values?
You can add a new column containing the aggregated values using the transform() method. GroupBy.transform() applies the specified operation to each group and broadcasts the result back to every row of that group, so the returned object has the same length and index as the original and can be assigned directly as a column.
For example:
# Create a new 'C1' column with the sum of 'C' grouped by 'A'
df['C1'] = df.groupby('A')['C'].transform('sum')
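To illustrate how transform() differs from a plain aggregation, here is a small sketch on a hypothetical DataFrame (data and the 'share' column are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "C": [1, 2, 5],
})

# A plain aggregation returns one value per group (2 rows here).
totals = df.groupby("A")["C"].sum()

# transform() returns one value per original row (3 rows here),
# so it can be assigned straight back to the DataFrame.
df["C1"] = df.groupby("A")["C"].transform("sum")

# Example use: each row's share of its group total.
df["share"] = df["C"] / df["C1"]

print(df)
```

Because the group total sits next to each row, row-level calculations such as the share above need no merge back onto the original data.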