A common task with a Pandas DataFrame is adding up the values of several columns within each row. This article shows how to compute a row-wise sum over specific columns when the DataFrame also contains non-numeric data.
Initial Approach and Error:
One might attempt the following code to sum columns 'a', 'b', and 'd' for each row:
df['e'] = df[['a', 'b', 'd']].map(sum)
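However, this approach fails. The cause is not the non-numeric 'c' column, which is not even part of the selection: map simply does not sum across a row. In pandas 2.1 and later, DataFrame.map applies the function to each element, so sum receives a single scalar and raises a TypeError. In earlier versions, DataFrame has no map method at all and the call raises an AttributeError.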
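The failure can be reproduced with a small sample frame. This is a minimal sketch: the data below is hypothetical (the article does not show the original DataFrame), and which exception appears depends on the installed pandas version.

```python
import pandas as pd

# Hypothetical sample data: 'c' holds strings, the other columns hold integers.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": ["dd", "ee", "ff"],
    "d": [5, 9, 1],
})

try:
    df["e"] = df[["a", "b", "d"]].map(sum)
    error_type = None
except (AttributeError, TypeError) as exc:
    # Older pandas: DataFrame has no .map, so AttributeError is raised.
    # pandas 2.1+: DataFrame.map is element-wise, so sum() is handed a
    # single scalar and raises TypeError.
    error_type = type(exc).__name__

print(error_type)
```

Note that column 'c' is never touched by this selection, so the error comes from map itself rather than from the string data.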
Correct Operation:
To sum across each row while skipping non-numeric data, call sum on the DataFrame directly:
df['e'] = df.sum(axis=1, numeric_only=True)
Explanation:
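The sum method is called with axis=1 so that values are added across each row rather than down each column. numeric_only=True restricts the calculation to numeric columns, so non-numeric columns such as 'c' are skipped. Note that this sums every numeric column in the DataFrame; it matches the sum of 'a', 'b', and 'd' only because those are the only numeric columns in this example.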
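As a rough illustration of the row-wise sum, the sketch below uses a hypothetical four-column frame (the sample values are an assumption, not taken from the article):

```python
import pandas as pd

# Hypothetical sample data: 'c' is the only non-numeric column.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": ["dd", "ee", "ff"],
    "d": [5, 9, 1],
})

# axis=1 adds across each row; numeric_only=True skips the string column 'c'.
df["e"] = df.sum(axis=1, numeric_only=True)

print(df["e"].tolist())  # [8, 14, 8]
```

Each value in 'e' is a + b + d for that row, since 'c' is excluded from the calculation.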
Sum Specific Columns:
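To sum only a subset of columns, build a list of the column names, remove the ones you don't need, and sum over what remains:
col_list = list(df)
col_list.remove('d')
df['e'] = df[col_list].sum(axis=1, numeric_only=True)
Here col_list holds 'a', 'b', and 'c'. Because 'c' is non-numeric, numeric_only=True skips it, so 'e' ends up containing the sum of 'a' and 'b' only. Build col_list before adding 'e' to the DataFrame; otherwise 'e' itself is included in the list.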
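The two ways of choosing a subset, naming the columns to include or removing the ones to exclude, might look like this side by side. The sample data and the variable names include_sum and exclude_sum are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: 'c' is the only non-numeric column.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": ["dd", "ee", "ff"],
    "d": [5, 9, 1],
})

# Option 1: list the columns to include explicitly.
include_sum = df[["a", "b", "d"]].sum(axis=1)

# Option 2: start from all column names and drop the unwanted one.
col_list = list(df)      # ['a', 'b', 'c', 'd']
col_list.remove("d")     # ['a', 'b', 'c']
# numeric_only=True skips 'c', so only 'a' and 'b' are added.
exclude_sum = df[col_list].sum(axis=1, numeric_only=True)

print(include_sum.tolist())  # [8, 14, 8]
print(exclude_sum.tolist())  # [3, 5, 7]
```

Listing the columns explicitly tends to be easier to read when only a few are needed, while the removal approach is convenient when most columns should be kept.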