How to Efficiently Drop Consecutive Duplicates in Pandas?


Mary-Kate Olsen

Release: 2024-11-13 17:29:02



Efficient Dropping of Consecutive Duplicates in Pandas

When working with pandas data, it's often necessary to remove duplicate values. The built-in drop_duplicates() method, however, removes every repeated value regardless of where it appears, so a value that recurs later in the data is dropped even when the two occurrences are not adjacent. When only consecutive duplicates should be dropped (so that 1, 2, 2, 3, 2 becomes 1, 2, 3, 2), a different approach is needed.
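To make the distinction concrete, here is a minimal sketch. The Series a and its values are assumptions chosen for illustration and do not come from the original text.

```python
import pandas as pd

# Hypothetical sample data: 2 repeats consecutively, then recurs after 3.
a = pd.Series([1, 2, 2, 3, 2])

# drop_duplicates() keeps only the first occurrence of each value,
# so the trailing 2 is removed even though it does not follow another 2.
globally_unique = a.drop_duplicates()
print(globally_unique.tolist())  # [1, 2, 3]
```

Dropping only consecutive duplicates should instead leave 1, 2, 3, 2, which is what the methods below aim for.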

One approach uses the shift() function. The examples here assume that a is a Series (for example, a single DataFrame column). Comparing a with a shifted copy of itself (a.shift(-1)) produces a boolean mask that is False wherever an element equals the one that follows it. Selecting with this mask keeps the last element of each run of consecutive duplicates:

a.loc[a.shift(-1) != a]
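As a rough illustration of what this expression selects, the sketch below applies it to a small Series. The sample values are assumed for the example.

```python
import pandas as pd

# Hypothetical sample data with one run of consecutive duplicates (the two 2s).
a = pd.Series([1, 2, 2, 3, 2])

# a.shift(-1) moves every value up one position, so each element is
# compared with the element that follows it. The last position is compared
# with NaN, which is never equal to anything, so it is always kept.
mask = a.shift(-1) != a
result = a.loc[mask]

print(result.tolist())        # [1, 2, 3, 2]
print(result.index.tolist())  # [0, 2, 3, 4]
```

The index shows that, of the two adjacent 2s, the one at position 2 (the last of the run) survives.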


Another method uses the diff() function, which computes the difference between each element and the one before it; a difference of 0 marks a consecutive duplicate. Because it relies on subtraction, it applies to numeric data. It is also slower than the shift() method for large datasets.

Using:

a.loc[a.diff() != 0]
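A small sketch of the diff() variant, again on assumed sample values, shows why the first element is kept:

```python
import pandas as pd

# Hypothetical numeric sample data.
a = pd.Series([1, 2, 2, 3, 2])

# diff() yields [NaN, 1.0, 0.0, 1.0, -1.0]. The leading NaN compares as
# "not equal to 0", so the first element is always retained.
differences = a.diff()
result = a.loc[differences != 0]

print(result.tolist())        # [1, 2, 3, 2]
print(result.index.tolist())  # [0, 1, 3, 4]
```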


The original answer suggested using shift() with a period of -1, which compares each element with the next one and therefore keeps the last element of each run. Using shift(1) (or simply shift(), since the default period is 1) compares each element with the previous one instead, which ensures that the first element of each run is returned:

a.loc[a.shift(1) != a]
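One way to package this is a small helper. The function name drop_consecutive_duplicates, the sample values, and the DataFrame columns "value" and "tag" are all hypothetical, introduced only for this sketch.

```python
import pandas as pd


def drop_consecutive_duplicates(s: pd.Series) -> pd.Series:
    """Hypothetical helper: keep the first element of each run of equal values."""
    # s.shift() moves values down one position, so each element is
    # compared with its predecessor; the first element is compared with NaN.
    return s.loc[s.shift() != s]


a = pd.Series([1, 2, 2, 3, 2])
first_of_each_run = drop_consecutive_duplicates(a)
print(first_of_each_run.tolist())        # [1, 2, 3, 2]
print(first_of_each_run.index.tolist())  # [0, 1, 3, 4]

# The same mask can filter DataFrame rows, keyed on a single column.
df = pd.DataFrame({"value": [1, 2, 2, 3, 2], "tag": list("abcde")})
deduped = df.loc[df["value"].shift() != df["value"]]
print(deduped["tag"].tolist())  # ['a', 'b', 'd', 'e']
```

Building the mask from one column and applying it to the whole frame avoids indexing with a boolean DataFrame, which .loc does not accept as a row selector.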


Both the shift() and diff() methods drop consecutive duplicates in Pandas without an explicit Python loop. Since shift() is the faster of the two on large datasets, it is the usual choice, while diff() remains a compact alternative for numeric data.
