The following example shows how to deduplicate data in Python when duplicates are defined across multiple attributes. It has good reference value and I hope it will be helpful to everyone. Let's take a look together.
Steps to deduplicate data with the pandas module in Python:
1) Call the DataFrame's duplicated method. It returns a Boolean Series indicating, for each row, whether that row repeats an earlier one: the first occurrence of a row is False, and each repeat of it is True;
2) Call the DataFrame's drop_duplicates method. It returns a new DataFrame with the repeated rows removed.
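The two steps above can be sketched as follows. This is a minimal illustration on made-up data (the values and the names is_duplicated and deduplicated are assumptions, not part of the original example), in which rows 0 and 2 are identical in every column.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical in every column.
frame = pd.DataFrame({
    "state": [1, 2, 1, 3],
    "pop": ["a", "b", "a", "c"],
})

# Step 1: Boolean Series, True for each row that repeats an earlier row.
is_duplicated = frame.duplicated()
print(is_duplicated.tolist())  # [False, False, True, False]

# Step 2: a new DataFrame without the repeated rows; frame itself is unchanged.
deduplicated = frame.drop_duplicates()
print(deduplicated.index.tolist())  # [0, 1, 3]
```

Note that drop_duplicates keeps the original index labels of the surviving rows, which is why label 2 is missing from the result.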
Note:
If no arguments are passed to duplicated or drop_duplicates, both methods compare all columns by default. If you pass one or more attribute names (column names) to these methods, for example frame.drop_duplicates(['state']), only those columns (here, the state column) are used to decide whether rows are duplicates.
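To make the difference concrete, here is a small sketch comparing the default behavior with a one-column and a multi-column subset. The 'city' column and its values are hypothetical, added only to illustrate a multi-attribute check.

```python
import pandas as pd

# Hypothetical sample data; 'city' is an illustrative column, not from the original example.
frame = pd.DataFrame({
    "state": [1, 1, 2, 2],
    "city": ["x", "x", "y", "z"],
    "pop": ["a", "b", "c", "d"],
})

# No arguments: all columns are compared, so no row counts as a duplicate.
all_columns = frame.duplicated()

# One column: rows 1 and 3 repeat an earlier 'state' value.
by_state = frame.duplicated(subset=["state"])

# Several columns: only row 1 repeats an earlier ('state', 'city') pair.
by_state_city = frame.duplicated(subset=["state", "city"])

print(all_columns.tolist())    # [False, False, False, False]
print(by_state.tolist())       # [False, True, False, True]
print(by_state_city.tolist())  # [False, True, False, False]
```

The same subset argument works for drop_duplicates, so a multi-attribute deduplication is just a matter of listing every column that defines a duplicate.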
Specific examples are as follows:
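>>> import pandas as pd
>>> data = {'state': [1, 1, 2, 2], 'pop': ['a', 'b', 'c', 'd']}
>>> frame = pd.DataFrame(data)
>>> # The column order shown below may vary with the pandas version
>>> frame
  pop  state
0   a      1
1   b      1
2   c      2
3   d      2
>>> # Compare all columns: no row is a full duplicate
>>> IsDuplicated = frame.duplicated()
>>> print(IsDuplicated)
0    False
1    False
2    False
3    False
dtype: bool
>>> # Keep only the first row for each value of 'state'
>>> frame = frame.drop_duplicates(['state'])
>>> frame
  pop  state
0   a      1
2   c      2
>>> # After deduplication, no 'state' value repeats
>>> IsDuplicated = frame.duplicated(['state'])
>>> print(IsDuplicated)
0    False
2    False
dtype: bool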
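In the session above, drop_duplicates keeps the first row of each state group and leaves the index labels 0 and 2 in place. The sketch below, reusing the same data, shows how the keep parameter and reset_index can change that. The variable names are illustrative assumptions.

```python
import pandas as pd

frame = pd.DataFrame({"state": [1, 1, 2, 2], "pop": ["a", "b", "c", "d"]})

# keep='first' (the default) retains the first row of each 'state' group.
first = frame.drop_duplicates(subset=["state"])

# keep='last' retains the last row of each group instead.
last = frame.drop_duplicates(subset=["state"], keep="last")

# keep=False drops every row whose 'state' value occurs more than once.
none_kept = frame.drop_duplicates(subset=["state"], keep=False)

# reset_index(drop=True) renumbers the surviving rows from 0.
renumbered = first.reset_index(drop=True)

print(first["pop"].tolist())      # ['a', 'c']
print(last["pop"].tolist())       # ['b', 'd']
print(len(none_kept))             # 0
print(renumbered.index.tolist())  # [0, 1]
```

Which row to keep depends on the data: keep='last' is a common choice when later rows are more recent records.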
That covers how to deduplicate multi-attribute duplicate data in Python. For more information, see the related articles on the PHP Chinese website.