Pandas Rules for View vs Copy Generation
Pandas employs specific rules when deciding whether a slice operation on a DataFrame results in a view or a copy. By understanding these rules, you can optimize your data manipulation and avoid unexpected behavior.
Starting with the default behavior, which the later rules refine:
- By default, every operation creates a copy, except those specifically designed to modify the DataFrame in place.
- Only certain operations support the inplace=True parameter, which allows modifications to occur directly in the original DataFrame.
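The default-copy rule and the inplace=True exception can be sketched as follows. This is a minimal illustration, not an exhaustive list of operations; the DataFrame and its column names are made up for the example, and sort_values is used only as one method known to accept inplace=True.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 1, 2], "B": [30, 10, 20]})

# Without inplace, sort_values returns a new DataFrame and leaves df alone.
sorted_df = df.sort_values("A")
order_after_copy = df["A"].tolist()  # df still has its original row order

# With inplace=True, the method returns None and df itself is reordered.
returned = df.sort_values("A", inplace=True)
order_after_inplace = df["A"].tolist()

print(order_after_copy, returned, order_after_inplace)
```

Because in-place methods return None, assigning their result back to df (df = df.sort_values("A", inplace=True)) would discard the data.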
Next, consider indexers, where the outcome depends on whether you set or get values:
- An indexer that sets values, such as .loc, .iloc, .iat, and .at, operates in place, modifying the original DataFrame without creating a copy.
- An indexer that retrieves data from a single-dtype object usually creates a view, unless the underlying memory layout precludes this optimization.
- Conversely, an indexer that retrieves data from a multiple-dtype object always creates a copy.
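One way to probe the get rules is to ask NumPy whether a retrieved row shares memory with its parent. The sketch below assumes np.shares_memory is an acceptable proxy for "view"; the frames and column names are hypothetical. For the single-dtype frame the result depends on the pandas version and memory layout, so the code prints it rather than assuming a value.

```python
import numpy as np
import pandas as pd

# Single dtype: every column holds the same integer type.
single = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])
# Mixed dtypes: an integer column next to a float column.
mixed = pd.DataFrame({"a": [1, 2, 3, 4], "b": [0.5, 1.5, 2.5, 3.5]})

row_single = single.iloc[1]  # one row, same dtype as the frame
row_mixed = mixed.iloc[1]    # one row spanning two dtypes, so values are converted

# Does the row's buffer overlap the parent frame's data?
single_shares = bool(np.shares_memory(single.to_numpy(), row_single.to_numpy()))
mixed_shares = any(
    bool(np.shares_memory(mixed[col].to_numpy(), row_mixed.to_numpy()))
    for col in mixed.columns
)

print("single-dtype row shares memory with its frame:", single_shares)
print("mixed-dtype row shares memory with its frame:", mixed_shares)
```

The mixed-dtype row has to be assembled into a new array of a common dtype, which is why it cannot be a view of either column.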
Applying these rules to some concrete examples:
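- df.query('2 < index <= 5') returns a copy because the expression is evaluated separately (by numexpr, when it is installed) and the result is built as a new object.
- df.iloc[3] = 70 and df.ix[1, 'B':'E'] = 222 change df because they set values through an indexer, which operates in place regardless of dtype. Note that .ix was removed in pandas 1.0; in current versions write df.loc[1, 'B':'E'] = 222.
- df[df.C <= df.B] = value modifies df because the assignment goes through a setter (df[...] = ...). Without the assignment, df[df.C <= df.B] only retrieves the matching rows, and boolean selection returns a copy.
- However, assigning to df[df.C <= df.B].ix[:,'B':'E'] does not modify df because it is a chained indexing operation: the first step, df[df.C <= df.B], returns a copy, and the assignment is applied to that temporary object. Pandas is not guaranteed to intercept this.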
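The difference between the setter form and the chained form can be checked directly. The following is a sketch with a made-up three-column frame and an arbitrary value of 222; .loc stands in for .ix, which no longer exists in current pandas. Warnings are silenced only to keep the demonstration quiet; in real code the warning is a useful signal.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"B": [1, 5, 3], "C": [4, 2, 3], "E": [7, 8, 9]})
mask = df.C <= df.B  # True for rows 1 and 2
before = df.copy()

with warnings.catch_warnings():
    # pandas may warn about the chained assignment; ignored for this demo.
    warnings.simplefilter("ignore")
    # Chained: df[mask] builds a new DataFrame, and the assignment lands on it.
    df[mask].loc[:, "B":"E"] = 222

chained_changed_df = not df.equals(before)

# Setter form: a single assignment through df[...] writes into df itself.
df[mask] = 222
setter_changed_df = not df.equals(before)

print("chained assignment changed df:", chained_changed_df)
print("setter assignment changed df:", setter_changed_df)
```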
To select or modify specific values based on a condition, pass both the row mask and the column slice to a single .loc call:
df.loc[df.C <= df.B, 'B':'E']
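As a minimal sketch of using that expression as an assignment target, assuming a hypothetical frame with columns A through E and an arbitrary replacement value of 222:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [10, 20, 30],
    "B": [1, 5, 3],
    "C": [4, 2, 3],
    "D": [0, 0, 0],
    "E": [7, 8, 9],
})
mask = df.C <= df.B  # True for rows 1 and 2

# Reading: one .loc call with a row mask and an inclusive label slice.
selected = df.loc[mask, "B":"E"]

# Writing: the same expression on the left-hand side sets values in df itself.
df.loc[mask, "B":"E"] = 222

print(df)
```

Because the row condition and the column selection are resolved in one indexing step, there is no intermediate object for the assignment to get lost in.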
By adhering to these rules, you can gain a clear understanding of when Pandas generates views or copies, ensuring efficient data manipulation within your Python scripts.