This article covers a handy trick in the Python pandas library: writing elegant row selections with the query() method. I hope it will be helpful!
Most pandas users can easily write code that selects data based on conditions, but if you have not used query() before, you will likely be impressed by how simple it is.
Create a DataFrame first.
import pandas as pd

df = pd.DataFrame(
    {'A': ['e', 'd', 'c', 'b', 'a'],
     'B': ['f', 'b', 'c', 'd', 'e'],
     'C': range(0, 10, 2),
     'D': range(10, 0, -2),
     'E.E': range(10, 5, -1)})
We now select all rows where the letter in column A also appears somewhere in column B. Let's first look at two common ways of writing this.
>>> df[df['A'].isin(df['B'])]
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
3  b  d  6   4    7
>>> df.loc[df['A'].isin(df['B'])]
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
3  b  d  6   4    7
Now use query() to achieve the same result.
>>> df.query("A in B")
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
3  b  d  6   4    7
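You can see that the code using query() is concise and easy to understand. On large DataFrames it may also use less memory, because the whole expression can be handed to the numexpr engine (when that library is installed) instead of building each intermediate result separately.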
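As a quick sanity check, the following sketch rebuilds the DataFrame from above and confirms that all three spellings select the same rows. The variable names by_mask, by_loc and by_query are only illustrative and do not come from the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['e', 'd', 'c', 'b', 'a'],
    'B': ['f', 'b', 'c', 'd', 'e'],
    'C': range(0, 10, 2),
    'D': range(10, 0, -2),
    'E.E': range(10, 5, -1),
})

# Boolean mask with [] and with .loc, as in the two "common" versions
by_mask = df[df['A'].isin(df['B'])]
by_loc = df.loc[df['A'].isin(df['B'])]

# The same selection expressed as a query string
by_query = df.query("A in B")

# All three should contain exactly the same rows and values
print(by_mask.equals(by_query))
print(by_loc.equals(by_query))
```

Because query() returns an ordinary DataFrame, the result can be chained with further pandas calls just like the result of boolean indexing.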
Multi-condition query
Select all rows where the letter in column A appears in column B and the value in column C is less than the value in column D.
>>> df.query('A in B and C < D')
   A  B  C   D  E.E
0  e  f  0  10   10
1  d  b  2   8    9
2  c  c  4   6    8
The and operator can also be written as &.
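A small sketch of that equivalence, assuming the same column names as above: inside a query string, & is treated like and, so the comparisons do not need the parentheses that plain boolean indexing requires. The names q_and, q_amp and by_mask are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['e', 'd', 'c', 'b', 'a'],
    'B': ['f', 'b', 'c', 'd', 'e'],
    'C': range(0, 10, 2),
    'D': range(10, 0, -2),
})

# Keyword form and symbol form of the same condition
q_and = df.query('A in B and C < D')
q_amp = df.query('A in B & C < D')

# Equivalent boolean indexing; here the parentheses around the comparison are required
by_mask = df[df['A'].isin(df['B']) & (df['C'] < df['D'])]

print(q_and.equals(q_amp))
print(q_and.equals(by_mask))
```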
Reference variables
Externally defined variables can also be used in expressions by prefixing the variable name with @.
>>> number = 5
>>> df.query('A in B & C > @number')
   A  B  C  D  E.E
3  b  d  6  4    7
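The @ prefix is not limited to scalars. As a hedged sketch, the example below passes a Python list and a number into the same query; letters and threshold are hypothetical variable names chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['e', 'd', 'c', 'b', 'a'],
    'C': range(0, 10, 2),
})

letters = ['a', 'b']   # hypothetical list of values to keep
threshold = 6          # hypothetical lower bound for column C

# Both local variables are referenced with @ inside the query string
picked = df.query('A in @letters and C > @threshold')

# The same selection written with boolean indexing, for comparison
by_mask = df[df['A'].isin(letters) & (df['C'] > threshold)]

print(picked)
print(picked.equals(by_mask))
```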
Index selection
Select all rows where the letter in column A appears in column B and the index is greater than 2.
>>> df.query('A in B and index > 2')
   A  B  C  D  E.E
3  b  d  6  4    7
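As a sketch of a related case: when the index has a name, that name can be used in the expression in place of the index keyword. The index name row_id below is an assumption made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['e', 'd', 'c', 'b', 'a'],
    'B': ['f', 'b', 'c', 'd', 'e'],
})

# Unnamed index: refer to it with the keyword "index"
by_keyword = df.query('A in B and index > 2')

# Named index: the name itself can be used in the expression
named = df.rename_axis('row_id')   # 'row_id' is a hypothetical index name
by_name = named.query('A in B and row_id > 2')

print(by_keyword)
print(by_name)
```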
Create a two-level index DataFrame.
>>> import numpy as np
>>> colors = ['yellow']*3 + ['red']*2
>>> rank = [str(i) for i in range(5)]
>>> index = pd.MultiIndex.from_arrays([colors, rank], names=['color', 'rank'])
>>> df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['A', 'B'], index=index)
>>> df
             A  B
color  rank
yellow 0     0  1
       1     2  3
       2     4  5
red    3     6  7
       4     8  9
1. When the index levels are named, select directly by level name.
>>> df.query("color == 'red'")
            A  B
color rank
red   3     6  7
      4     8  9
2. When the index levels are unnamed, select by level position using ilevel_0, ilevel_1, and so on.
>>> df.index.names = [None, None]
>>> df.query("ilevel_0 == 'red'")
       A  B
red 3  6  7
    4  8  9
>>> df.query("ilevel_1 == '4'")
       A  B
red 4  8  9
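Index levels and ordinary columns can be combined in a single expression. The sketch below, which rebuilds the two-level DataFrame, filters on the color level and on column A at once and compares the result with the more verbose get_level_values() spelling; the names mixed, by_mask, unnamed and by_level are illustrative only.

```python
import numpy as np
import pandas as pd

colors = ['yellow'] * 3 + ['red'] * 2
rank = [str(i) for i in range(5)]
index = pd.MultiIndex.from_arrays([colors, rank], names=['color', 'rank'])
df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['A', 'B'], index=index)

# Named level and a column in the same query string
mixed = df.query("color == 'red' and A > 6")

# The same filter with explicit level access and a boolean mask
by_mask = df[(df['A'] > 6) & (df.index.get_level_values('color') == 'red')]

# With the level names removed, fall back to the positional ilevel_N names
unnamed = df.copy()
unnamed.index.names = [None, None]
by_level = unnamed.query("ilevel_0 == 'red' and A > 6")

print(mixed)
print(mixed.equals(by_mask))
print(by_level)
```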
Special characters
For column names that contain spaces or other special symbols, such as operators, wrap the name in backticks (``). The example uses the first DataFrame from the start of the article.
>>> df.query('A == B | (C + 2 > `E.E`)')
   A  B  C  D  E.E
2  c  c  4  6    8
3  b  d  6  4    7
4  a  e  8  2    6
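The same quoting applies to column names containing spaces. A minimal sketch, assuming a reasonably recent pandas version (backtick quoting for names with spaces and other special characters is not available in old releases); the column name unit price and the values are invented for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'unit price': [3, 8, 5],   # hypothetical column whose name contains a space
    'E.E': [1, 9, 5],          # column whose name contains a dot
})

# Both names must be wrapped in backticks inside the query string
result = df.query('`unit price` > `E.E`')

# Boolean indexing needs no quoting, since names are plain Python strings here
by_mask = df[df['unit price'] > df['E.E']]

print(result)
print(result.equals(by_mask))
```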