This is the sixth article in the pandas data processing series. Let's talk about the sorting and summary operations of DataFrame.
In the previous article, we mainly introduced the apply method of DataFrame and how to broadcast an operation over each row or column, so that the entire dataset can be processed in a very short time. Today we will talk about how to sort a DataFrame according to our needs and how to use some summary operations.
There are two ways to sort: based on index and based on value. Let's first take a look at the sorting methods in Series.
There are two sorting methods in Series. One is sort_index, which, as the name implies, sorts the values according to the index of the Series. The other is sort_values, which sorts according to the values in the Series. Both methods return a new Series.
DataFrame also has sort_index. The difference is that we need to specify which axis we want to sort on, through the axis parameter.
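To make the two Series methods described above concrete, here is a minimal sketch. The data and variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: labels and values are deliberately out of order.
s = pd.Series([3, 1, 2], index=["b", "c", "a"])

by_index = s.sort_index()    # ordered by label: a, b, c
by_value = s.sort_values()   # ordered by value: 1, 2, 3

print(by_index)
print(by_value)
print(s)  # the original Series is left unchanged
```

Both calls return a new Series, so the result has to be assigned to a name if we want to keep it.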
By default, we sort based on the row index. If we want to sort based on the column index, we need to pass in the parameter axis=1. We can also pass in the ascending parameter to specify whether we want ascending or descending order.
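A short sketch of sort_index on a DataFrame, covering the default row sorting, axis=1, and ascending=False. The DataFrame below is made-up data, not taken from the original article.

```python
import pandas as pd

# Hypothetical DataFrame with unordered row and column labels.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=[2, 0, 1],
    columns=["c", "a", "b"],
)

rows_sorted = df.sort_index()               # default axis=0: row labels 0, 1, 2
cols_sorted = df.sort_index(axis=1)         # column labels a, b, c
rows_desc = df.sort_index(ascending=False)  # row labels 2, 1, 0

print(rows_sorted)
print(cols_sorted)
print(rows_desc)
```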
Sorting a DataFrame by value works differently. In the usual case we do not sort the values inside a row; instead, we reorder the rows according to the values in one or more columns. We pass in the column we want to sort by via the by parameter, which can be a single column or a list of columns.
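The by parameter can be sketched as follows, once with a single column and once with a list of columns. The column names group and score are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "group": ["x", "y", "x", "y"],
    "score": [3, 1, 2, 4],
})

# Sort rows by a single column.
by_score = df.sort_values(by="score")

# Sort by group first (ascending), then by score within each group (descending).
by_group_then_score = df.sort_values(
    by=["group", "score"], ascending=[True, False]
)

print(by_score)
print(by_group_then_score)
```

When by is a list, ascending may also be a list of the same length, one flag per column.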
Sometimes we would like to know where an element ranks within the whole. pandas provides this as well, through the rank method.
A rank such as 6.5 can look strange at first, but the explanation is simple: when 7 appears twice and occupies the 6th and 7th positions, the ranks of all its occurrences are averaged, so each one gets 6.5. If we don't want the average, but would rather give a ranking based on the order of appearance, we can use the method parameter to specify the behavior we want.
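The tie handling can be reproduced with a small sketch. The Series below is hypothetical data, built so that 7 appears twice and falls in the 6th and 7th sorted positions; it is not the example from the original article.

```python
import pandas as pd

# Hypothetical data: sorted, it reads 1, 2, 3, 4, 5, 7, 7, 8.
s = pd.Series([7, 1, 5, 7, 3, 2, 4, 8])

avg_rank = s.rank()                  # default method="average": both 7s get 6.5
first_rank = s.rank(method="first")  # ties broken by order of appearance: 6 and 7

print(avg_rank.tolist())    # [6.5, 1.0, 5.0, 6.5, 3.0, 2.0, 4.0, 8.0]
print(first_rank.tolist())  # [6.0, 1.0, 5.0, 7.0, 3.0, 2.0, 4.0, 8.0]
```

Other values accepted by method include "min", "max" and "dense", which assign tied elements the lowest rank, the highest rank, or consecutive ranks without gaps.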
For a DataFrame, the default (axis=0) ranks each element within its own column, that is, the ranking runs down the rows. We can instead rank each element within its own row by passing axis=1:
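A minimal sketch of rank on a DataFrame with both axis settings follows. The data is hypothetical, and the comments describe what each call does in pandas.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [3, 1, 2], "b": [1, 3, 2]})

# axis=0 (default): each value is ranked against the other values in its column.
col_rank = df.rank()

# axis=1: each value is ranked against the other values in its row.
row_rank = df.rank(axis=1)

print(col_rank)
print(row_rank)
```

In the last row both values equal 2, so with axis=1 they tie and each receives the averaged rank 1.5.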
Finally, let's introduce the summary operations in DataFrame. A summary operation is an aggregation operation, such as the familiar sum method, which aggregates a batch of data by adding it up. DataFrame has several similar methods; let's look at them one by one.
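The first is sum. We can use sum to add up the values in a DataFrame. If no parameters are passed, the default (axis=0) adds the values down the rows, which gives one total per column. Passing axis=1 gives one total per row instead.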
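As a quick check of what sum returns for each axis setting, here is a small sketch on made-up data.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_totals = df.sum()        # default axis=0: one total per column (a=6, b=60)
row_totals = df.sum(axis=1)  # axis=1: one total per row (11, 22, 33)

print(col_totals)
print(row_totals)
```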
In addition to sum, another commonly used method is mean, which computes the average along the rows or along the columns.
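Since a DataFrame often contains NA elements, the skipna parameter controls whether missing values are excluded before the average is calculated. They are excluded by default (skipna=True); passing skipna=False makes any row or column that contains an NA produce NA as its result.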
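The effect of skipna on mean can be sketched like this. The DataFrame is hypothetical and contains one missing value so that the difference is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column "a".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [2.0, 4.0, 6.0]})

mean_skip = df.mean()              # skipna=True by default: a=2.0, b=4.0
mean_keep = df.mean(skipna=False)  # NaN propagates: a=NaN, b=4.0
row_mean = df.mean(axis=1)         # per-row average: 1.5, 4.0, 4.5

print(mean_skip)
print(mean_keep)
print(row_mean)
```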
Another method that I personally find very useful is describe, which returns overall information about the DataFrame: for example, the sample size, mean, standard deviation, minimum value, maximum value, and so on for each column. It is a commonly used statistical method that helps us understand the distribution of the data in a DataFrame.
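A minimal sketch of describe on a small numeric DataFrame is shown below. The data is hypothetical.

```python
import pandas as pd

# Hypothetical numeric data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

summary = df.describe()

# For numeric columns the result is itself a DataFrame whose rows are
# count, mean, std, min, 25%, 50%, 75% and max.
print(summary)
print(summary.loc["mean", "a"])  # 2.5
```

Because the result is an ordinary DataFrame, individual statistics can be picked out with loc, as in the last line.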
In addition to the methods introduced here, DataFrame has many similar summary methods, such as idxmax, idxmin, var, and std. If you are interested, you can check the relevant documentation, but in my experience they are used less often.
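For reference, here is a brief sketch of the four methods named above, on hypothetical data. Note that var and std in pandas compute the sample variance and sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 5, 3], "b": [9, 2, 4]}, index=["x", "y", "z"])

max_labels = df.idxmax()  # row label of each column's maximum: a -> y, b -> x
min_labels = df.idxmin()  # row label of each column's minimum: a -> x, b -> y
variances = df.var()      # sample variance (ddof=1): a -> 4.0, b -> 13.0
std_devs = df.std()       # sample standard deviation: a -> 2.0

print(max_labels)
print(min_labels)
print(variances)
print(std_devs)
```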