This is the fifth article in our pandas data processing series. Today, let's talk about some advanced operations in pandas.
In the previous article, we introduced some of the calculation methods in pandas, such as the four arithmetic operations between two DataFrames and the methods for filling null values in a DataFrame. In today's article, we will talk about the broadcast mechanism in DataFrame and how to use the apply function.
Broadcast mechanism
Broadcasting should not be unfamiliar to us: we introduced it in our earlier series on numpy. When we operate on two arrays whose shapes differ, numpy automatically expands the one with fewer dimensions to match the other before performing the calculation.
For example, if we subtract a one-dimensional array from a two-dimensional array, numpy first expands the one-dimensional array to two dimensions and then performs the subtraction. The effect is that the one-dimensional array is subtracted from each row of the two-dimensional array. In other words, the subtraction is broadcast to every row (or column) of the two-dimensional array. For instance, if we create a numpy array arr and subtract its first row from it, every row of the result has had that first row subtracted. The same operation can also be performed on a DataFrame.
Of course, we can also broadcast along the columns, but the four arithmetic operators on a DataFrame broadcast across the rows by default. To broadcast down the columns instead, we need to use the corresponding arithmetic method (such as sub) and specify the axis we wish to match on.
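The broadcasting behaviour described above can be sketched as follows. This is a minimal illustration, not the article's original listing: the array values, the column labels and the variable names are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Illustrative 3x4 array; the values are arbitrary.
arr = np.arange(12).reshape(3, 4)

# Subtracting the first row broadcasts it across every row of arr.
arr_centered = arr - arr[0]

# The same operation on a DataFrame: the first row (a Series) is matched
# on the column labels and subtracted from each row.
df = pd.DataFrame(arr, columns=list("abcd"))
by_row = df - df.iloc[0]

# To broadcast down the columns instead, use the arithmetic method
# and name the axis to match on (here, the row index).
by_col = df.sub(df["a"], axis="index")

print(by_row)
print(by_col)
```

The operator form (df - series) always matches on column labels, which is why the method form with an explicit axis is needed for the column case.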
Another advantage of pandas is that it is compatible with many of numpy's calculation methods and functions, so we can use numpy functions directly on a DataFrame, which greatly expands what we can do with it.
For example, if we want to replace every element in the DataFrame with its square, we can simply pass the DataFrame to numpy's square function, np.square(df).
We can pass the DataFrame in as an argument to a numpy function, but what if we want to define a function ourselves and apply it to the DataFrame?
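As a small sketch of the squaring example, assuming an arbitrary DataFrame with made-up column names:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# A numpy ufunc applied to a DataFrame returns a DataFrame,
# keeping the original index and column labels.
squared = np.square(df)
print(squared)
```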
We can easily achieve this using the apply method. The apply method is somewhat like Python's built-in map function: it maps a function over the DataFrame. We only need to pass the function we want to use into apply. In other words, the argument it accepts is itself a function, which is a very typical use of functional programming.
For example, if we want to square the DataFrame, we can also pass np.square to apply, as in df.apply(np.square).
Besides using apply on an entire DataFrame, we can also use it on a single row, a single column, or some other part of the DataFrame, and the usage is the same. For example, we can apply the square function to one row or one column of the DataFrame.
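The following sketch shows apply used on a whole DataFrame, on one column and on one row. The data and the names all_squared, col_squared and row_squared are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])

# apply on the whole DataFrame: np.square is called on each column.
all_squared = df.apply(np.square)

# apply on a single column (a Series).
col_squared = df["a"].apply(np.square)

# apply on a single row, selected by position.
row_squared = df.iloc[0].apply(np.square)

print(all_squared)
print(col_squared)
print(row_squared)
```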
In addition, the function passed to apply is not limited to working on elements. We can also write functions that act on a whole row or column. For example, if we want to calculate the maximum value of each column in the DataFrame, we can write df.apply(lambda x: x.max()).
The argument x of the function passed to apply is actually a Series, so the max here is the max method that comes with Series. In other words, apply operates on Series, not on individual elements. Even when the final effect is that every element is changed, as with np.square, the function is still being called on whole rows or columns.
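To make the point about Series concrete, the sketch below records the type of the argument each time the function is called. The helper column_max and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

seen_types = []

def column_max(x):
    # Record what apply actually hands to the function.
    seen_types.append(type(x).__name__)
    return x.max()

# By default the function is called once per column.
col_max = df.apply(column_max)

print(col_max)
print(set(seen_types))
```

Every call receives a Series, which is why Series methods such as max are available inside the function.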
Similarly, we can make apply work on rows instead by passing the axis parameter. We can pass axis='columns' or axis=1; the two have the same effect.
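A minimal sketch of the row-wise case, using made-up data, shows that the two spellings of the axis argument give the same result:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

# With axis="columns" (or axis=1) the function receives one row at a time.
row_max = df.apply(lambda x: x.max(), axis="columns")
row_max_1 = df.apply(lambda x: x.max(), axis=1)

print(row_max)
```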
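In addition, the result returned by the function passed to apply does not have to be a scalar; it can also be a list or a Series made up of multiple values. With the default column-wise apply the two behave in much the same way, because a returned list of consistent length is handled like a Series; returning a Series with an explicit index has the advantage of labelling the result.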
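As a sketch of a function that returns several values, the hypothetical helper min_max below returns a labelled Series for each column, and apply assembles the pieces into a DataFrame. The data and labels are assumptions for the example.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})

def min_max(x):
    # Return two values per column, labelled by the Series index.
    return pd.Series([x.min(), x.max()], index=["min", "max"])

# One column of results per input column, one row per returned label.
summary = df.apply(min_max)
print(summary)
```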
Finally, let's introduce applymap, an element-level mapping that we can use to operate on each element in the DataFrame. For example, we can use it to convert the format of the data in a DataFrame, such as rendering each number as a string with two decimal places.
The reason it is called applymap rather than map is that Series already has a map method, so the name applymap was chosen to distinguish the two.
Note that if we pass a function written for a single value (for example, one that formats a number with '%.2f' % x) to apply instead of applymap, an error is raised. The reason is simple: apply hands the function a Series rather than an element, and a Series does not support such an operation.
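A minimal sketch of element-wise formatting follows. The data and the helper two_decimals are assumptions. Because pandas 2.1 renamed DataFrame.applymap to DataFrame.map and deprecated the old name, the sketch uses whichever one the installed version provides.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1.234, 5.678], "b": [9.1, 0.25]})

def two_decimals(x):
    # Called once per element, so x is a plain number here.
    return "%.2f" % x

# pandas >= 2.1 offers DataFrame.map; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    formatted = df.map(two_decimals)
else:
    formatted = df.applymap(two_decimals)

print(formatted)
```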
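The difference can be sketched as follows, assuming a hypothetical formatting function two_decimals and made-up data. Passing it to apply fails because it receives a whole column, while mapping it over each column with Series.map works.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"a": [1.234, 5.678, 2.0], "b": [9.1, 0.25, 3.5]})

def two_decimals(x):
    # Written for a single number, not for a Series.
    return "%.2f" % x

try:
    df.apply(two_decimals)  # each call receives a whole column
    apply_failed = False
except TypeError:
    apply_failed = True

# Mapping the function over the elements of each column works,
# because Series.map calls it once per element.
per_column = df.apply(lambda col: col.map(two_decimals))

print(apply_failed)
print(per_column)
```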
In today's article we mainly introduced how to use apply and applymap in pandas. These two methods are used very often in day-to-day work with DataFrame data, and they can be described as scalpel-level APIs. Being proficient with them is very helpful for data processing. If you understand how Python's built-in map function is used, you should have no trouble following today's article.