Pandas: Extracting Numbers from Strings
When working with DataFrames in pandas, you often need to extract numeric values from cells that also contain non-numeric characters. The string accessor (.str) provides methods for this.
Using str.extract() for Number Extraction
One effective method for extracting numbers from strings is str.extract(). This method allows you to specify a regular expression pattern that defines the numeric data you want to capture.
Consider the following data frame:
<code class="python">import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
print(df)</code>
Output:
      A
0    1a
1   NaN
2   10a
3  100b
4    0b
To extract the numbers from each cell, you can use the following regular expression:
<code class="python"># expand=False returns a Series rather than a one-column DataFrame
df.A.str.extract(r'(\d+)', expand=False)</code>
The regex pattern (\d+) matches the first run of one or more digits in each string. The parentheses create a capturing group, and str.extract() returns the text matched by that group, or NaN where the pattern does not match.
Output:
0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object
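The dtype: object line shows that the extracted values are still strings. A minimal sketch, assuming numeric values are wanted for later calculations, converts them with pd.to_numeric (the variable names digits and numbers are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# expand=False returns a Series when the pattern has a single capturing group.
digits = df['A'].str.extract(r'(\d+)', expand=False)

# The extracted values are strings; pd.to_numeric converts them to numbers.
# The missing value stays missing, so the result is a floating-point Series.
numbers = pd.to_numeric(digits)
print(numbers)
```

Because the column contains a missing value, the converted Series holds floats rather than integers.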
As you can see, the numbers have been extracted from each cell, and the missing value remains NaN. Note that the pattern (\d+) matches only the first run of digits, so it works for whole numbers but would cut a floating-point value such as 1.5 down to 1.
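For values with a decimal part, one possible approach is to widen the pattern. The sketch below is an assumption rather than part of the original example: the prices Series is hypothetical data, and the pattern handles only plain decimals such as 1.5, not signs, thousands separators, or exponents.

```python
import numpy as np
import pandas as pd

# Hypothetical data with decimal values mixed in.
prices = pd.Series(['1.5a', np.nan, '10a', '0.25b'])

# \d+ matches the integer part; (?:\.\d+)? optionally matches a decimal part.
# The inner group is non-capturing, so the pattern still has one capturing group.
extracted = prices.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# Convert the extracted strings to floating-point numbers.
values = pd.to_numeric(extracted)
print(values)
```

The optional non-capturing group keeps whole numbers such as 10 working while also accepting 1.5 and 0.25.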