## Efficient Data Processing in Python Worth a Look
Pandas is one of the most widely used data processing tools in Python, and it is convenient to work with. It is built on NumPy arrays, and many of its operations are implemented in the extension modules that ship with NumPy or pandas. These modules are written in Cython or C and compiled to native code, which is what keeps processing fast. Today we will see what it can do, starting with some sample data:

    import numpy as np
    import pandas as pd

    mu1, sigma1 = 0, 0.1
    mu2, sigma2 = 0.2, 0.2
    n = 1000

    # pd.np was removed in pandas 2.0, so NumPy is imported directly.
    df = pd.DataFrame(
        {
            "a1": np.random.normal(mu1, sigma1, n),
            "a2": np.random.normal(mu2, sigma2, n),
            "a3": np.random.randint(0, 5, n),
            "y1": np.logspace(0, 1, num=n),
            "y2": np.random.randint(0, 2, n),
        }
    )
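As a rough illustration of why the compiled code paths matter, the sketch below computes the same quantity twice: once with a vectorized expression that runs inside NumPy, and once with a Python-level `apply`. The series size, the seed, and the names `vectorized` and `looped` are assumptions made for this example. Timings depend on the machine, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical sample data, independent of the article's DataFrame.
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(0, 0.1, 100_000))

# Vectorized: the arithmetic runs in compiled NumPy code.
start = time.perf_counter()
vectorized = s * 2 + 1
t_vectorized = time.perf_counter() - start

# Python-level: the lambda is called once per element.
start = time.perf_counter()
looped = s.apply(lambda x: x * 2 + 1)
t_looped = time.perf_counter() - start

# Both approaches give the same values; only the cost differs.
print(np.allclose(vectorized, looped))
print(f"vectorized: {t_vectorized:.4f}s, apply: {t_looped:.4f}s")
```

When an operation can be written as arithmetic on whole columns, that form is usually the better choice.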
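Horizontal and vertical lines on a line plot are useful for two things:

1. Drawing an average line
2. Marking important points

In the example below, the lines at `y = 6` and `x = 775` are illustrative values that cross close to the `y1` curve:

    import matplotlib.pyplot as plt

    ax = df.y1.plot()
    ax.axhline(6, color="red", linestyle="--")    # horizontal reference line
    ax.axvline(775, color="red", linestyle="--")  # vertical marker
    plt.show()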
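To draw a true average line instead of a hard-coded value, the level can be computed from the data. The sketch below is one way to do that, under the assumption that `y1` is the same `logspace` column as in the article. The names `df_demo`, `mean_y1`, and `cross_idx` are made up for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 1000
# Hypothetical stand-in for the article's DataFrame, with the same y1 column.
df_demo = pd.DataFrame({"y1": np.logspace(0, 1, num=n)})

mean_y1 = df_demo["y1"].mean()
# First row position at which the curve reaches the mean.
cross_idx = int((df_demo["y1"] >= mean_y1).to_numpy().argmax())

fig, ax = plt.subplots()
df_demo["y1"].plot(ax=ax)
ax.axhline(mean_y1, color="red", linestyle="--")    # the average line
ax.axvline(cross_idx, color="red", linestyle="--")  # where y1 reaches it
```

Call `plt.show()` afterwards to display the figure in an interactive session.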
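Several plots can share one figure by passing an axis from `plt.subplots` to each call:

    # df has no "index" column yet, so expose the row index as one.
    data = df.reset_index()

    fig, ax = plt.subplots(2, 2, figsize=(14, 7))
    data.plot(x="index", y="y1", ax=ax[0, 0])
    data.plot.scatter(x="index", y="y2", ax=ax[0, 1])
    data.plot.scatter(x="index", y="a3", ax=ax[1, 0])
    data.plot(x="index", y="a1", ax=ax[1, 1])
    plt.show()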
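Histograms of several columns can be overlaid on one axis:

    df[["a1", "a2"]].plot(bins=30, kind="hist")
    plt.show()

With `subplots=True`, each column gets its own histogram:

    df[["a1", "a2"]].plot(bins=30, kind="hist", subplots=True)
    plt.show()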
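To compare groups, split the data by `y2` and plot the columns of each group as subplots:

    # Group explicitly by y2 and draw a1 and a2 as subplots for each group.
    for key, group in df.groupby("y2"):
        group[["a1", "a2"]].plot(subplots=True, title=f"y2 = {key}")
    plt.show()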
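A least-squares line can be fitted with `np.linalg.lstsq` and plotted against the data:

    import numpy as np

    # Regressors: the row position and a constant column for the intercept.
    df["index"] = np.arange(len(df))
    df["ones"] = np.ones(len(df))

    # Solve y1 ≈ m * index + c in the least-squares sense.
    m, c = np.linalg.lstsq(df[["index", "ones"]], df["y1"], rcond=None)[0]

    df["y"] = df["index"] * m + c
    df[["y", "y1"]].plot()
    plt.show()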
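The fit above treats the row position and a column of ones as regressors, so the two returned coefficients are the slope `m` and the intercept `c` of the line `y = m * x + c`. As a cross-check, a degree-1 `np.polyfit` solves the same problem and should return matching values. The sketch below assumes the same `logspace` data as the article. The names `x`, `A`, `m_poly`, and `c_poly` are made up for this example.

```python
import numpy as np

n = 1000
x = np.arange(n, dtype=float)   # plays the role of the "index" column
y1 = np.logspace(0, 1, num=n)   # same y1 as in the article

# Design matrix [x, 1]: lstsq finds (m, c) minimizing ||A @ [m, c] - y1||.
A = np.column_stack([x, np.ones(n)])
m, c = np.linalg.lstsq(A, y1, rcond=None)[0]

# A degree-1 polynomial fit solves the same least-squares problem.
m_poly, c_poly = np.polyfit(x, y1, deg=1)

print(np.allclose([m, c], [m_poly, c_poly]))
```

For a plain straight-line fit, `np.polyfit` is the shorter form. The explicit design matrix is easier to extend with further regressors.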