In this article, we will learn how to implement a Python program to calculate the standard deviation of a data set.
Consider a set of values plotted on an arbitrary axis. The full set of values is called the population, and the standard deviation measures how much the values vary around their mean. If the standard deviation is low, the plotted values lie close to the mean. If it is high, the values are spread further away from the mean.
The standard deviation is the square root of the variance of the data set. There are two types of standard deviation: population and sample.
The population standard deviation is calculated from every data value in the population, so it is a fixed value for that population. The mathematical formula is defined as follows:
$$\mathrm{SD\:=\:\sqrt{\frac{\sum(X_i\:-\:X_m)^2}{n}}}$$
Where,
Xm is the mean of the data set.
Xi are elements of the dataset.
n is the number of elements in the dataset.
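As an illustration (a worked example added here, using the data set [2, 3, 4, 1, 2, 5] that appears later in this article), the population formula can be evaluated by hand. The mean is 17/6, and the squared deviations from it add up to 65/6:

$$\mathrm{X_m\:=\:\frac{2\:+\:3\:+\:4\:+\:1\:+\:2\:+\:5}{6}\:=\:\frac{17}{6}\:\approx\:2.8333}$$
$$\mathrm{\sum(X_i\:-\:X_m)^2\:=\:\frac{65}{6}\:\approx\:10.8333}$$
$$\mathrm{SD\:=\:\sqrt{\frac{65/6}{6}}\:=\:\sqrt{\frac{65}{36}}\:\approx\:1.3437}$$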
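The sample standard deviation, by contrast, is a statistic calculated from only a subset (a sample) of the population, so its value depends on the sample chosen. Dividing by n - 1 instead of n compensates for the fact that the mean is itself estimated from the sample. The mathematical formula is defined as follows: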
$$\mathrm{SD\:=\:\sqrt{\frac{\sum(X_i\:-\:X_m)^2}{n\:-\:1}}}$$
Where,
Xm is the mean of the data set.
Xi are elements of the dataset.
n is the number of elements in the dataset.
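As an illustration (a worked example added here, treating [2, 3, 4, 1, 2, 5] as a sample of six values), the mean is 17/6 and the squared deviations sum to 65/6. Only the divisor changes compared with the population formula:

$$\mathrm{SD\:=\:\sqrt{\frac{65/6}{6\:-\:1}}\:=\:\sqrt{\frac{13}{6}}\:\approx\:1.4720}$$

Because the divisor is smaller, the sample standard deviation of a data set is always at least as large as its population standard deviation.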
Now let’s look at some input and output scenarios for different data sets -
Assume that the data set contains only positive integers -
Input: [2, 3, 4, 1, 2, 5]
Result:
Population Standard Deviation: 1.3437096247164249
Sample Standard Deviation: 1.4719601443879744
Assume that the data set contains only negative integers -
Input: [-2, -3, -4, -1, -2, -5]
Result:
Population Standard Deviation: 1.3437096247164249
Sample Standard Deviation: 1.4719601443879744
Assume that the data set contains both positive and negative integers -
Input: [-2, -3, -4, 1, 2, 5]
Result:
Population Standard Deviation: 3.131382371342656
Sample Standard Deviation: approximately 3.43026
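The first two scenarios give identical results because the standard deviation depends only on how far the values lie from their mean, not on where they sit on the axis. The following minimal sketch checks this with the standard library's statistics.pstdev(); the variable names and the shift of 10 are illustrative choices, not part of the scenarios above.

```python
import math
import statistics

data = [2, 3, 4, 1, 2, 5]
negated = [-x for x in data]      # same values with the sign flipped
shifted = [x + 10 for x in data]  # same values moved 10 units along the axis

base_sd = statistics.pstdev(data)

# Mirroring or shifting the data leaves the spread around the mean unchanged
same_after_negation = math.isclose(statistics.pstdev(negated), base_sd)
same_after_shift = math.isclose(statistics.pstdev(shifted), base_sd)

print(same_after_negation, same_after_shift)  # True True
```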
We have already seen the formulas for standard deviation above; now let us look at implementing them on various data sets using a Python program.
In the following example, we import the math library and calculate the standard deviation of a data set by applying the math.sqrt() function to its variance.
import math

# declare the dataset list
dataset = [2, 3, 4, 1, 2, 5]

# find the mean of the dataset
sm = 0
for i in range(len(dataset)):
    sm += dataset[i]
mean = sm / len(dataset)

# calculate the population standard deviation of the dataset
deviation_sum = 0
for i in range(len(dataset)):
    deviation_sum += (dataset[i] - mean) ** 2
psd = math.sqrt(deviation_sum / len(dataset))

# calculate the sample standard deviation (divide by n - 1, not n)
ssd = math.sqrt(deviation_sum / (len(dataset) - 1))

# display output
print("Population standard deviation of the dataset is", psd)
print("Sample standard deviation of the dataset is", ssd)

The output is as follows:
Population standard deviation of the dataset is 1.3437096247164249
Sample standard deviation of the dataset is 1.4719601443879744
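The same calculation can be wrapped in a reusable function. This is a minimal sketch rather than a definitive implementation: the function name standard_deviation and its sample flag are hypothetical choices made for this illustration, and the results are rounded only to keep the printed values short.

```python
import math

def standard_deviation(values, sample=False):
    """Return the population SD, or the sample SD when sample=True."""
    n = len(values)
    minimum = 2 if sample else 1  # a sample SD needs at least two values
    if n < minimum:
        raise ValueError(f"need at least {minimum} value(s), got {n}")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

dataset = [2, 3, 4, 1, 2, 5]
print(round(standard_deviation(dataset), 4))               # 1.3437
print(round(standard_deviation(dataset, sample=True), 4))  # 1.472
```

Keeping the divisor in a single place makes the difference between the two formulas explicit and avoids precedence mistakes when writing n - 1.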
In this approach, we import the numpy module and calculate the population standard deviation of the elements of a NumPy array using the numpy.std() function.
Implement the following Python program to calculate the standard deviation of NumPy array elements:
import numpy as np

# declare the dataset as a NumPy array
dataset = np.array([2, 3, 4, 1, 2, 5])

# calculate the population standard deviation of the dataset
sd = np.std(dataset)

# display output
print("Population standard deviation of the dataset is", sd)
The standard deviation is displayed as the following output -
Population standard deviation of the dataset is 1.3437096247164249
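By default, numpy.std() divides by n, which is why the program above reports the population standard deviation. Its ddof ("delta degrees of freedom") parameter changes the divisor to n - ddof, so passing ddof=1 gives the sample standard deviation. The sketch below assumes NumPy is installed; the variable names are illustrative and the values are rounded for display.

```python
import numpy as np

dataset = np.array([2, 3, 4, 1, 2, 5])

# ddof=0 (the default) divides by n: population standard deviation
population_sd = np.std(dataset)
# ddof=1 divides by n - 1: sample standard deviation
sample_sd = np.std(dataset, ddof=1)

print(round(float(population_sd), 4))  # 1.3437
print(round(float(sample_sd), 4))      # 1.472
```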
The statistics module in Python provides the functions stdev() and pstdev() to calculate the standard deviation of a data set. The stdev() function calculates the sample standard deviation, while the pstdev() function calculates the population standard deviation.
Both functions take the data set as an argument and return its standard deviation.
The Python program that demonstrates the use of the stdev() function to calculate the sample standard deviation of a data set is as follows −
import statistics as st

# declare the dataset list
dataset = [2, 3, 4, 1, 2, 5]

# calculate the sample standard deviation of the dataset
sd = st.stdev(dataset)

# display output
print("Standard Deviation of the dataset is", sd)
The sample standard deviation of the data set obtained as output is as follows -
Standard Deviation of the dataset is 1.4719601443879744
The Python program that demonstrates how to use the pstdev() function to find the population standard deviation of a data set is as follows -
import statistics as st

# declare the dataset list
dataset = [2, 3, 4, 1, 2, 5]

# calculate the population standard deviation of the dataset
sd = st.pstdev(dataset)

# display output
print("Standard Deviation of the dataset is", sd)
The population standard deviation of the data set obtained as output is as follows -
Standard Deviation of the dataset is 1.3437096247164249
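Two properties of these functions are worth checking in practice: each standard deviation is the square root of the matching variance function in the same module, and stdev() needs at least two values because its divisor is n - 1. The following minimal sketch illustrates both; the single-value list [7] and the variable names are arbitrary choices for this example.

```python
import math
import statistics as st

dataset = [2, 3, 4, 1, 2, 5]

# Each standard deviation is the square root of the matching variance
assert math.isclose(st.stdev(dataset), math.sqrt(st.variance(dataset)))
assert math.isclose(st.pstdev(dataset), math.sqrt(st.pvariance(dataset)))

# A single value has no sample standard deviation (n - 1 would be 0)
try:
    st.stdev([7])
    single_point_error = None
except st.StatisticsError as exc:
    single_point_error = exc

# The population standard deviation of a single value is still defined
single_point_psd = st.pstdev([7])

print(single_point_psd)
print(single_point_error is not None)  # True
```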