NumPy Statistical Functions

Statistics involves gathering data, analyzing it, and drawing conclusions based on the information collected.

NumPy provides various functions for statistical data analysis.


Common NumPy Statistical Functions

Here are some of the statistical functions provided by NumPy:

Function        Description
median()        returns the median of an array
mean()          returns the mean of an array
std()           returns the standard deviation of an array
percentile()    returns the nth percentile of the elements in an array
min()           returns the minimum element of an array
max()           returns the maximum element of an array

Next, we will see examples using these functions.


Find Median Using NumPy

The median of a NumPy array is the middle value of the array once its elements are sorted.

In other words, it is the value that separates the higher half from the lower half of the data.

Suppose we have the following list of numbers:

1, 5, 7, 8, 9, 12, 14
 

Then the median is simply the middle number, which in this case is 8.

It is important to note that if the number of elements is:

  • Odd, the median is the middle element.
  • Even, the median is the average of the two middle elements.

Now, we will learn how to calculate the median using NumPy for arrays with an odd and an even number of elements.


Example 1: Compute Median for Odd Number of Elements

import numpy as np

# create a 1D array with 5 elements
array1 = np.array([1, 2, 3, 4, 5])

# calculate the median
median = np.median(array1)

print(median) 

# Output: 3.0

In the above example, the array named array1 contains an odd number of elements (5 elements).

So, np.median(array1) returns the median of array1 as 3.0, which is the middle value of the sorted array.


Example 2: Compute Median for Even Number of Elements

import numpy as np

# create a 1D array with 6 elements
array1 = np.array([1, 2, 3, 4, 5, 7])

# calculate the median
median = np.median(array1)
print(median) 

# Output: 3.5

Here, since the array1 array has an even number of elements (6 elements), the median is calculated as the average of the two middle elements (3 and 4) i.e. 3.5.
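
The odd/even rule can also be checked by hand. The following is a minimal sketch, not a description of how np.median() works internally; manual_median is a hypothetical helper introduced only for illustration, and the sample arrays are deliberately unsorted to show that sorting comes first.

```python
import numpy as np

def manual_median(values):
    """Median via sorting (hypothetical helper, for illustration only)."""
    ordered = np.sort(np.asarray(values))
    n = ordered.size
    mid = n // 2
    if n % 2 == 1:
        # odd number of elements: take the middle element
        return float(ordered[mid])
    # even number of elements: average the two middle elements
    return float((ordered[mid - 1] + ordered[mid]) / 2)

odd_values = np.array([5, 1, 4, 2, 3])
even_values = np.array([7, 1, 5, 2, 4, 3])

# each line prints the hand-computed median next to NumPy's result
print(manual_median(odd_values), np.median(odd_values))
print(manual_median(even_values), np.median(even_values))
```

Both approaches should agree for any non-empty 1D array of numbers.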


Median of NumPy 2D Array

Calculating the median is not limited to 1D arrays. We can also calculate the median of a 2D array.

In a 2D array, the median can be calculated along the horizontal axis, along the vertical axis, or across the entire array.

When computing the median of a 2D array, we use the axis parameter inside np.median() to specify the axis along which to compute the median.

If we specify,

  • axis = 0, the median is calculated along the vertical axis (one median per column)
  • axis = 1, the median is calculated along the horizontal axis (one median per row)

If we don't use the axis parameter, the median is computed over the entire array.


Example: Compute the Median of a 2D Array

import numpy as np

# create a 2D array
array1 = np.array([[2, 4, 6], 
                   [8, 10, 12], 
                   [14, 16, 18]])

# compute median along horizontal axis 
result1 = np.median(array1, axis=1)

print("Median along horizontal axis :", result1)

# compute median along vertical axis
result2 = np.median(array1, axis=0)

print("Median along vertical axis:", result2)

# compute median of entire array
result3 = np.median(array1)

print("Median of entire array:", result3)

Output

Median along horizontal axis : [ 4. 10. 16.]
Median along vertical axis: [ 8. 10. 12.]
Median of entire array: 10.0

In this example, we have created a 2D array named array1.

We then computed the median along the horizontal and vertical axes individually, and finally the median of the entire array.

  • np.median(array1, axis=1) - median along horizontal axis, which gives [4. 10. 16.]
  • np.median(array1, axis=0) - median along vertical axis, which gives [8. 10. 12.]
  • np.median(array1) - median over the entire array, which gives 10.0

To calculate the median over the entire 2D array, NumPy treats the array as if it were flattened to [2, 4, 6, 8, 10, 12, 14, 16, 18] and then finds the middle value, which in our case is 10.
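
The flattening step and the effect of the axis parameter can be made explicit. This is a small sketch under the assumption that a non-square array makes the result shapes easier to tell apart; array2 is an illustrative array that does not appear in the examples above.

```python
import numpy as np

array1 = np.array([[2, 4, 6],
                   [8, 10, 12],
                   [14, 16, 18]])

# the median of the whole array equals the median of its flattened copy
flat = array1.flatten()
same = np.median(array1) == np.median(flat)
print(same)

# a 2x4 array (illustrative) shows which dimension each axis collapses
array2 = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

col_medians = np.median(array2, axis=0)  # one value per column
row_medians = np.median(array2, axis=1)  # one value per row

print(col_medians.shape, row_medians.shape)
```

With axis=0 the result has one entry per column (4 here), and with axis=1 one entry per row (2 here).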


Compute Mean Using NumPy

The mean value of a NumPy array is the average value of all the elements in the array.

It is calculated by adding all elements in the array and then dividing the result by the total number of elements in the array.

We use the np.mean() function to calculate the mean value. For example,

import numpy as np

# create a numpy array
marks = np.array([76, 78, 81, 66, 85])

# compute the mean of marks
mean_marks = np.mean(marks)

print(mean_marks)

# Output: 77.2

In this example, the mean value is 77.2, which is calculated by adding the elements (76, 78, 81, 66, 85) and dividing the result by 5 (total number of array elements).
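
The same arithmetic can be written out step by step. This is a minimal sketch that mirrors the description above; the variable names total, count and manual_mean are illustrative.

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])

total = marks.sum()   # sum of all elements
count = marks.size    # number of elements
manual_mean = total / count

# compare the hand-computed mean with np.mean()
print(manual_mean)
print(np.isclose(manual_mean, np.mean(marks)))
```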


Example 3: Mean of NumPy N-d Array

import numpy as np

# create a 2D array
array1 = np.array([[1, 3],
                   [5, 7]])

# calculate the mean of the entire array
result1 = np.mean(array1)
print("Entire Array:",result1)  # 4.0

# calculate the mean along vertical axis (axis=0)
result2 = np.mean(array1, axis=0)
print("Along Vertical Axis:",result2)  # [3. 5.]

# calculate the mean along horizontal axis (axis=1)
result3 = np.mean(array1, axis=1)
print("Along Horizontal Axis :",result3)  # [2. 6.]

Output

Entire Array: 4.0
Along Vertical Axis: [3. 5.]
Along Horizontal Axis : [2. 6.]

Here, first we have created the 2D array named array1. We then calculated the mean using np.mean().

  • np.mean(array1) - calculates the mean over the entire array
  • np.mean(array1, axis=0) - calculates the mean along vertical axis
  • np.mean(array1, axis=1) - calculates the mean along horizontal axis
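
One way to see what the axis argument does is to compute the same means with explicit loops over rows and columns. This is only an illustrative sketch; in practice the axis parameter is the simpler choice.

```python
import numpy as np

array1 = np.array([[1, 3],
                   [5, 7]])

# mean of each row, equivalent to axis=1
row_means = np.array([row.mean() for row in array1])

# mean of each column, equivalent to axis=0 (array1.T iterates over columns)
col_means = np.array([col.mean() for col in array1.T])

print(np.array_equal(row_means, np.mean(array1, axis=1)))
print(np.array_equal(col_means, np.mean(array1, axis=0)))
```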

Standard Deviation of NumPy Array

The standard deviation is a measure of the spread of the data in the array. It gives us the degree to which the data points in an array deviate from the mean.

  • A smaller standard deviation indicates that the data points are closer to the mean.
  • A larger standard deviation indicates that the data points are more spread out.

In NumPy, we use the np.std() function to calculate the standard deviation of an array.


Example: Compute the Standard Deviation in NumPy

import numpy as np

# create a numpy array
marks = np.array([76, 78, 81, 66, 85])

# compute the standard deviation of marks
std_marks = np.std(marks)
print(std_marks)

# Output: approximately 6.3687

In the above example, we have used the np.std() function to calculate the standard deviation of the marks array.

Here, the standard deviation of marks is approximately 6.3687. It tells us how much the values in the marks array deviate from the mean value of the array.
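
To make the calculation concrete, the standard deviation can be reproduced from its definition: the square root of the mean of the squared deviations from the mean. The sketch below assumes the default behavior of np.std(), which divides by the number of elements (population standard deviation); passing ddof=1 divides by one less than the number of elements instead (sample standard deviation). The variable names are illustrative.

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])

# deviation of each mark from the mean
deviations = marks - marks.mean()

# mean of the squared deviations (population variance)
variance = np.mean(deviations ** 2)

# standard deviation is the square root of the variance
manual_std = np.sqrt(variance)

population_std = np.std(marks)         # default, ddof=0
sample_std = np.std(marks, ddof=1)     # divides by n - 1 instead of n

print(np.isclose(manual_std, population_std))
print(sample_std > population_std)
```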


Standard Deviation of NumPy 2D Array

In a 2D array, the standard deviation can be calculated along the horizontal axis, along the vertical axis, or across the entire array.

As with the mean and median, we use the axis parameter inside np.std() to specify the axis along which to compute the standard deviation of a 2D array.


Example: Compute the Standard Deviation of a 2D Array

import numpy as np

# create a 2D array
array1 = np.array([[2, 5, 9],
                   [3, 8, 11],
                   [4, 6, 7]])

# compute standard deviation along horizontal axis
result1 = np.std(array1, axis=1)
print("Standard deviation along horizontal axis:", result1)

# compute standard deviation along vertical axis
result2 = np.std(array1, axis=0)
print("Standard deviation along vertical axis:", result2)

# compute standard deviation of entire array
result3 = np.std(array1)
print("Standard deviation of entire array:", result3)

Output

Standard deviation along horizontal axis: [2.86744176 3.29983165 1.24721913]
Standard deviation along vertical axis: [0.81649658 1.24721913 1.63299316]
Standard deviation of entire array: 2.7666443551086073

Here, we have created a 2D array named array1.

We then computed the standard deviation along the horizontal and vertical axes individually, and finally the standard deviation of the entire array.


Compute Percentile of NumPy Array

In NumPy, we use the percentile() function to compute the nth percentile of a given array.

Let's see an example.

import numpy as np

# create an array
array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])

# compute the 25th percentile of the array
result1 = np.percentile(array1, 25)
print("25th percentile:",result1)

# compute the 75th percentile of the array
result2 = np.percentile(array1, 75)
print("75th percentile:",result2)

Output

25th percentile: 5.5
75th percentile: 14.5

Here,

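  • 5.5 is the 25th percentile: about 25% of the values in array1 lie at or below it.
  • 14.5 is the 75th percentile: about 75% of the values in array1 lie at or below it.

By default, np.percentile() interpolates linearly between neighboring values, which is why the results need not be elements of the array.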
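
The values 5.5 and 14.5 come from interpolating between neighboring elements. The sketch below assumes the default linear interpolation of np.percentile() and recomputes the 25th percentile by hand; position, lower, fraction and manual_p25 are illustrative names. It also shows that several percentiles can be requested in one call.

```python
import numpy as np

array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])

# several percentiles at once; the 50th percentile is the median
quartiles = np.percentile(array1, [25, 50, 75])
print(quartiles)
print(np.isclose(quartiles[1], np.median(array1)))

# hand computation of the 25th percentile (array1 is already sorted)
position = 0.25 * (array1.size - 1)   # fractional index into the sorted array
lower = int(np.floor(position))       # index of the element just below
fraction = position - lower           # how far to move toward the next element
manual_p25 = array1[lower] + fraction * (array1[lower + 1] - array1[lower])

print(np.isclose(manual_p25, quartiles[0]))
```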

Note: To learn more about percentile, visit NumPy Percentile.


Find Minimum and Maximum Value of NumPy Array

We use the min() and max() functions in NumPy to find the minimum and maximum values in a given array.

Let's see an example.

import numpy as np

# create an array
array1 = np.array([2,6,9,15,17,22,65,1,62])

# find the minimum value of the array
min_val = np.min(array1)

# find the maximum value of the array
max_val = np.max(array1)

# print the results
print("Minimum value:", min_val)
print("Maximum value:", max_val)

Output

Minimum value: 1
Maximum value: 65

As we can see, min() and max() return the minimum and maximum values of array1, which are 1 and 65 respectively.
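
Like the other functions in this tutorial, np.min() and np.max() accept an axis parameter for 2D arrays. The following sketch reshapes the same nine values into a 3x3 array; array2 is an illustrative name that does not appear in the example above.

```python
import numpy as np

# the same values as array1, arranged as a 3x3 array (illustrative)
array2 = np.array([[2, 6, 9],
                   [15, 17, 22],
                   [65, 1, 62]])

# minimum and maximum of each column (axis=0)
col_min = np.min(array2, axis=0)
col_max = np.max(array2, axis=0)

# minimum and maximum of each row (axis=1)
row_min = np.min(array2, axis=1)
row_max = np.max(array2, axis=1)

print("Column minimums:", col_min)
print("Column maximums:", col_max)
print("Row minimums:", row_min)
print("Row maximums:", row_max)
```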

Note: To learn more about min() and max(), visit NumPy min() and NumPy max().