NumPy Vectorization

NumPy vectorization involves performing mathematical operations on entire arrays, eliminating the need to loop through individual elements.

We will see an overview of NumPy vectorization and demonstrate its advantages through examples.


NumPy Vectorization

We've used the concept of vectorization many times in NumPy. It refers to performing element-wise operations on arrays.

Let's take a simple example. When we add a number to a NumPy array, the number is added to each element of the array.

import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
number = 10

# add the number to each array element
result = array1 + number

print(result)

Output

[11 12 13 14 15]

Here, the number 10 is added to each array element. This is possible because of vectorization.

Without vectorization, performing the operation would require the use of loops.
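To make the contrast concrete, here is a minimal sketch of the same addition written with an explicit loop, next to the vectorized form. The names looped and vectorized are illustrative only.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
number = 10

# loop-based version: visit each element and add the number to it
looped = np.empty_like(array1)
for i in range(len(array1)):
    looped[i] = array1[i] + number

# vectorized version: a single expression over the whole array
vectorized = array1 + number

print(looped)
print(vectorized)
```

Both print statements show the same values; the vectorized form simply expresses the operation without writing the loop by hand.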


Example: NumPy Vectorization to Add Two Arrays Together

import numpy as np

# define two 2D arrays
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[0, 1, 2], [0, 1, 2]])

# add two arrays (vectorization)
array_sum = array1 + array2

print("Sum between two arrays:")
print(array_sum)

Output

Sum between two arrays:
[[1 3 5]
 [4 6 8]]

In this example, we have created two 2D arrays array1 and array2, and added them together.

This is a vectorized operation: each element of array1 is added to the element at the same position in array2.
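The same element-wise behavior applies to other arithmetic operators as well. A minimal sketch, reusing the two arrays from the example above (the names array_diff and array_product are illustrative):

```python
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[0, 1, 2], [0, 1, 2]])

# subtract corresponding elements
array_diff = array1 - array2      # values: [[1, 1, 1], [4, 4, 4]]

# multiply corresponding elements (not matrix multiplication)
array_product = array1 * array2   # values: [[0, 2, 6], [0, 5, 12]]

print(array_diff)
print(array_product)
```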


NumPy Vectorization Vs Python for Loop

Even though NumPy is a Python library, its vectorized operations are implemented in compiled C code. Because the looping happens in C rather than in the Python interpreter, NumPy vectorization is usually much faster than an equivalent Python loop, especially on large arrays.

Let's compare the time it takes to perform a vectorized operation with that of an equivalent loop-based operation.

Python for loop

import time

start = time.time()

array1 = [1, 2, 3, 4, 5]

for i in range(len(array1)):
    array1[i] += 10

end = time.time()

print("For loop time:", end - start)

Output

For loop time: 4.76837158203125e-06

NumPy Vectorization

import numpy as np
import time

start = time.time()

array1 = np.array([1, 2, 3, 4, 5])

result = array1 + 10

end = time.time()

print("Vectorization time:", end - start)

Output

Vectorization time: 1.5020370483398438e-05

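Here, the array has only five elements, so both timings are a few microseconds and mostly reflect fixed overhead. In the sample run above, the for loop even finished sooner, because creating the NumPy array and dispatching the operation cost more than five Python additions.

The performance benefits of vectorization show up when working with large datasets, where the per-element cost of the Python loop dominates.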
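To observe the effect on a larger input, the following sketch repeats the comparison with one million elements and uses the standard timeit module instead of time.time(). The array size, the helper names add_with_loop() and add_vectorized(), and the repeat count are assumptions chosen for illustration. Actual timings depend on the machine, so no output is shown.

```python
import timeit

import numpy as np

size = 1_000_000  # assumed size, large enough for per-element cost to matter

# create the inputs up front so that only the addition is timed
python_list = list(range(size))
numpy_array = np.arange(size)

def add_with_loop():
    # add 10 to every element with an explicit Python loop
    result = [0] * size
    for i in range(size):
        result[i] = python_list[i] + 10
    return result

def add_vectorized():
    # add 10 to every element with one vectorized expression
    return numpy_array + 10

# run each version three times and keep the fastest run
loop_time = min(timeit.repeat(add_with_loop, number=1, repeat=3))
vector_time = min(timeit.repeat(add_vectorized, number=1, repeat=3))

print("For loop time:", loop_time)
print("Vectorization time:", vector_time)
```

Creating the inputs outside the timed functions keeps the array construction cost out of the measurement, which is the overhead that dominates the five-element comparison.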


NumPy Vectorize() Function

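In NumPy, arithmetic operators and built-in mathematical functions are already vectorized when applied to arrays, so we don't need the vectorize() function for them. It is useful when we want to apply our own Python function, written for a single value, to every element of an array.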

Let's take a scenario. You have an array and a function that returns the square of a non-negative number, and 0 for a negative number.

import numpy as np

# array
array1 = np.array([-1, 0, 2, 3, 4])

# function that returns the square of a positive number
def find_square(x):
    if x < 0:
        return 0
    else:
        return x ** 2

Now, to apply the function find_square() to array1, we have two options: use a loop or vectorize the operation.
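For reference, the loop option could look like the following minimal sketch, which calls find_square() once per element and collects the results into a new array.

```python
import numpy as np

array1 = np.array([-1, 0, 2, 3, 4])

def find_square(x):
    if x < 0:
        return 0
    else:
        return x ** 2

# loop option: call find_square() on each element and collect the results
result = np.array([find_square(x) for x in array1])

print(result)
```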

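A loop works, but it is more verbose. The vectorize() function is a convenient alternative: it wraps find_square() so that it can be called directly on an array. Note that vectorize() is provided mainly for convenience, not for performance; internally, it still calls the function once per element.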

Let's see an example.

import numpy as np

# array whose square we need to find
array1 = np.array([-1, 0, 2, 3, 4])

# function to find the square
def find_square(x):
    if x < 0:
        return 0
    else:
        return x ** 2
        
# vectorize() to vectorize the function find_square()
vectorized_function = np.vectorize(find_square)

# passing an array to a vectorized function
result = vectorized_function(array1)

print(result)

Output

[ 0  0  4  9 16]

In this example, we used the vectorize() function to create a vectorized version of find_square(). We then passed array1 as an argument to the vectorized function, which applied find_square() to each element.
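When a function's branching can be expressed with array operations, it can often be written without vectorize() at all. The sketch below is one possible alternative, not part of the example above: it uses np.where() to reproduce the logic of find_square() with operations that run on the whole array at once.

```python
import numpy as np

array1 = np.array([-1, 0, 2, 3, 4])

# pick 0 where the element is negative, and the square everywhere else
result = np.where(array1 < 0, 0, array1 ** 2)

print(result)   # [ 0  0  4  9 16]
```

Note that np.where() computes array1 ** 2 for every element, including the negative ones, before choosing which values to keep. That is harmless here, but it matters for functions where one branch is invalid for some inputs.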