Pandas Array

An array allows us to store a collection of multiple values in a single data structure.

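A Pandas array, created with the pd.array() function, is a one-dimensional container for values of a single data type. Because every element shares one known type, it is typically more memory-efficient and faster to process than a Python list, and its extension types (such as Int64) can also represent missing values, which plain NumPy integer arrays cannot.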


Create Array Using Python List

We can create a Pandas array from a Python list. For example,

import pandas as pd

# create a list named data
data = [2, 4, 6, 8]

# create Pandas array using data
array1 = pd.array(data)

print(array1)

Output

<IntegerArray>
[2, 4, 6, 8]
Length: 4, dtype: Int64

In the above example, we first imported the pandas library as pd and created a list named data. Notice the line

array1 = pd.array(data)

Here, we have created an array by passing data as an argument to the pd.array() function.

Instead of storing the list in a variable first, we can pass the list directly to the pd.array() function. For example,

import pandas as pd

# create Pandas array by passing list directly
array1 = pd.array([2, 4, 6, 8])

print(array1)

Output

<IntegerArray>
[2, 4, 6, 8]
Length: 4, dtype: Int64

This code gives the same output as the previous code.
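
When no data type is given, pd.array() infers one from the list elements. The following is a minimal sketch of that inference, assuming a recent Pandas version (older releases may infer different types); the variable names are only illustrative.

```python
import pandas as pd

# integers are inferred as the nullable Int64 type
int_values = pd.array([2, 4, 6, 8])

# floating-point numbers are inferred as Float64
float_values = pd.array([1.5, 2.5, 3.5])

# boolean values are inferred as boolean
bool_values = pd.array([True, False, True])

# the dtype attribute reports the inferred data type
print(int_values.dtype)
print(float_values.dtype)
print(bool_values.dtype)
```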


Explicitly Specify Array Elements Data Type

In Pandas, we can explicitly specify the data type of array elements. For example,

import pandas as pd

# creating a pandas.array of integers
int_array = pd.array([1, 2, 3, 4, 5], dtype='int')
print(int_array)
print()

# creating a pandas.array of floating-point numbers
float_array = pd.array([1.1, 2.2, 3.3, 4.4, 5.5], dtype='float')
print(float_array)
print()

# creating a pandas.array of strings
string_array = pd.array(['apple', 'banana', 'cherry', 'date'], dtype='str')
print(string_array)
print()

# creating a pandas.array of boolean values
bool_array = pd.array([True, False, True, False], dtype='bool')
print(bool_array)
print()

Output

<NumpyExtensionArray>
[1, 2, 3, 4, 5]
Length: 5, dtype: int64

<NumpyExtensionArray>
[1.1, 2.2, 3.3, 4.4, 5.5]
Length: 5, dtype: float64

<NumpyExtensionArray>
['apple', 'banana', 'cherry', 'date']
Length: 4, dtype: str192

<NumpyExtensionArray>
[True, False, True, False]
Length: 4, dtype: bool

In the above example, we have passed the dtype argument to pd.array() to explicitly specify the data type of the array elements. Because these dtype strings refer to NumPy data types, each result is a NumPy-backed NumpyExtensionArray rather than the IntegerArray from the earlier examples.

Here,

  • int_array - an array of integers, created by specifying dtype='int'
  • float_array - an array of floating-point numbers, created by specifying dtype='float'
  • string_array - an array of strings, created by specifying dtype='str'
  • bool_array - an array of boolean values (True or False), created by specifying dtype='bool'
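
The dtype strings used above are NumPy type names. As a hedged sketch (the variable names are illustrative), Pandas also accepts its own extension type names such as 'Int64' with a capital I, which produce a nullable array that can hold missing values:

```python
import pandas as pd

# NumPy-backed array: 'int64' is a NumPy data type
numpy_backed = pd.array([1, 2, 3], dtype="int64")

# nullable array: 'Int64' (capital I) is a Pandas extension type,
# so None is stored as a missing value instead of raising an error
nullable = pd.array([1, 2, None], dtype="Int64")

print(numpy_backed.dtype)
print(nullable)

# isna() marks which elements are missing
print(nullable.isna())
```

The exact class names shown when printing these arrays depend on the installed Pandas version.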

Create Series From Pandas Array

In Pandas, we can create a Series directly from a Pandas array.

For that, we pass the array to the pd.Series() constructor. Let's look at an example.

import pandas as pd

# create a Pandas array
arr = pd.array([18, 20, 19, 21, 22])

# create a Pandas series from the Pandas array
arr_series = pd.Series(arr)

print(arr_series)

Output

0    18
1    20
2    19
3    21
4    22
dtype: Int64

Here, we have used pd.Series(arr) to create a Series from the Pandas array named arr.

In the output,

  1. The left column represents the index of the Series. The default index is a sequence of integers starting from 0.
  2. The right column represents the values of the Series, which correspond to the values of the Pandas array arr.
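
The two columns described above can also be inspected separately. This is a small sketch that rebuilds the same Series and reads its index and values; the .array attribute is used here to get back the Pandas array that stores the Series values.

```python
import pandas as pd

# rebuild the Series from the example above
arr = pd.array([18, 20, 19, 21, 22])
arr_series = pd.Series(arr)

# the default index: integers starting from 0
print(list(arr_series.index))

# the values stored in the Series
print(list(arr_series))

# the .array attribute returns the Pandas array backing the Series
print(arr_series.array)
```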