Pandas DataFrame

A DataFrame is a two-dimensional, table-like data structure in which data is organized in rows and columns. For example,

     Country      Capital      Population
0    Canada       Ottawa       37742154
1    Australia    Canberra     25499884
2    UK           London       67886011
3    Brazil       Brasília     212559417

Here,

  • Country, Capital and Population are the column names.
  • Each row represents a record, with the index value on the left. The index values are auto-assigned starting from 0.
  • Each column contains data of the same type. For instance, Country and Capital contain strings, and Population contains integers.

The DataFrame is similar to a table in a SQL database or a spreadsheet in Excel. It is designed for working with labeled, tabular datasets in Python.
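
As a rough sketch, the example table above can be rebuilt and inspected in code. The variable name countries is only illustrative, and the dictionary-based construction used here is explained in the next section.

```python
import pandas as pd

# rebuild the example table shown above
countries = pd.DataFrame({
    'Country': ['Canada', 'Australia', 'UK', 'Brazil'],
    'Capital': ['Ottawa', 'Canberra', 'London', 'Brasília'],
    'Population': [37742154, 25499884, 67886011, 212559417],
})

print(countries.shape)          # (number of rows, number of columns)
print(list(countries.columns))  # column names
print(list(countries.index))    # auto-assigned index values
print(countries.dtypes)         # data type of each column
```

The exact type names printed by dtypes can vary between Pandas versions, but Population is stored as an integer column and the other two columns hold text.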


Create a Pandas DataFrame

We can create a Pandas DataFrame in the following ways:

  • Using Python Dictionary
  • Using Python List
  • From a File
  • Creating an Empty DataFrame

Pandas DataFrame Using Python Dictionary

We can create a DataFrame from a dictionary by passing it to the DataFrame() constructor. For example,

import pandas as pd

# create a dictionary
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris']}

# create a DataFrame from the dictionary
df = pd.DataFrame(data)
print(df)

Output

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris

In this example, we created a dictionary called data that contains the column names (Name, Age, City) as keys, and lists of values as their respective values.

We then passed the dictionary to the pd.DataFrame() constructor to create a DataFrame called df.
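
One detail worth knowing with this approach is that every list in the dictionary must have the same length, since each list becomes one column. The sketch below deliberately leaves one value out to show what happens; the variable error_message is only there for illustration.

```python
import pandas as pd

# 'Age' is one value short on purpose
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 30]}

error_message = None
try:
    df = pd.DataFrame(data)
except ValueError as err:
    # Pandas refuses to build a DataFrame from columns of unequal length
    error_message = str(err)

print('Could not create DataFrame:', error_message)
```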


Pandas DataFrame Using Python List

We can also create a DataFrame using a two-dimensional list. For example,

import pandas as pd

# create a two-dimensional list
data = [['John', 25, 'New York'],
        ['Alice', 30, 'London'],
        ['Bob', 35, 'Paris']]

# create a DataFrame from the list
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

Output

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris

In this example, we created a two-dimensional list called data containing nested lists.

The DataFrame() constructor converts the 2-D list to a DataFrame. Each nested list becomes one row of data in the DataFrame.

The columns argument provides a name to each column of the DataFrame.

Note: We can also create a DataFrame from a NumPy array in a similar way.
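
A minimal sketch of the NumPy variant is shown below. The array values and the column names A, B and C are made up for illustration.

```python
import numpy as np
import pandas as pd

# a 2-D NumPy array: each inner row becomes one row of the DataFrame
array = np.array([[1, 2, 3],
                  [4, 5, 6]])

# as with a nested list, the columns argument names each column
df = pd.DataFrame(array, columns=['A', 'B', 'C'])
print(df)
```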


Pandas DataFrame From a File

Another common way to create a DataFrame is by loading data from a CSV (Comma-Separated Values) file. For example,

import pandas as pd

# load data from a CSV file
df = pd.read_csv('data.csv')

print(df)

In this example, we used the read_csv() function, which reads the CSV file data.csv and returns a DataFrame object, df, containing the data from the file.
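
Since the contents of data.csv are not shown here, the sketch below uses made-up CSV text held in memory so that it can be run without any file on disk. read_csv() accepts a file-like object as well as a file path, so io.StringIO stands in for the file.

```python
import io
import pandas as pd

# hypothetical CSV content standing in for data.csv
csv_text = """Name,Age,City
John,25,New York
Alice,30,London
Bob,35,Paris
"""

# the first line of the CSV is used as the column names
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```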

Note: We can also create a DataFrame from other data sources such as JSON files, Excel spreadsheets, and SQL databases. The functions for reading them are listed below:

  • JSON - read_json()
  • Excel spreadsheet - read_excel()
  • SQL - read_sql()
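
As a small illustration of two of these functions, the sketch below reads made-up JSON text and a throwaway in-memory SQLite table. The sample records, the table name people, and the variable names are all assumptions for the example. read_excel() is left out because it needs an actual spreadsheet file and an extra Excel engine package.

```python
import io
import sqlite3
import pandas as pd

# JSON: a list of records, wrapped in StringIO to act like a file
json_text = '[{"Name": "John", "Age": 25}, {"Name": "Alice", "Age": 30}]'
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)

# SQL: create a temporary in-memory SQLite database and query it
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (Name TEXT, Age INTEGER)')
conn.executemany('INSERT INTO people VALUES (?, ?)',
                 [('John', 25), ('Alice', 30)])
df_sql = pd.read_sql('SELECT * FROM people', conn)
conn.close()
print(df_sql)
```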

Create an Empty DataFrame

Sometimes we may want to create an empty DataFrame and then add data later. For example,

import pandas as pd

# create an empty DataFrame
df = pd.DataFrame()
print(df)

Output

Empty DataFrame
Columns: []
Index: []

In this example, we have created an empty DataFrame by calling pd.DataFrame() without any arguments.

Here, both the Columns and Index lists are empty. The DataFrame has no data, but it can be used as a container to store and manipulate data later.
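
One possible way to add data later is to assign columns one at a time, as sketched below. The column names and values are made up for illustration, and this is only one of several ways to fill a DataFrame.

```python
import pandas as pd

# start with an empty DataFrame
df = pd.DataFrame()
print(df.empty)   # True

# add columns one at a time; each list must have the same length
df['Name'] = ['John', 'Alice', 'Bob']
df['Age'] = [25, 30, 35]

print(df.empty)   # False
print(df)
```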