Pandas read_csv()

The read_csv() function in Pandas reads a CSV file into a DataFrame.

Example

Let's suppose that sample_data.csv contains the following content:

Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000

Now, let's write code to read the above CSV file using read_csv().

import pandas as pd

# load data from a CSV file
df = pd.read_csv('sample_data.csv')

print(df)
'''
Output

   Employee ID First Name Last Name Department     Position  Salary
0          101       John       Doe  Marketing      Manager   50000
1          102       Jane     Smith      Sales    Associate   35000
2          103    Michael   Johnson    Finance      Analyst   45000
3          104      Emily  Williams         HR  Coordinator   40000
'''

read_csv() Syntax

The syntax for the read_csv() function in Pandas is:

pd.read_csv(filepath_or_buffer, sep=',', header='infer', names=None, index_col=None, usecols=None, dtype=None, skiprows=None, nrows=None, na_values=None, parse_dates=False)

read_csv() Arguments

The read_csv() function takes the following common arguments:

  1. filepath_or_buffer: the path to the file or a file-like object
  2. sep or delimiter (optional): the delimiter to use
  3. header (optional): row number to use as column names
  4. names (optional): list of column names to use
  5. index_col (optional): column(s) to set as index
  6. usecols (optional): return a subset of the columns
  7. dtype (optional): type for data or column(s)
  8. skiprows (optional): number of lines to skip at the start of the file, or a list of line numbers to skip
  9. nrows (optional): number of rows of file to read
  10. na_values (optional): additional strings to recognize as NaN
  11. parse_dates (optional): columns to parse as dates, given as a boolean, a list of column numbers or names, a list of lists, or a dictionary
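
The nrows, na_values, and parse_dates arguments are not used in the examples that follow, so here is a minimal sketch of how they might be combined. It assumes a small in-memory CSV passed through io.StringIO (a file-like object) instead of a file on disk; the Hire Date column and the 'missing' placeholder are hypothetical and are not part of sample_data.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content, used only for illustration
csv_text = """Employee ID,First Name,Hire Date,Salary
101,John,2020-01-15,50000
102,Jane,2019-06-01,missing
103,Michael,2021-03-20,45000
"""

df = pd.read_csv(
    io.StringIO(csv_text),      # filepath_or_buffer also accepts file-like objects
    nrows=2,                    # read only the first two data rows
    na_values=['missing'],      # treat the string 'missing' as NaN
    parse_dates=['Hire Date'],  # parse this column as datetime values
)

print(df)
print(df.dtypes)
```

Because one Salary value is read as NaN, pandas stores that column as floating point rather than integer.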

read_csv() Return Value

The read_csv() function returns a DataFrame containing the data read from the CSV file.


Example 1: Basic CSV Reading

Let's suppose that sample_data.csv contains the following content:

Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000

Now, let's write code to read the above CSV file using read_csv().

import pandas as pd

# load data from a CSV file
df = pd.read_csv('sample_data.csv')

print(df)

Output

   Employee ID First Name Last Name Department     Position  Salary
0          101       John       Doe  Marketing      Manager   50000
1          102       Jane     Smith      Sales    Associate   35000
2          103    Michael   Johnson    Finance      Analyst   45000
3          104      Emily  Williams         HR  Coordinator   40000

In this example, we read data from sample_data.csv and print the DataFrame.


Example 2: Skipping Rows and Setting Index Column

For this example, let's use the same CSV file used in the first example (with a comma as the delimiter).

import pandas as pd

# skip the first row and set the first column as the index
df = pd.read_csv('sample_data.csv', skiprows=1, index_col=0)

print(df)

Output


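        John       Doe Marketing      Manager  50000
101
102     Jane     Smith     Sales    Associate  35000
103  Michael   Johnson   Finance      Analyst  45000
104    Emily  Williams        HR  Coordinator  40000

Here, skiprows=1 skips the first line of the file, which is the original header. The next line (101,John,Doe,...) is then inferred to be the header, so its values become the column names and only three data rows remain. We also set the first column as the index using index_col=0, which is why 101 appears as the index name.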
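
If the goal is to keep the original header and skip a data row instead, skiprows also accepts a list of 0-indexed line numbers. The sketch below assumes an in-memory copy of sample_data.csv supplied through io.StringIO so that it runs on its own.

```python
import io
import pandas as pd

# In-memory copy of sample_data.csv (assumed here so the snippet is self-contained)
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

# skiprows=[1] skips only line 1 (the first data row); line 0 is still used as the header
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1], index_col=0)

print(df)
```

With this call, the column names are preserved and the index is named Employee ID, starting at 102.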


Example 3: Reading Selected Columns with Data Types

For this example, let's use the same file sample_data.csv.

import pandas as pd

# read specific columns and set their data types
df = pd.read_csv('sample_data.csv', usecols=['First Name', 'Salary'], dtype={'First Name': str, 'Salary': float})

print(df)

Output

  First Name   Salary
0       John  50000.0
1       Jane  35000.0
2    Michael  45000.0
3      Emily  40000.0

This example reads only the First Name and Salary columns from the file and sets the data type for each column.
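
One way to confirm that the requested types were applied is to inspect the dtypes attribute of the result. This is a minimal sketch that assumes an in-memory copy of sample_data.csv passed through io.StringIO.

```python
import io
import pandas as pd

# In-memory copy of sample_data.csv (assumed here so the snippet is self-contained)
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['First Name', 'Salary'],
    dtype={'First Name': str, 'Salary': float},
)

# dtypes lists the data type pandas assigned to each column
print(df.dtypes)
```

Without the dtype argument, Salary would be inferred as an integer column, since every value in it is a whole number.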

Note: When working with large CSV files, consider the chunksize parameter, which reads the file in chunks of a given number of rows, or iterator=True, which returns a reader object that you can pull rows from piece by piece.
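
As a rough illustration of chunked reading, the sketch below sums the Salary column two rows at a time. It assumes an in-memory copy of sample_data.csv; a chunk size of 2 is chosen only to make the behavior visible on such a small file.

```python
import io
import pandas as pd

# In-memory copy of sample_data.csv (assumed here so the snippet is self-contained)
csv_text = """Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
"""

total_salary = 0
num_chunks = 0

# With chunksize set, read_csv() yields DataFrames of up to 2 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_salary += chunk['Salary'].sum()
    num_chunks += 1

print(num_chunks, total_salary)
```

Each chunk is an ordinary DataFrame, so only one chunk needs to be held in memory at a time.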


Example 4: Specifying Delimiter and Column Names

For this example, let's suppose that sample_data.csv has the following content:

Employee ID;First Name;Last Name;Department;Position;Salary
101;John;Doe;Marketing;Manager;50000
102;Jane;Smith;Sales;Associate;35000
103;Michael;Johnson;Finance;Analyst;45000
104;Emily;Williams;HR;Coordinator;40000

Notice the use of ; as the delimiter. Now, let's read this semicolon-delimited file.

import pandas as pd

# specify a delimiter and column names
df = pd.read_csv('sample_data.csv', delimiter=';', names=['ID', 'Name', 'Surname', 'Dept', 'Position', 'Salary'], header=0)

print(df)

Output

    ID     Name   Surname       Dept     Position  Salary
0  101     John       Doe  Marketing      Manager   50000
1  102     Jane     Smith      Sales    Associate   35000
2  103  Michael   Johnson    Finance      Analyst   45000
3  104    Emily  Williams         HR  Coordinator   40000

In this example, we specified the delimiter to be ;. We also specified the column names manually using the names argument.

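Here, header=0 tells read_csv() that row 0 of the file is a header row. Because names is also given, that row is replaced by the supplied names instead of being read as data.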
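
To see why header=0 matters when names is supplied, compare the result with and without it. This is a minimal sketch that assumes an in-memory copy of the semicolon-delimited file above.

```python
import io
import pandas as pd

# In-memory copy of the semicolon-delimited file (assumed so the snippet is self-contained)
csv_text = """Employee ID;First Name;Last Name;Department;Position;Salary
101;John;Doe;Marketing;Manager;50000
102;Jane;Smith;Sales;Associate;35000
103;Michael;Johnson;Finance;Analyst;45000
104;Emily;Williams;HR;Coordinator;40000
"""

cols = ['ID', 'Name', 'Surname', 'Dept', 'Position', 'Salary']

# header=0: the file's first row is treated as a header and replaced by cols
with_header = pd.read_csv(io.StringIO(csv_text), delimiter=';', names=cols, header=0)

# no header argument: with names given, the first row is read as ordinary data
without_header = pd.read_csv(io.StringIO(csv_text), delimiter=';', names=cols)

print(len(with_header))
print(len(without_header))
print(without_header.iloc[0].tolist())
```

Without header=0, the original header line ends up as the first data row, which also forces the otherwise numeric columns to be read as strings.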