An xlsx is a file format used for Microsoft Excel spreadsheets. Excel can be used to store tabular data.
R has a built-in functionality that makes it easy to read and write a xlsx file.
Sample xlsx File
To demonstrate how we read xlsx files in R, let's suppose we have an excel file named studentinfo.xlsx with following data:
We will be reading these datas with the help of R's built-in functions.
Install and Load xlsx Package
In order to read, write, and format Excel files into R, we first need to install and load the xlsx package as:
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
Here, we have successfully installed and loaded the xlsx package.
Now, we are able to read data from an xlsx file.
Read a xlsx File in R
In R, we use the read.xlsx() function to read a xlsx file available in our current directory. For example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# read studentinfo.xlsx file from our current directory
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)
# display xlsx file
print(read_data)
Output
Name Age Faculty State
1 Abby 24 Business Florida
2 Hazzle 23 Engineering Arizona
3 Cathy 20 Engineering Colorado
4 Paterson 22 Arts Texas
5 Sammy 20 Economics Ohio
6 Pam 2 Arts Arizona
In the above example, we have read the studentinfo.xlsx file that is available in our current directory. Notice the code,
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)
Here,
read.xlsx()- reads the xlsx filestudentinfo.xlsxand creates a dataframe which is stored in the read_data variable.sheetIndex = 1- reads specified worksheet i.e. 1
Note:
- If the file is in some other location, we have to specify the path along with the file name as:
read.xlsx("D:/folder1/studentinfo.xlsx", sheetIndex = 1). - We can also use the
read.xlsx2()function if the dataset we are working on is larger.
xlsx rowIndex and colIndex Argument in R
In R, we can also read a specific range of data from excel files. We can pass the rowIndex and colIndex argument inside read.xlsx() to read specific range.
rowIndex- reads a specific range of rowscolIndex- read a specific range of columns
Example: Read Range of Rows
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# read first five rows of xlsx file
read_data <- read.xlsx("studentinfo.xlsx",
sheetIndex = 1,
rowIndex = 1:5
)
# display xlsx file
print(read_data)
Output
Name Age Faculty State
1 Abby 24 Business Florida
2 Hazzle 23 Engineering Arizona
3 Cathy 20 Engineering Colorado
4 Paterson 22 Arts Texas
In the above example, we have passed rowIndex = 1:5 inside read.xlsx() so the function reads only the first five rows from the studentinfo.xlsx file.
Example: Read Range of Columns
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# read first three columns of xlsx file
read_data <- read.xlsx("studentinfo.xlsx",
sheetIndex = 1,
colIndex = 1:3
)
# display xlsx file
print(read_data)
Output
Name Age Faculty
1 Abby 24 Business
2 Hazzle 23 Engineering
3 Cathy 20 Engineering
4 Paterson 22 Arts
5 Sammy 20 Economics
6 Pam 21 Arts
Here, colIndex = 1:3 inside read.xlsx() reads only the first three columns from the studentinfo.xlsx file.
xlsx startRow Argument in R
Sometimes the excel file may contain headers at the beginning which we may not want to include. For example,
Here, the 1st Row of excel file contains a header, and the 2nd row is empty. So we don't want to include these two rows.
To start reading data from a specific row in excel worksheet, we pass the startRow argument inside read.xlsx().
Let's take a look at example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# start reading from 3rd row
read_data <- read.xlsx("studentinfo.xlsx",
sheetIndex = 1,
startRow = 3
)
# display xlsx file
print(read_data)
Output
Name Age Faculty State
1 Abby 24 Business Florida
2 Hazzle 23 Engineering Arizona
3 Cathy 20 Engineering Colorado
4 Paterson 22 Arts Texas
5 Sammy 20 Economics Ohio
6 Pam 21 Arts Arizona
In the above example, we have used the startRow argument inside the read.xlsx() function to start reading from the specified row.
startRow = 3 means the first two rows are ignored and read.xlsx() starts reading data from the 3rd row.
Write Into xlsx File in R
In R, we use the write.xlsx() function to write into a xlsx file. We pass the data in the form of dataframe. For example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# create a data frame
dataframe1 <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE))
# write dataframe1 into file1 xlsx file
write.xlsx(dataframe1, "file1.xlsx")
In the above example, we have used the write.xlsx() function to export a data frame named dataframe1 to a xlsx file. Notice the arguments passed inside write.xlsx(),
write.xlsx(dataframe1, "file1.xlsx")
Here,
dataframe1- name of the data frame we want to exportfile1.xlsx- name of the xlsx file
Finally, the file1.xlsx file would look like this in our directory:
Rename Current Worksheet
We can rename the current worksheet by using the sheetName argument inside the write.xlsx() function. For example,
# install xlsx package
install.package("xlsx")
# load xlsx file
library("xlsx")
# create a data frame
dataframe1 <- data.frame (
Name = c("Juan", "Alcaraz", "Simantha"),
Age = c(22, 15, 19),
Vote = c(TRUE, FALSE, TRUE))
# name current worksheet
write.xlsx(dataframe1, "file1.xlsx",
sheetName = "Voting Eligibility"
)
Here, we have passed sheetname = "Voting Eligibility" inside write.xlsx(), so the name of the sheet is changed to "Voting Eligibility".
So the file1.xlsx looks like this: