R - read.table

About

The read.table function reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.

Syntax

read.table(
     file, 
     header = FALSE, 
     sep = "", 
     quote = "\"'",
     dec = ".",
     row.names,
     col.names,
     as.is = !stringsAsFactors,
     na.strings = "NA", 
     colClasses = NA, 
     nrows = -1,
     skip = 0, 
     check.names = TRUE, 
     fill = !blank.lines.skip,
     strip.white = FALSE, 
     blank.lines.skip = TRUE,
     comment.char = "#",
     allowEscapes = FALSE,
     flush = FALSE,
     stringsAsFactors = default.stringsAsFactors(),
     fileEncoding = "",
     encoding = "unknown",
     text
     )

where:

file, a file path, a URL or a connection
header, a logical value indicating whether the first line of the file contains the column names
sep, the field separator character; the default "" means any whitespace
colClasses, a character vector giving the class of each column in the dataset
nrows, the maximum number of rows to read; negative values are ignored
comment.char, a single character that starts a comment, or "" to turn comment handling off
skip, the number of lines to skip at the beginning of the file before reading data
stringsAsFactors, a logical value indicating whether character columns are converted to factors
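As an illustration of the parameters above, here is a minimal sketch that reads from an inline string through the text argument instead of a file. The data, the semicolon separator and the variable names raw and df are made up for this example.

```r
# Hypothetical inline data: one title line to skip, a header line,
# two data rows and one comment line.
raw <- "exported by some tool
id;name;score
1;alice;9.5
# this comment line is ignored
2;bob;7.25"

df <- read.table(
  text = raw,               # read from a string instead of a file
  header = TRUE,            # the first line that is read holds the column names
  sep = ";",                # fields are separated by semicolons
  skip = 1,                 # skip the title line
  comment.char = "#",       # text after # is treated as a comment
  stringsAsFactors = FALSE  # keep character columns as character
)

str(df)
```

The same call works on a file by replacing text = raw with the file path as first argument.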

Performance

By default, read.table will:

infer colClasses (the type of variable in each column of the table)
check whether each line contains a comment, as set by comment.char (comment.char = "" disables this check)

Supplying these parameters makes read.table faster because R no longer has to work them out itself.
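A minimal sketch of the difference, assuming a small temporary file written only for this example (the column names x and y and the variables slow and fast are hypothetical). On a file this small the speed gain is not measurable; the point is only that both calls return the same data frame.

```r
# Write a small hypothetical file so the example is self-contained.
path <- tempfile(fileext = ".txt")
write.table(data.frame(x = 1:5, y = c(0.5, 1.5, 2.5, 3.5, 4.5)),
            file = path, row.names = FALSE, quote = FALSE)

# Default call: R infers the column classes and looks for comments.
slow <- read.table(path, header = TRUE)

# Explicit call: the classes are given and comment handling is switched off.
fast <- read.table(path, header = TRUE,
                   colClasses = c("integer", "numeric"),
                   comment.char = "")

# Compare the two results.
all.equal(slow, fast)
```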

Memory

The dataset must not be larger than the available RAM.

1,000,000 rows, 10 columns with numeric data = 1,000,000 * 10 * 8 bytes = 80,000,000 bytes, or about 76 MB (80,000,000 / 2^20)
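The estimate can be reproduced in R. This is a rough sketch that counts only the raw numeric values (8 bytes per double) and ignores the overhead of the data frame itself; the variable names are chosen for illustration.

```r
rows <- 1e6
cols <- 10
bytes_per_numeric <- 8   # a double-precision value takes 8 bytes

bytes <- rows * cols * bytes_per_numeric
mib <- bytes / 2^20      # 2^20 bytes per megabyte (binary)

bytes
round(mib, 1)
```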

Options

colClasses

colClasses = "numeric"

To figure out the classes of each column, you can use this snippet:

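# Read only the first 100 rows and let R infer the column classes
mySubsetDataTable <- read.table("myFile.txt", nrows = 100)
classes <- sapply(mySubsetDataTable, class)
# Read the whole file with the inferred classes
myDataTable <- read.table("myFile.txt", colClasses = classes)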

nrows

Use the Linux tool wc (wc -l myFile.txt) to count the number of lines in a file.

Setting nrows, even as a mild overestimate, helps with memory usage.
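When wc is not available, the line count can also be obtained from R itself. The sketch below assumes a small temporary file created for the example. Note that readLines loads the whole file into memory, so for very large files wc is likely the cheaper choice.

```r
# Write a small hypothetical file so the example is self-contained.
path <- tempfile(fileext = ".txt")
write.table(data.frame(x = 1:5, y = letters[1:5]),
            file = path, row.names = FALSE, quote = FALSE)

# Count the lines; the header line is included, so this is a mild overestimate.
n_lines <- length(readLines(path))

# Pass the count as nrows; read.table stops at the end of the file.
df <- read.table(path, header = TRUE, nrows = n_lines)
nrow(df)
```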
