R - read.table

About

The read.table function reads a file in table format and creates a data frame from it, with cases (rows) corresponding to lines and variables (columns) to fields in the file. Note that R is case-sensitive, so the function must be written read.table.
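As a minimal sketch of the line-to-row and field-to-column mapping, the example below reads a small inline table through the text argument instead of a file. The data and the names raw and people are made up for illustration.

```r
# Three lines of whitespace-separated text: one header line and two cases
raw <- "name age\nalice 30\nbob 25"

# header = TRUE tells read.table that the first line holds the variable names
people <- read.table(text = raw, header = TRUE)

# Each remaining line became a row, each field a column
print(dim(people))
print(names(people))
```

With a real file, the first argument would be the file path instead of text.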

Syntax

read.table(
     file, 
     header = FALSE, 
     sep = "", 
     quote = "\"'",
     dec = ".",
     row.names,
     col.names,
     as.is = !stringsAsFactors,
     na.strings = "NA", 
     colClasses = NA, 
     nrows = -1,
     skip = 0, 
     check.names = TRUE, 
     fill = !blank.lines.skip,
     strip.white = FALSE, 
     blank.lines.skip = TRUE,
     comment.char = "#",
     allowEscapes = FALSE,
     flush = FALSE,
     stringsAsFactors = default.stringsAsFactors(),
     fileEncoding = "",
     encoding = "unknown",
     text
     )


Performance

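By default, read.table will infer the class of each column from the data (colClasses = NA), read until the end of the file to find out how many rows there are (nrows = -1), and scan every line for comments starting with # (comment.char = "#").

Giving R these parameters explicitly makes it run faster, as it no longer needs to work them out itself.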
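A rough sketch of a call that supplies colClasses, nrows and comment.char up front, so that read.table has less to work out on its own. The temporary file, its two columns and the variable names are assumptions made for the example.

```r
# Build a small sample file so the example is self-contained
path <- tempfile(fileext = ".txt")
sample_data <- data.frame(id = 1:1000, score = runif(1000))
write.table(sample_data, path, row.names = FALSE)

fast <- read.table(
  path,
  header = TRUE,
  colClasses = c("integer", "numeric"),  # no type inference needed
  nrows = 1000,                          # known number of data rows
  comment.char = ""                      # turn off comment scanning
)

print(sapply(fast, class))
```

On a file this small the difference is not noticeable; the saving grows with the size of the file.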

Memory

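read.table loads the whole dataset into memory, so the dataset must not be larger than the amount of RAM available.

1,000,000 rows, 10 columns with numeric data = 1,000,000 * 10 * 8 bytes = 80,000,000 bytes, or about 76 MB (80,000,000 / 2^20)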
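The same estimate can be computed in R. This is only a back-of-the-envelope sketch: it assumes every column is numeric (8 bytes per value) and ignores the overhead of the data frame itself. The variable names are illustrative.

```r
rows <- 1e6
cols <- 10
bytes_per_numeric <- 8  # a numeric (double) value takes 8 bytes

bytes <- rows * cols * bytes_per_numeric
megabytes <- bytes / 2^20  # 2^20 bytes per MB

print(round(megabytes, 1))
```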

Options

colClasses

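colClasses = "numeric"

A single class is recycled across all columns, so this reads every column as numeric. To set the classes individually, pass a character vector with one entry per column.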
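A small sketch of the two ways to pass colClasses: one class recycled over all columns, or one class per column. The three-column file and the names all_numeric and mixed are invented for the example.

```r
# Write a tiny whitespace-separated file with a header line
path <- tempfile(fileext = ".txt")
writeLines(c("a b c", "1 2.5 3", "4 5.5 6"), path)

# One class, recycled: every column is read as numeric
all_numeric <- read.table(path, header = TRUE, colClasses = "numeric")
print(sapply(all_numeric, class))

# One class per column, in column order
mixed <- read.table(path, header = TRUE,
                    colClasses = c("integer", "numeric", "character"))
print(sapply(mixed, class))
```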

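To figure out the class of each column, you can use this snippet. It reads only the first 100 rows, lets R infer the classes from them, and reuses the result for the full read: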

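# Read only the first 100 rows and let R infer the column classes
mySubsetDataTable <- read.table("myFile.txt", nrows = 100)
classes <- sapply(mySubsetDataTable, class)
# Reuse the inferred classes when reading the whole file
myDataTable <- read.table("myFile.txt", colClasses = classes)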

nrows

nrows is the maximum number of rows to read. To count the number of lines in a file, see the Linux tool wc (wc -l).

Setting nrows, even as a mild over-estimate, helps with memory usage.
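A minimal sketch of passing a line count to nrows, counted from within R here. It assumes the file is small enough for readLines, which loads every line into memory; for very large files an external tool such as wc is the better choice. The file and variable names are illustrative.

```r
# Build a small sample file: one header line plus five data rows
path <- tempfile(fileext = ".txt")
sample_data <- data.frame(x = 1:5, y = c(0.5, 1.5, 2.5, 3.5, 4.5))
write.table(sample_data, path, row.names = FALSE)

# Count the lines in the file; the count includes the header line
n_lines <- length(readLines(path))

# The count is a mild over-estimate of the data rows, which is acceptable
df <- read.table(path, header = TRUE, nrows = n_lines)
print(nrow(df))
```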