R - Subset Operators (Extract or Replace Parts of an Object)

About

In the Data Manipulation category, you have the subset operators:

[
[[
$

These operators act on an object (vector, list, data frame, …) to extract or replace parts.

Usage

# Extraction
x[columnsSelector]
x[rowsSelector, columnsSelector, drop = TRUE] # drop is optional, see below
x[[rowsSelector, columnsSelector]] # can only select a single element; the result is not wrapped in a data frame (similar to drop = TRUE)
x$name

# Replace
x[rowsSelector, columnsSelector] <- value

where:

x is an object (data frame, …)
rowsSelector: (optional) rows selector by index or name to extract or replace. Default all rows.
columnsSelector: columns selector by index or name to extract or replace.

2:3 # column 2 to 3
c("colA", "colC")

drop: (optional) logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left (the column becomes a vector), but not to drop if only one row is left (the row stays in a data frame).
$ is used to extract elements of a list or data frame by name.
[ returns an object of the same class as the original and can be used to select more than one element. There is one exception: with the two-index form x[rows, columns], a single column is dropped to a vector unless drop = FALSE.
[[ is used to extract elements of a list or a data frame; it can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame
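
As a quick check of the three rules above, the sketch below compares the class of what each operator returns. The objects lst and d are made up for this illustration only.

```r
# Hypothetical objects, only for this illustration
lst <- list(a = 1:3, b = "text")
d <- data.frame(colA = 1:3, colB = 4:6)

class(lst["a"])     # "list": [ keeps the class of the original
class(lst[["a"]])   # "integer": [[ returns the element itself
class(lst$a)        # "integer": $ extracts the element by name

class(d["colA"])    # "data.frame"
class(d[["colA"]])  # "integer"
class(d[, "colA"])  # "integer": the exception, a single column is dropped
```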

Demo Data

We will subset the following data frame

> df = data.frame(colA=1:5,colB=2:6,colC=3:7,row.names=letters[1:5])
> df

  colA colB colC
a    1    2    3
b    2    3    4
c    3    4    5
d    4    5    6
e    5    6    7

Subset Type

Column

One Column

Extract the column by name:

df$colA

[1] 1 2 3 4 5

Retrieve the column by index number and return a vector. The second:

df[,2]  # returns an integer vector because drop defaults to TRUE for a single column (not for a single row)
df[,2, drop = TRUE] # same

[1] 2 3 4 5 6

Retrieve the column by index number and return a data frame. The second:

df[2]
df[,2,drop = FALSE] # same

  colB
a    2
b    3
c    4
d    5
e    6

Retrieve the column by name.

df[,"colB"]

[1] 2 3 4 5 6

Remove the second column

df[-2]

  colA colC
a    1    3
b    2    4
c    3    5
d    4    6
e    5    7
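
A negative index returns a new data frame and leaves df untouched. The sketch below illustrates this and shows two common ways to exclude a column by name, since a minus sign cannot be applied to a character vector. The names without_b, by_name and df2 are only for this illustration.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

without_b <- df[-2]   # a copy without the second column
names(without_b)      # "colA" "colC"
names(df)             # "colA" "colB" "colC": df itself is unchanged

# Excluding by name: keep every column whose name is not "colB"
by_name <- df[setdiff(names(df), "colB")]

# Removing the column for good: assign NULL to it (done on a copy here)
df2 <- df
df2$colB <- NULL
names(df2)            # "colA" "colC"
```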

Multiple Columns

Retrieve columns by name or by index

# By name
df[,c("colA","colB")]
# or
df[c("colA","colB")]
# By index
df[,c(1,2)]
# or
df[c(1,2)]

  colA colB
a    1    2
b    2    3
c    3    4
d    4    5
e    5    6

Retrieve columns by range

df[2:3]
# or
df[,2:3]

  colB colC
a    2    3
b    3    4
c    4    5
d    5    6
e    6    7

Removing Multiple Columns

df[-c(1,2)]

  colC
a    3
b    4
c    5
d    6
e    7

Row

One Row

Retrieve the row by index. The fourth (a single row is not dropped by default, so the result stays a data frame):

df[4,]

  colA colB colC
d    4    5    6

Retrieve the row by index with drop = FALSE stated explicitly; the result is a data frame. The fourth:

df[4,,drop=FALSE]

  colA colB colC
d    4    5    6
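
To see what dropping a row actually does, the sketch below forces drop = TRUE on a single row. The result is then a plain named list with one element per column, not a data frame. The variable names are only for this illustration.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

row_default <- df[4, ]               # single row, default: still a data frame
row_dropped <- df[4, , drop = TRUE]  # single row, dropped: a named list

is.data.frame(row_default)  # TRUE
is.data.frame(row_dropped)  # FALSE
is.list(row_dropped)        # TRUE
row_dropped$colB            # 5
```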

Retrieve the row by name

df["d",]

  colA colB colC
d    4    5    6

Multiple Rows Indexing

Retrieve two rows by naming

df[c("b","e"),]

  colA colB colC
b    2    3    4
e    5    6    7

Retrieve the first and third rows by logical vector

df[c(TRUE,FALSE,TRUE,FALSE,FALSE),]

  colA colB colC
a    1    2    3
c    3    4    5

Multiple Rows Filtering

Retrieve the rows by logical vector where colB > 3

df[df$colB>3,]
# or
df[df[,"colB"]>3,]
# or
df[df[,2]>3,]

  colA colB colC
c    3    4    5
d    4    5    6
e    5    6    7
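
The condition inside the brackets is an ordinary logical vector with one value per row. The sketch below stores it first, which makes the filter easier to inspect and reuse. The name keep is only for this illustration.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

keep <- df$colB > 3       # FALSE FALSE TRUE TRUE TRUE
filtered <- df[keep, ]    # rows c, d and e

# which() turns the mask into row positions (3, 4, 5) and ignores NA values
positions <- which(keep)
same <- df[positions, ]
```

When the tested column may contain NA, the which() form is the safer choice, because an NA in a logical row selector produces a row filled with NA.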

Retrieve the rows by logical vector where colB > 3 and colC <= 6

df[df$colB>3&df$colC<=6,]

  colA colB colC
c    3    4    5
d    4    5    6

Vertical and Horizontal

Extracting the intersection of rows and columns

df[2:4,2:3]

  colB colC
b    3    4
c    4    5
d    5    6
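
The same intersection can be selected by name, and narrowing it down to one row and one column gives a single cell. A short sketch on the demo data (block and cell are names used only here):

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

# Same block as df[2:4, 2:3], selected by row and column names
block <- df[c("b", "c", "d"), c("colB", "colC")]

# One row and one column: a single cell
cell <- df["c", "colB"]   # 4, dropped to a vector of length one
df[["c", "colB"]]         # 4, [[ with a row and a column returns one element
```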

Update the TOTAL_TIME_SEC column where the success flag is 3 and the error text contains 46066 (res is another data frame, not the demo data above)

res[res$SUCCESS_FLG==3 & grepl("46066",res$ERROR_TEXT),c("TOTAL_TIME_SEC")] <- 200
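
The res data frame is not defined in this article. The sketch below builds a small stand-in with the same column names (the values are made up) so that the replacement can be run and checked.

```r
# Hypothetical stand-in for res; only the column names come from the example above
res <- data.frame(
  SUCCESS_FLG    = c(3, 3, 1),
  ERROR_TEXT     = c("error 46066", "other error", "error 46066"),
  TOTAL_TIME_SEC = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Rows where both conditions hold: only the first one
hit <- res$SUCCESS_FLG == 3 & grepl("46066", res$ERROR_TEXT)

# Replace the selected cells in place
res[hit, "TOTAL_TIME_SEC"] <- 200
res$TOTAL_TIME_SEC   # 200 20 30
```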

Documentation / Reference

?":"
?"["
?"$"
?"[["
?"[.data.frame"