About
In the Data Manipulation category, you have the subset operators:
- [
- [[
- $
These operators act on vectors, matrices, arrays, lists and data frames to extract or replace parts.
See also: the subset function
Usage
# Extraction
x[columnsSelector]
x[rowsSelector, columnsSelector, drop = ]
x[[rowsSelector, columnsSelector]] # extracts exactly one element (one cell; x[[columnsSelector]] gives one column), never wrapped in a data frame
x$name
# Replace
x[rowsSelector, columnsSelector] <- value
where:
- x is an object (data frame, …)
- rowsSelector: (optional) rows selector by index or name to extract or replace. Default all rows.
- columnsSelector: columns selector by index or name to extract or replace.
2:3 # column 2 to 3
c("colA", "colC")
- drop: (optional) logical, used only with the two-index form x[rowsSelector, columnsSelector]. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left (the column becomes a vector), but not to drop if only one row is left (the row stays in a data frame).
- [ returns an object of the same class as the original and can select more than one element. The one exception: on a data frame, x[, columnsSelector] with a single column and the default drop returns that column as a vector.
- [[ is used to extract elements of a list or a data frame; it can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame
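The difference between the operators is easiest to see by checking the class of what each one returns. The following is a minimal sketch; the data frame `demo` and the variable names are hypothetical and introduced only for this illustration.

```r
# Hypothetical demo object, introduced only for this illustration
demo <- data.frame(colA = 1:3, colB = c(10, 20, 30))

single_bracket <- demo["colA"]                  # one index: stays a data frame
double_bracket <- demo[["colA"]]                # the column itself, an integer vector
dollar         <- demo$colA                     # same column, selected by literal name
matrix_style   <- demo[, "colA"]                # two indices: a single column is dropped to a vector
kept_dims      <- demo[, "colA", drop = FALSE]  # drop = FALSE keeps the data frame
one_cell       <- demo[[2, "colB"]]             # row 2 of colB

class(single_bracket)  # "data.frame"
class(double_bracket)  # "integer"
class(matrix_style)    # "integer"
class(kept_dims)       # "data.frame"
one_cell               # 20
```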
Demo Data
We will subset the following data frame
> df = data.frame(colA=1:5,colB=2:6,colC=3:7,row.names=letters[1:5])
> df
colA colB colC
a 1 2 3
b 2 3 4
c 3 4 5
d 4 5 6
e 5 6 7
Examples for lists are available here
Subset Type
Column
One Column
- Extract the column by name:
df$colA
[1] 1 2 3 4 5
- Retrieve the column by index number and return a vector. The second:
df[,2] # returns an integer vector because drop defaults to TRUE for a single column (not for a single row)
df[,2, drop = TRUE] # same
[1] 2 3 4 5 6
- Retrieve the column by index number and return a data frame. The second:
df[2]
df[,2,drop = FALSE] # same
colB
a 2
b 3
c 4
d 5
e 6
- Retrieve the column by name:
df[,"colB"]
[1] 2 3 4 5 6
- Remove the second column
df[-2]
colA colC
a 1 3
b 2 4
c 3 5
d 4 6
e 5 7
Multiple Columns
- Retrieving columns by indexing
# By naming
df[,c("colA","colB")]
# or
df[c("colA","colB")]
# By Indexing
df[,c(1,2)]
# or
df[c(1,2)]
colA colB
a 1 2
b 2 3
c 3 4
d 4 5
e 5 6
- Retrieve columns by range
df[2:3]
# or
df[,2:3]
colB colC
a 2 3
b 3 4
c 4 5
d 5 6
e 6 7
- Removing Multiple Columns
df[-c(1,2)]
colC
a 3
b 4
c 5
d 6
e 7
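Negative indices only work with numeric positions, so columns cannot be removed with a minus sign in front of their names. A possible workaround, sketched here on the demo data frame, is to build a logical selector or to select the remaining names. The variable names `drop_names`, `kept` and `same` are hypothetical.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

drop_names <- c("colA", "colB")

# Logical column selector: TRUE for every column that is not in drop_names
kept <- df[!(names(df) %in% drop_names)]

# Alternative: select the names that remain after removing drop_names
same <- df[setdiff(names(df), drop_names)]

names(kept)  # "colC"
names(same)  # "colC"
```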
Row
One Row
- Retrieve the row by index. By default a single row is not dropped, so the result is still a data frame. The fourth:
df[4,]
colA colB colC
d 4 5 6
- Retrieve the row by index with drop = FALSE made explicit (same result as the default). The fourth:
df[4,,drop=FALSE]
colA colB colC
d 4 5 6
- Retrieve the row by name:
df["d",]
colA colB colC
d 4 5 6
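The single-row examples above all return a data frame. To see what dropping the dimension of a row actually does, drop = TRUE has to be passed explicitly. A minimal sketch on the demo data frame follows; the variable names are hypothetical.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

row_default <- df[4, ]               # default: still a one-row data frame
row_dropped <- df[4, , drop = TRUE]  # explicit drop: a named list, one entry per column

is.data.frame(row_default)  # TRUE
is.data.frame(row_dropped)  # FALSE
is.list(row_dropped)        # TRUE
names(row_dropped)          # "colA" "colB" "colC"
```

A list is returned rather than a vector because the columns of a data frame may have different types.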
Multiple Rows Indexing
- Retrieve two rows by naming
df[c("b","e"),]
colA colB colC
b 2 3 4
e 5 6 7
- Retrieve the first and third rows by logical vector
df[c(TRUE,FALSE,TRUE,FALSE,FALSE),]
colA colB colC
a 1 2 3
c 3 4 5
Multiple Rows Filtering
- Retrieve the rows by logical vector where colB > 3
df[df$colB>3,]
# or
df[df[,"colB"]>3,]
# or
df[df[,2]>3,]
colA colB colC
c 3 4 5
d 4 5 6
e 5 6 7
- Retrieve the rows by logical vector where colB > 3 and colC <= 6
df[df$colB>3&df$colC<=6,]
colA colB colC
c 3 4 5
d 4 5 6
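The filtering examples rely on an intermediate logical vector with one value per row. Storing it in a variable makes the mechanism visible. This is a small sketch on the demo data frame; `mask` and `filtered` are hypothetical names.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

# `&` is the element-wise AND, so the result has one logical value per row
mask <- df$colB > 3 & df$colC <= 6

mask                # FALSE FALSE  TRUE  TRUE FALSE
rownames(df)[mask]  # "c" "d"

# Using the mask as rows selector keeps only the rows where it is TRUE
filtered <- df[mask, ]
nrow(filtered)      # 2
```

The vectorised `&` is needed here; `&&` is meant for single logical values and is not suitable for building a row mask.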
Vertical and Horizontal
- Extracting the intersection of rows and columns
df[2:4,2:3]
colB colC
b 3 4
c 4 5
d 5 6
- Replace values: in a data frame res, set the total time column to 200 where the success flag is 3 and the error text contains 46066
res[res$SUCCESS_FLG == 3 & grepl("46066", res$ERROR_TEXT), "TOTAL_TIME_SEC"] <- 200
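The data frame `res` is not shown in this article, so the replacement can be tried on a small stand-in. The sketch below assumes only the three column names used in the statement above; the values and the variable `hit` are made up for illustration.

```r
# Hypothetical stand-in for `res`; the values are invented for this illustration
res <- data.frame(
  SUCCESS_FLG      = c(1, 3, 3),
  ERROR_TEXT       = c("", "error 46066 raised", "another error"),
  TOTAL_TIME_SEC   = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Rows selector: flag equals 3 AND the error text contains the code
hit <- res$SUCCESS_FLG == 3 & grepl("46066", res$ERROR_TEXT)

# Replacement form of `[`: only the selected cells are overwritten
res[hit, "TOTAL_TIME_SEC"] <- 200

res$TOTAL_TIME_SEC  # 10 200 30
```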
Documentation / Reference
?":"
?"["
?"$"
?"[["
?"[.data.frame"