R - Subset Operators (Extract or Replace Parts of an Object)

1 - About

In the Data Manipulation category, you have the subset operators:

  • [
  • [[
  • $

These operators act on vectors, matrices, arrays, and lists (including data frames) to extract or replace parts.

See also: the subset function
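
For comparison, here is a minimal sketch of the same selection written once with `[` and once with `subset()`. The data frame mirrors the demo data used on this page; the variable names `by_bracket` and `by_subset` are chosen only for illustration.

```r
# Demo data frame (same shape as the one used on this page)
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

# Bracket form: rows where colB > 3, columns colA and colC
by_bracket <- df[df$colB > 3, c("colA", "colC")]

# subset() form: the condition and the column names are evaluated inside df
by_subset <- subset(df, colB > 3, select = c(colA, colC))

# Both calls select rows c, d, e and columns colA, colC
all.equal(by_bracket, by_subset)
```

`subset()` is convenient interactively, while the bracket form is usually preferred inside functions because it does not rely on non-standard evaluation.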

3 - Usage


# Extraction
x[columnsSelector]
x[rowsSelector, columnsSelector, drop = ]
x[[columnsSelector]] # selects exactly one element (for a data frame, one column returned as a vector)
x[[rowsSelector, columnsSelector]] # selects exactly one cell of a data frame or matrix
x$name

# Replace
x[rowsSelector, columnsSelector] <- value

where:

  • x is an object (data frame, …)
  • rowsSelector: (optional) selects the rows to extract or replace, by index, name, or logical vector. Defaults to all rows.
  • columnsSelector: selects the columns to extract or replace, by index, name, or logical vector. For example:

2:3 # columns 2 to 3
c("colA", "colC") # columns colA and colC

  • drop: (optional) logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left (the column becomes a vector), but not to drop if only one row is left (the row stays in a data frame).
  • $ is used to extract elements of a list or data frame by name.
  • [ returns an object of the same class as the original and can select more than one element. The one exception is the two-index form on a data frame or matrix: when a single column is selected, the result is simplified to a vector unless drop = FALSE.
  • [[ is used to extract elements of a list or a data frame. It can only extract a single element, and the class of the returned object is not necessarily a list or data frame.
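
The difference between the three operators is easiest to see by checking the class of each result. The following is a small sketch on a data frame with the same columns as the demo data of this page.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

class(df["colA"])                  # "data.frame": single-index [ keeps the container
class(df[["colA"]])                # "integer": [[ extracts the column itself
class(df$colA)                     # "integer": $ extracts the column by name
class(df[, "colA"])                # "integer": two-index [ drops a single column by default
class(df[, "colA", drop = FALSE])  # "data.frame": drop = FALSE keeps the data frame
```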

4 - Demo Data

We will subset the following data frame:


df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])
df


  colA colB colC
a    1    2    3
b    2    3    4
c    3    4    5
d    4    5    6
e    5    6    7

Examples for lists are documented separately.
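
As a brief illustration, the same operators behave as follows on a list. The list `lst` and its element names are hypothetical and exist only for this sketch.

```r
# Hypothetical list used only for illustration
lst <- list(id = 1L, tags = c("x", "y"), nested = list(flag = TRUE))

sub_list <- lst["tags"]                # [ returns a list of length 1
element  <- lst[["tags"]]              # [[ returns the character vector itself
same     <- lst$tags                   # $ extracts the same element by name
deep     <- lst[["nested"]][["flag"]]  # chained [[ reaches into a nested list

lst[["id"]] <- 2L                      # replacement uses the same operators
lst$tags <- NULL                       # assigning NULL removes the element
names(lst)                             # "id" "nested"
```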

5 - Subset Type

5.1 - Column

5.1.1 - One Column

  • Extract the column by name:

df$colA


[1] 1 2 3 4 5

  • Retrieve the second column by index and return a vector:

df[,2]  # returns an integer vector because drop defaults to TRUE for a single column (not for a single row)
df[,2, drop = TRUE] # same


[1] 2 3 4 5 6

  • Retrieve the second column by index and return a data frame:

df[2]
df[,2,drop = FALSE] # same


  colB
a    2
b    3
c    4
d    5
e    6

  • Retrieve the column by name:

df[,"colB"]


[1] 2 3 4 5 6

  • Remove the second column

df[-2]


  colA colC
a    1    3
b    2    4
c    3    5
d    4    6
e    5    7

5.1.2 - Multiple Columns

  • Retrieve columns by name or by index:

# By name
df[,c("colA","colB")]
# or
df[c("colA","colB")]
# By index
df[,c(1,2)]
# or
df[c(1,2)]


  colA colB
a    1    2
b    2    3
c    3    4
d    4    5
e    5    6

  • Retrieve columns by range

df[2:3]
# or
df[,2:3]


  colB colC
a    2    3
b    3    4
c    4    5
d    5    6
e    6    7

  • Removing Multiple Columns

df[-c(1,2)]


  colC
a    3
b    4
c    5
d    6
e    7
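
Negative indexes only work with numeric positions, so `df[-"colA"]` raises an error. A minimal sketch of two common ways to remove columns by name follows; `drop_cols`, `kept_a`, and `kept_b` are illustrative names.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

drop_cols <- c("colA", "colB")

# Keep every column whose name is not in drop_cols
kept_a <- df[setdiff(names(df), drop_cols)]

# Same result with a logical vector over the column names
kept_b <- df[!(names(df) %in% drop_cols)]

names(kept_a)  # "colC"
```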

5.2 - Row

5.2.1 - One Row

  • Retrieve the fourth row by index. The result stays a data frame because drop defaults to FALSE when a single row is selected:

df[4,]


  colA colB colC
d    4    5    6

  • Retrieve the fourth row by index with drop = FALSE stated explicitly (same result):

df[4,,drop=FALSE]


  colA colB colC
d    4    5    6

  • Retrieve the row by name:

df["d",]


  colA colB colC
d    4    5    6
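
To actually drop the data frame structure of a single row, drop = TRUE has to be requested explicitly. The sketch below shows what each form returns; the variable names are illustrative.

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

row_df   <- df[4, ]               # data frame with one row (default for rows)
row_list <- df[4, , drop = TRUE]  # plain list with one element per column
row_vec  <- unlist(df[4, ])       # named vector; works cleanly here because all columns are integer

is.data.frame(row_df)    # TRUE
is.data.frame(row_list)  # FALSE
```

A row becomes a list rather than a vector because the columns of a data frame may have different types.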

5.2.2 - Multiple Rows Indexing

  • Retrieve two rows by name

df[c("b","e"),]


  colA colB colC
b    2    3    4
e    5    6    7

  • Retrieve the first and third rows by logical vector

df[c(TRUE,FALSE,TRUE,FALSE,FALSE),]


  colA colB colC
a    1    2    3
c    3    4    5

5.2.3 - Multiple Rows Filtering

  • Retrieve the rows by logical vector where colB > 3

df[df$colB > 3, ]
# or
df[df[, "colB"] > 3, ]
# or
df[df[, 2] > 3, ]


  colA colB colC
c    3    4    5
d    4    5    6
e    5    6    7

  • Retrieve the rows by logical vector where colB > 3 and colC <= 6

df[df$colB > 3 & df$colC <= 6, ]


  colA colB colC
c    3    4    5
d    4    5    6
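
Logical filtering behaves differently when the condition contains NA: the bracket form returns a row filled with NA for each NA in the condition. The sketch below uses a hypothetical variant of the demo data, `df_na`, with one missing value, and shows two ways to avoid the extra row.

```r
# Hypothetical variant of the demo data with a missing value in colB
df_na <- data.frame(colA = 1:5, colB = c(2L, NA, 4L, 5L, 6L), colC = 3:7,
                    row.names = letters[1:5])

with_na_row  <- df_na[df_na$colB > 3, ]         # the NA condition yields an all-NA row
clean_which  <- df_na[which(df_na$colB > 3), ]  # which() keeps only the TRUE positions
clean_subset <- subset(df_na, colB > 3)         # subset() also treats NA as FALSE

nrow(with_na_row)  # 4
nrow(clean_which)  # 3
```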

5.3 - Vertical and Horizontal

  • Extracting the intersection of rows and columns

df[2:4,2:3]


  colB colC
b    3    4
c    4    5
d    5    6
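
When the intersection is a single cell, both `[` and `[[` return the value itself, and the same indexes can be used on the left side of an assignment. A minimal sketch on the demo data:

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

cell_a <- df[2, 3]           # single cell by position
cell_b <- df[["b", "colC"]]  # same cell by row name and column name

df["b", "colC"] <- 40L       # replace that cell in place
df$colC                      # 3 40 5 6 7
```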

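  • Replace values at the intersection of filtered rows and one column. Here res is a separate data frame (not the demo df): set the total time column to 200 where the success flag is 3 and the error text contains 46066

res[res$SUCCESS_FLG == 3 & grepl("46066", res$ERROR_TEXT), "TOTAL_TIME_SEC"] <- 200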
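
The replacement above refers to a data frame `res` that is not defined on this page. The following runnable sketch builds a hypothetical `res` with the same column names and invented values, then applies the same kind of conditional replacement with the pattern used in the code.

```r
# Hypothetical data: column names follow the example above, values are invented
res <- data.frame(
  SUCCESS_FLG    = c(3, 1, 3, 3),
  ERROR_TEXT     = c("error 46066 raised", "ok", "timeout", "code 46066"),
  TOTAL_TIME_SEC = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Build the row filter once so it can be inspected before replacing
hit <- res$SUCCESS_FLG == 3 & grepl("46066", res$ERROR_TEXT, fixed = TRUE)

# Replace only the selected rows of one column
res[hit, "TOTAL_TIME_SEC"] <- 200

res$TOTAL_TIME_SEC  # 200 20 30 200
```

Storing the logical vector in a variable such as `hit` makes it easy to check with `sum(hit)` how many rows will change before the assignment runs.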

6 - Documentation / Reference


?":"
?"["
?"$"
?"[["
?"[.data.frame"

