About

dplyr aims to provide a function for each basic verb of data manipulation:

  • filter() to select cases based on their values.
  • arrange() to reorder the cases.
  • select() and rename() to select variables based on their names.
  • mutate() and transmute() to add new variables that are functions of existing variables.
  • summarise() to condense multiple values to a single value.
  • sample_n() and sample_frac() to take random samples.
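
select(), rename(), sample_n() and sample_frac() have no section of their own further down, so here is a minimal sketch of them. The data frame df and its column names are hypothetical, made up for illustration only.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(id = 1:10, size_bytes = (1:10) * 1024)

# select() keeps the named columns; rename() takes new_name = old_name
renamed <- df %>%
  select(id, size_bytes) %>%
  rename(bytes = size_bytes)

set.seed(42)                             # make the random samples reproducible
three_rows <- sample_n(renamed, 3)       # exactly 3 random rows
half_rows  <- sample_frac(renamed, 0.5)  # a random 50% of the rows
```

Which rows are drawn depends on the seed; only the number of rows is fixed by the arguments.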

Interface for operations on:

One line Example

Data:

timestamp            metrics           value
2017-07-03 20:08:02  Consumer Record   0.0e+00
2017-07-03 20:08:02  Consumer Metrics  4.0e+04
2017-07-03 20:08:02  Buffer Ratio      0.0e+00
2017-07-03 20:08:07  Buffer Size       3.0e+00
2017-07-03 20:08:07  Buffer MaxSize    4.0e+04
2017-07-03 20:08:07  Buffer Ratio      7.5e-03
..                   ..                ..

# %like% comes from the data.table package
library(data.table)
library(dplyr)

loaderMetrics %>%
  filter(metrics %like% "Consumer(.*)Records") %>%
  mutate(metrics = "Consumer Total") %>%
  group_by(timestamp, metrics) %>%
  summarise(value = sum(value))
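
To try the pipeline without the original data, the sketch below builds a small stand-in for loaderMetrics. The metric names and values are invented for illustration (the real rows are elided above), and grepl() from base R is swapped in for %like% so that only dplyr is needed.

```r
library(dplyr)

# Hypothetical sample: metric names and values are invented for illustration
loaderMetrics <- data.frame(
  timestamp = c("2017-07-03 20:08:02", "2017-07-03 20:08:02",
                "2017-07-03 20:08:02", "2017-07-03 20:08:07"),
  metrics   = c("Consumer Read Records", "Consumer Skipped Records",
                "Buffer Ratio", "Consumer Read Records"),
  value     = c(10, 5, 0.5, 20),
  stringsAsFactors = FALSE
)

consumerTotal <- loaderMetrics %>%
  filter(grepl("Consumer(.*)Records", metrics)) %>%  # base R stand-in for %like%
  mutate(metrics = "Consumer Total") %>%             # collapse matches under one label
  group_by(timestamp, metrics) %>%
  summarise(value = sum(value), .groups = "drop")    # one total per timestamp

print(consumerTotal)
```

With this sample, the two consumer rows at 20:08:02 are summed into one row and the "Buffer Ratio" row is filtered out.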

Operations

Add new Variable

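mutate() adds new variables and preserves the existing ones.

# Derive a new column from an existing one (e.g. bytes to megabytes)
mutate(dataFrame, NewColumnName = ColumnName / 1024 / 1024)

# Conditionally overwrite an existing column
mutate(dataFrame, value = ifelse(condition, updateValue, value))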
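
A runnable sketch of both forms on a hypothetical data frame (df, name and bytes are illustrative names, not from the original data):

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(name = c("a", "b", "c"), bytes = c(1048576, 5242880, NA))

out <- df %>%
  mutate(
    megabytes = bytes / 1024 / 1024,        # new column, computed from the original bytes
    bytes = ifelse(is.na(bytes), 0, bytes)  # conditional update of an existing column
  )

print(out)
```

Arguments to mutate() are evaluated in order, so megabytes is computed before bytes is overwritten and stays NA for the third row.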

Drop Variable (Transmute)

transmute() adds new variables like mutate(), but drops the existing ones.
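
A minimal sketch, assuming a hypothetical data frame df with columns name, bytes and host. An existing column survives only if it is named explicitly.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(name = c("a", "b"), bytes = c(1048576, 2097152), host = c("x", "y"))

# Keep `name`, create `megabytes`; `bytes` and `host` are dropped
kept <- transmute(df, name, megabytes = bytes / 1024 / 1024)

print(names(kept))
```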

Group by

# Group the rows, then compute one summary row per group
by <- group_by(dataFrame, column1, column2)
summarise(by, count = n())
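
The same pattern on a hypothetical data frame (host, status and ms are illustrative names), with a second summary column added next to the count:

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(
  host   = c("a", "a", "b"),
  status = c("ok", "ok", "fail"),
  ms     = c(10, 30, 50)
)

by <- group_by(df, host, status)
stats <- summarise(by,
  count  = n(),       # number of rows in each group
  avg_ms = mean(ms),  # mean of ms within each group
  .groups = "drop"    # return an ungrouped result
)

print(stats)
```

The result has one row per distinct (host, status) pair that occurs in the data.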

Filter

# And
filter(dataFrame, colName == "foo" & colName2 == "bar")
# Or
filter(dataFrame, colName == "foo" | colName2 == "bar")
# Not equal
filter(dataFrame, colName != "foo")
# In
filter(dataFrame, colName %in% c("foo", "bar"))
# Not in
filter(dataFrame, !(colName %in% c("foo", "bar")))
# Is NA
filter(dataFrame, is.na(colName))
# Like, with grepl()
filter(dataFrame, grepl("foo", colName))
# Like, with %like% from the data.table package
library(data.table)
filter(dataFrame, colName %like% "foo")
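
A few of these conditions applied to a hypothetical data frame (the values are made up). The sketch also shows how missing values behave: filter() drops rows where the condition evaluates to NA.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(
  colName  = c("foo", "bar", NA, "food"),
  colName2 = c("bar", "bar", "x", "y"),
  stringsAsFactors = FALSE
)

both     <- filter(df, colName == "foo" & colName2 == "bar")  # And
not_foo  <- filter(df, colName != "foo")                      # the NA row is dropped too
in_set   <- filter(df, colName %in% c("foo", "bar"))          # In
missing  <- filter(df, is.na(colName))                        # only the NA row
like_foo <- filter(df, grepl("foo", colName))                 # substring / regex match
```

To keep the rows with a missing colName in the "not equal" case, add them explicitly, for example filter(df, colName != "foo" | is.na(colName)).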

Count

count(dataFrame)
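
With no further arguments, count() returns a single row holding the total number of rows in a column named n. Passing column names counts rows per distinct value. A minimal sketch on a hypothetical data frame:

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(
  metrics = c("Buffer Ratio", "Buffer Size", "Buffer Ratio"),
  stringsAsFactors = FALSE
)

total      <- count(df)                        # one row, n = number of rows
per_metric <- count(df, metrics, sort = TRUE)  # one row per value, largest n first

print(per_metric)
```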

Sort (Order)

arrange(dataFrame, desc(colName))
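
arrange() sorts in ascending order by default and accepts several columns, with later columns breaking ties. A minimal sketch on a hypothetical data frame:

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(metrics = c("b", "a", "b"), value = c(1, 2, 3))

# Sort by metrics ascending, then by value descending within each metric
sorted <- arrange(df, metrics, desc(value))

print(sorted)
```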

Documentation / Reference