R - Dplyr (Data Frame Operations)

About

Dplyr aims to provide a function for each basic verb of data manipulation:

filter() to select cases based on their values.
arrange() to reorder the cases.
select() and rename() to select variables based on their names.
mutate() and transmute() to add new variables that are functions of existing variables.
summarise() to condense multiple values to a single value.
sample_n() and sample_frac() to take random samples.
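
As a minimal sketch of the verbs above that have no section of their own on this page, the following uses mtcars, a dataset that ships with R (an assumption for illustration, not data from this page). Newer dplyr releases recommend slice_sample() over sample_n() and sample_frac(), but both still work.

```r
library(dplyr)

# select() keeps only the named variables; rename() keeps all and renames some
cars <- select(mtcars, mpg, cyl, wt)
cars <- rename(cars, weight = wt)    # new_name = old_name

# summarise() condenses many values into a single row
avg <- summarise(cars, mean_mpg = mean(mpg), n = n())

# sample_n() draws a fixed number of rows, sample_frac() a fraction of them
set.seed(1)                          # only to make the draw repeatable
five_rows <- sample_n(cars, 5)
quarter   <- sample_frac(cars, 0.25)
```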

Interface for operations on:

One line Example

Data:

timestamp	metrics	value
2017-07-03 20:08:02	Consumer Record	0.0e+00
2017-07-03 20:08:02	Consumer Metrics	4.0e+04
2017-07-03 20:08:02	Buffer Ratio	0.0e+00
2017-07-03 20:08:07	Buffer Size	3.0e+00
2017-07-03 20:08:07	Buffer MaxSize	4.0e+04
2017-07-03 20:08:07	Buffer Ratio	7.5e-03
..	..	..

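library(dplyr)
library(data.table)  # provides %like% (regular expression match)
loaderMetrics %>%
  # keep only the rows whose metric name matches the pattern
  filter(metrics %like% "Consumer(.*)Records") %>%
  # give the remaining rows one common label so they aggregate together
  mutate(metrics = "Consumer Total") %>%
  group_by(timestamp, metrics) %>%
  summarize(value = sum(value))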

Operations

Add new Variable

mutate() adds new variables and preserves the existing ones.

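# Add a new column computed from an existing one
mutate(dataFrame, NewColumnName = ColumnName / 1024 / 1024)

# Conditionally update an existing column (columns are named bare, without dataFrame$)
mutate(dataFrame, value = ifelse(boolean, updateValue, value))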
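
A small runnable sketch of both uses of mutate(). The `files` data frame and its columns are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical data: file sizes in bytes, one of them missing
files <- data.frame(name  = c("a", "b", "c"),
                    bytes = c(1048576, 5242880, NA))

files <- mutate(files,
                megabytes = bytes / 1024 / 1024,         # new column
                bytes = ifelse(is.na(bytes), 0, bytes))  # existing column updated

files$megabytes  # 1 5 NA
files$bytes      # 1048576 5242880 0
```

Expressions inside one mutate() call are evaluated in order, so megabytes is computed from the original bytes values before the NA is replaced.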

Drop Variable (Transmute)

transmute() adds new variables and drops the existing ones.
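
The difference from mutate() can be sketched as follows, assuming a hypothetical `sizes` data frame. Only the columns named in the transmute() call survive.

```r
library(dplyr)

# Hypothetical data for illustration
sizes <- data.frame(name = c("a", "b"), bytes = c(1048576, 2097152))

kept    <- mutate(sizes, megabytes = bytes / 1024 / 1024)
dropped <- transmute(sizes, name, megabytes = bytes / 1024 / 1024)

names(kept)     # "name" "bytes" "megabytes"
names(dropped)  # "name" "megabytes"
```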

Group by

# Group by two columns, then count the rows in each group
by <- group_by(dataFrame, column1, column2)
summarise(by, count = n())
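
A runnable sketch of the same pattern on mtcars (a dataset that ships with R, used here as an assumption for illustration). The `.groups = "drop"` argument, available in dplyr 1.0.0 and later, returns an ungrouped result.

```r
library(dplyr)

by_cyl_gear <- group_by(mtcars, cyl, gear)

stats <- summarise(by_cyl_gear,
                   count    = n(),        # rows per group
                   mean_mpg = mean(mpg),  # any aggregate works here
                   .groups  = "drop")
```

The result has one row per combination of cyl and gear that occurs in the data.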

Filter

# And
filter(dataFrame, colName == "foo" & colName2 == "bar")
# Or
filter(dataFrame, colName == "foo" | colName2 == "bar")
# Not equal
filter(dataFrame, colName != "foo")
# In
filter(dataFrame, colName %in% c("foo", "bar"))
# Not in
filter(dataFrame, !colName %in% c("foo", "bar"))
# Is NA
filter(dataFrame, is.na(colName))
# Like, with grepl()
filter(dataFrame, grepl("foo", colName))
# Like, with %like% from the data.table package
library(data.table)
filter(dataFrame, colName %like% "foo")
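
The sketch below runs several of these filters on a hypothetical data frame to show how missing values behave: filter() drops rows where the condition evaluates to NA, so `!=` and `!... %in%` do not return the same rows.

```r
library(dplyr)

# Hypothetical data for illustration, including one missing value
df <- data.frame(colName  = c("foo", "bar", NA, "food"),
                 colName2 = c("bar", "bar", "bar", "baz"),
                 stringsAsFactors = FALSE)

both   <- filter(df, colName == "foo", colName2 == "bar")  # a comma acts like &
not_eq <- filter(df, colName != "foo")              # NA row dropped: "bar", "food"
not_in <- filter(df, !colName %in% c("foo", "bar")) # NA row kept: NA, "food"
has_na <- filter(df, is.na(colName))
like   <- filter(df, grepl("foo", colName))         # "foo", "food"
```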

Count

count(dataframe)
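
count() also accepts grouping columns, which makes it a shorthand for group_by() followed by summarise(n = n()). A minimal sketch on mtcars (a built-in R dataset, assumed here for illustration):

```r
library(dplyr)

total  <- count(mtcars)                    # one row; column n holds the row count
by_cyl <- count(mtcars, cyl)               # one row per value of cyl
sorted <- count(mtcars, cyl, sort = TRUE)  # largest groups first
```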

Sort (Order)

arrange(dataframe, desc(colName))
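
arrange() sorts ascending by default and accepts several sort keys. A minimal sketch on mtcars (a built-in R dataset, assumed here for illustration):

```r
library(dplyr)

ascending  <- arrange(mtcars, mpg)             # lowest mpg first (default)
descending <- arrange(mtcars, desc(mpg))       # highest mpg first
multi_key  <- arrange(mtcars, cyl, desc(mpg))  # cyl ascending, then mpg descending within each cyl
```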
