About
dplyr aims to provide a function for each basic verb of data manipulation:
- filter() to select cases based on their values.
- arrange() to reorder the cases.
- select() and rename() to select variables based on their names.
- mutate() and transmute() to add new variables that are functions of existing variables.
- summarise() to condense multiple values to a single value.
- sample_n() and sample_frac() to take random samples (both superseded by slice_sample() since dplyr 1.0.0).
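As a rough illustration of the verbs listed above, here is a minimal sketch that applies each one in turn to the built-in mtcars data set. The weight threshold, the km-per-litre conversion and the intermediate variable names are arbitrary choices for the example.

```r
library(dplyr)

# mtcars ships with base R; each verb returns a new data frame.
heavy   <- filter(mtcars, wt > 3.5)                # select cases by value
sorted  <- arrange(heavy, desc(mpg))               # reorder cases, best mpg first
slim    <- select(sorted, mpg, cyl, weight = wt)   # keep variables, renaming wt
scaled  <- mutate(slim, kpl = mpg * 0.425144)      # derive km per litre from mpg
overall <- summarise(scaled, mean_kpl = mean(kpl)) # condense to a single row
sampled <- sample_n(mtcars, 5)                     # random sample of 5 rows
```

Because every verb takes a data frame first and returns one, the same steps can be chained with `%>%` instead of naming each intermediate result.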
Interface for operations on:
- …
One-line Example
Data:
timestamp | metrics | value
---|---|---
2017-07-03 20:08:02 | Consumer Record | 0.0e+00
2017-07-03 20:08:02 | Consumer Metrics | 4.0e+04
2017-07-03 20:08:02 | Buffer Ratio | 0.0e+00
2017-07-03 20:08:07 | Buffer Size | 3.0e+00
2017-07-03 20:08:07 | Buffer MaxSize | 4.0e+04
2017-07-03 20:08:07 | Buffer Ratio | 7.5e-03
.. | .. | ..
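library(dplyr)
library(data.table)  # provides the %like% operator
loaderMetrics %>%
  filter(metrics %like% "Consumer(.*)Records") %>%  # keep the matching metrics
  mutate(metrics = "Consumer Total") %>%            # relabel them under one name
  group_by(timestamp, metrics) %>%
  summarize(value = sum(value))                     # sum the values per timestamp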
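To try the pipeline end to end, the sketch below rebuilds the six sample rows from the table as a data frame. Two assumptions: the timestamps are parsed as UTC, and `grepl("^Consumer", metrics)` replaces the `%like%` pattern, because none of the sample metric names contain "Records". With the sample data, the two consumer rows at 20:08:02 collapse into one "Consumer Total" row.

```r
library(dplyr)

# Hypothetical reconstruction of the sample rows shown in the table.
loaderMetrics <- data.frame(
  timestamp = as.POSIXct(c(rep("2017-07-03 20:08:02", 3),
                           rep("2017-07-03 20:08:07", 3)), tz = "UTC"),
  metrics = c("Consumer Record", "Consumer Metrics", "Buffer Ratio",
              "Buffer Size", "Buffer MaxSize", "Buffer Ratio"),
  value = c(0, 4e4, 0, 3, 4e4, 7.5e-3)
)

consumerTotal <- loaderMetrics %>%
  # Assumption: a plain prefix match stands in for the original %like% pattern.
  filter(grepl("^Consumer", metrics)) %>%
  mutate(metrics = "Consumer Total") %>%
  group_by(timestamp, metrics) %>%
  # .groups = "drop" returns an ungrouped result and silences the grouping message.
  summarize(value = sum(value), .groups = "drop")
```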
Operations
Add New Variable (Mutate)
mutate() adds new variables and preserves the existing ones.
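# Add a column derived from an existing one (e.g. bytes to mebibytes)
mutate(dataFrame, NewColumnName = ColumnName / 1024 / 1024)
# Conditionally update an existing column; condition is a logical expression
mutate(dataFrame, value = ifelse(condition, updateValue, value))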
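A minimal sketch of both patterns on a small hypothetical data frame: file sizes in bytes where -1 is assumed to be a placeholder for "unknown". The first `mutate()` rewrites an existing column in place with `ifelse()`; the second adds a new column computed from it.

```r
library(dplyr)

# Hypothetical data: sizes in bytes, with -1 used as a placeholder.
files <- data.frame(name  = c("a", "b", "c"),
                    bytes = c(1048576, 3145728, -1))

files <- files %>%
  # Replace the -1 placeholder with NA and keep every other value unchanged.
  mutate(bytes = ifelse(bytes < 0, NA, bytes)) %>%
  # Add a new column derived from an existing one (bytes -> MiB).
  mutate(mib = bytes / 1024 / 1024)
```

Both calls keep the `name` column, which is the difference from transmute().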
Drop Variable (Transmute)
transmute() adds new variables and drops the existing ones.
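A small sketch, on hypothetical data, contrasting transmute() with mutate(). Since dplyr 1.1.0, transmute() is superseded by `mutate(.keep = "none")`, which gives the same columns here.

```r
library(dplyr)

files <- data.frame(name = c("a", "b"), bytes = c(1048576, 2097152))

# transmute() keeps only the variables it creates...
t1 <- transmute(files, mib = bytes / 1024 / 1024)
# ...and mutate() does the same when told to keep none of the inputs.
t2 <- mutate(files, mib = bytes / 1024 / 1024, .keep = "none")
```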
Group by
grouped <- group_by(dataFrame, column1, column2)
# One row per (column1, column2) combination, with its number of rows
summarise(grouped,
  count = n()
)
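The same pattern on the built-in mtcars data set, as a minimal sketch: one group per cylinder count, with the row count and an extra per-group aggregate (the choice of `mean(mpg)` is just for illustration).

```r
library(dplyr)

byCyl <- group_by(mtcars, cyl)  # one group per cylinder count
cylSummary <- summarise(byCyl,
  count    = n(),               # number of rows in each group
  mean_mpg = mean(mpg)          # average fuel economy per group
)
```

The result has one row per group, ordered by `cyl` (4, 6, 8), with 11, 7 and 14 cars respectively.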
Filter
# And
filter(dataFrame, colName == "foo" & colName2 == "bar")
# Or
filter(dataFrame, colName == "foo" | colName2 == "bar")
# Not equal
filter(dataFrame, colName != "foo")
# In
filter(dataFrame, colName %in% c("foo", "bar"))
# Not in
filter(dataFrame, !colName %in% c("foo", "bar"))
# Is NA
filter(dataFrame, is.na(colName))
# Like, with grepl() (regular expression match)
filter(dataFrame, grepl("foo", colName))
# Like, with the %like% operator from the data.table package
library(data.table)
filter(dataFrame, colName %like% "foo")
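The sketch below runs each condition against a tiny hypothetical data frame with one missing value, which shows how NA behaves: filter() drops rows where the condition evaluates to NA, but `NA | TRUE` is TRUE, and `%in%` and `grepl()` return FALSE (not NA) for a missing value. It uses `grepl()` rather than `%like%` to avoid the data.table dependency.

```r
library(dplyr)

# Hypothetical data with one missing value in colName.
df <- data.frame(colName  = c("foo", "bar", "baz", NA),
                 colName2 = c("bar", "bar", "qux", "bar"))

andRows   <- filter(df, colName == "foo" & colName2 == "bar")  # 1 row
orRows    <- filter(df, colName == "foo" | colName2 == "bar")  # 3 rows: NA | TRUE is TRUE
neRows    <- filter(df, colName != "foo")                      # 2 rows: the NA row is dropped
inRows    <- filter(df, colName %in% c("foo", "bar"))          # 2 rows
notInRows <- filter(df, !colName %in% c("foo", "bar"))         # 2 rows: "baz" and the NA row
naRows    <- filter(df, is.na(colName))                        # 1 row
likeRows  <- filter(df, grepl("ba", colName))                  # 2 rows: "bar" and "baz"
```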
Count
count(dataFrame)  # number of rows, returned in a column named n
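count() also accepts grouping variables, in which case it returns one row per distinct value with its count in `n`; `sort = TRUE` puts the largest group first. A minimal sketch on mtcars:

```r
library(dplyr)

total        <- count(mtcars)                    # one row: n = 32
perCyl       <- count(mtcars, cyl)               # one row per cyl value
perCylSorted <- count(mtcars, cyl, sort = TRUE)  # largest group first
```

`count(mtcars, cyl)` gives the same counts as the group_by()/summarise(n()) pattern in the Group by section, in a single call.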
Sort (Order)
arrange(dataFrame, desc(colName))  # descending order of colName
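A minimal sketch on mtcars: a single descending key, then several keys where later columns only break ties in earlier ones.

```r
library(dplyr)

byMpgDesc    <- arrange(mtcars, desc(mpg))       # highest mpg first
byCylThenMpg <- arrange(mtcars, cyl, desc(mpg))  # cyl ascending, ties by mpg descending
```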