R - Dplyr (Data Frame Operations)

dplyr aims to provide a function for each basic verb of data manipulation:

  • filter() to select cases based on their values.
  • arrange() to reorder the cases.
  • select() and rename() to select variables based on their names.
  • mutate() and transmute() to add new variables that are functions of existing variables.
  • summarise() to condense multiple values to a single value.
  • sample_n() and sample_frac() to take random samples.
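The column-selection and sampling verbs in the list above can be sketched as follows. The data frame `df` and its column names are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
df <- data.frame(
  id    = 1:6,
  name  = c("a", "b", "c", "d", "e", "f"),
  bytes = c(10, 20, 30, 40, 50, 60)
)

# select() keeps only the named columns
picked <- select(df, id, bytes)

# rename() keeps every column and changes a name (new = old)
renamed <- rename(df, size = bytes)

# sample_n() draws a fixed number of rows, sample_frac() a fraction of them
set.seed(1)  # makes the random draw reproducible
threeRows <- sample_n(df, 3)
halfRows  <- sample_frac(df, 0.5)
```

Which rows the sampling functions return depends on the seed, so only the row counts are predictable here.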

Interface for operations on:

One line Example


timestamp            metrics           value
2017-07-03 20:08:02  Consumer Record   0.0e+00
2017-07-03 20:08:02  Consumer Metrics  4.0e+04
2017-07-03 20:08:02  Buffer Ratio      0.0e+00
2017-07-03 20:08:07  Buffer Size       3.0e+00
2017-07-03 20:08:07  Buffer MaxSize    4.0e+04
2017-07-03 20:08:07  Buffer Ratio      7.5e-03
..                   ..                ..
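# %like% comes from the data.table package, not from dplyr
loaderMetrics %>%
  filter(metrics %like% "Consumer(.*)Records") %>%
  mutate(metrics = "Consumer Total") %>%
  group_by(timestamp, metrics) %>%
  # the original pipeline stops after group_by(); summing is an assumed final step
  summarise(value = sum(value))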
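To try the pipeline without the full data set, here is a minimal sketch on the first three sample rows. It makes three assumptions that are not in the original: the base R function grepl() replaces %like% so that only dplyr is needed, the pattern "^Consumer" is chosen so that it matches the sample rows, and the values are summed per group.

```r
library(dplyr)

# First three rows of the sample data; the later rows are left out
loaderMetrics <- data.frame(
  timestamp = c("2017-07-03 20:08:02", "2017-07-03 20:08:02", "2017-07-03 20:08:02"),
  metrics   = c("Consumer Record", "Consumer Metrics", "Buffer Ratio"),
  value     = c(0.0e+00, 4.0e+04, 0.0e+00),
  stringsAsFactors = FALSE
)

consumerTotal <- loaderMetrics %>%
  # grepl() is base R; "^Consumer" is an assumed pattern for this sketch
  filter(grepl("^Consumer", metrics)) %>%
  # relabel every kept row so that the rows fall into one group per timestamp
  mutate(metrics = "Consumer Total") %>%
  group_by(timestamp, metrics) %>%
  # assumed aggregation: add up the values in each group
  summarise(value = sum(value), .groups = "drop")
```

With these three rows the result is a single row labelled "Consumer Total" for the timestamp 2017-07-03 20:08:02.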


Add new Variable

mutate() adds new variables and preserves the existing ones.

mutate(dataFrame, NewColumnName = ColumnName/1024/1024)

To update an existing column conditionally, assign the result to the column's own name:

mutate(dataFrame, value = ifelse(condition, updateValue, value))
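Both uses of mutate() can be sketched on a small data frame. The `files` data frame and its columns are hypothetical, and replacing missing sizes with 0 is just one possible condition.

```r
library(dplyr)

# Hypothetical data frame: file sizes in bytes, one of them missing
files <- data.frame(
  name  = c("a.log", "b.log", "c.log"),
  bytes = c(1048576, 5242880, NA)
)

withMb <- files %>%
  # new column computed from an existing one; the original columns stay
  mutate(megabytes = bytes / 1024 / 1024) %>%
  # conditional update: replace missing sizes with 0, keep the others
  mutate(bytes = ifelse(is.na(bytes), 0, bytes))
```

The megabytes column is computed before the missing size is replaced, so it stays NA for that row.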

Drop Variable (Transmute)

transmute() adds new variables and drops the existing ones: only the columns named in the call are kept.
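The difference from mutate() is easiest to see side by side. This is a minimal sketch on a hypothetical `files` data frame.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
files <- data.frame(
  name  = c("a.log", "b.log"),
  bytes = c(1048576, 2097152)
)

# mutate() keeps name and bytes and adds megabytes
kept <- mutate(files, megabytes = bytes / 1024 / 1024)

# transmute() returns only the columns named in the call
dropped <- transmute(files, name, megabytes = bytes / 1024 / 1024)
```

Here `kept` has the columns name, bytes and megabytes, while `dropped` has only name and megabytes.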

Group by

by <- group_by(dataFrame, column1, column2)
summarise(by, count = n())
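A grouped count can be sketched as below. The `orders` data frame is hypothetical, and the extra `total` column is added only to show that several summaries can be computed in one call.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
orders <- data.frame(
  region  = c("east", "east", "west", "west", "west"),
  product = c("x", "x", "x", "y", "y"),
  amount  = c(10, 20, 5, 7, 8)
)

perGroup <- orders %>%
  group_by(region, product) %>%
  # n() counts the rows of each group; sum() adds up the amounts per group
  summarise(count = n(), total = sum(amount), .groups = "drop")
```

The result has one row per distinct combination of region and product.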


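Filter

# And
filter(dataFrame, colName == "foo" & colName2 == "bar")
# Or
filter(dataFrame, colName == "foo" | colName2 == "bar")
# Not Equal
filter(dataFrame, colName != "foo")
# In
filter(dataFrame, colName %in% c("foo", "bar"))
# Not In
filter(dataFrame, !colName %in% c("foo", "bar"))
# NA (Boolean)
filter(dataFrame, is.na(colName))
# Like with grepl
filter(dataFrame, grepl("foo", colName))
# Like with %like% from the data.table package
filter(dataFrame, colName %like% "foo")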
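The conditions above can be checked on a small data frame. The `people` data frame is hypothetical, and the sketch uses only dplyr and base R, so the data.table operator %like% is left out.

```r
library(dplyr)

# Hypothetical data frame with one missing name
people <- data.frame(
  name = c("foo", "bar", "foobar", NA),
  city = c("bar", "bar", "baz", "foo"),
  stringsAsFactors = FALSE
)

both     <- filter(people, name == "foo" & city == "bar")  # and
either   <- filter(people, name == "foo" | city == "bar")  # or
inSet    <- filter(people, name %in% c("foo", "bar"))      # in
notInSet <- filter(people, !name %in% c("foo", "bar"))     # not in
missing  <- filter(people, is.na(name))                    # NA test
matching <- filter(people, grepl("foo", name))             # regular-expression match
```

filter() drops rows where the condition evaluates to NA, so the row with the missing name is excluded from `either`. It is kept by `notInSet` because %in% returns FALSE rather than NA for a missing value.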



Sort (Order)

arrange(dataframe, desc(colName))
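Ascending, descending and multi-column sorts can be sketched as below, on a hypothetical `scores` data frame.

```r
library(dplyr)

# Hypothetical data frame used only for illustration
scores <- data.frame(
  name   = c("a", "b", "c", "d"),
  team   = c("x", "y", "x", "y"),
  points = c(3, 9, 7, 9),
  stringsAsFactors = FALSE
)

# ascending order is the default
ascending <- arrange(scores, points)

# desc() reverses the order of one column
descending <- arrange(scores, desc(points))

# several columns: sort by team first, then by points within each team
byTeam <- arrange(scores, team, desc(points))
```

Columns are applied left to right, so later columns only break ties left by the earlier ones.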
