About
The factor function encodes a vector as a factor (i.e. categorical data).
A binning function applied to a numeric or a date also returns a factor.
A factor can also turn a numeric into a category (for instance, for an id).
A factor is also known as:
Factors can be unordered or ordered.
A factor is an integer vector in which each integer code has a label (its level).
Use the str function to see this structure.
Factors are treated specially by modelling functions like lm() and glm()
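As a minimal sketch of what "treated specially" means, the example below uses made-up data (the names colour and score are hypothetical) and assumes R's default treatment contrasts. model.matrix shows the indicator columns that lm would build from the factor: one per level, except the first level, which serves as the reference.

```r
# Hypothetical data: 'colour' and 'score' are illustrative names only
colour <- factor(c("Green", "Blue", "Red", "Green"))
score  <- c(4.1, 3.2, 5.0, 3.9)

# Design matrix built from the factor (default treatment contrasts assumed):
# an intercept plus one indicator column per non-reference level
mm <- model.matrix(~ colour)
colnames(mm)
# [1] "(Intercept)" "colourGreen" "colourRed"

# lm() expands the factor the same way, so the coefficient names match
fit <- lm(score ~ colour)
names(coef(fit))
# [1] "(Intercept)" "colourGreen" "colourRed"
```

"Blue" is the reference level here only because it sorts first among the levels.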
Syntax
factor(
x = character(),
levels,
labels = levels,
exclude = NA,
ordered = is.ordered(x),
nmax = NA
)
where:
Management
Simple Initialization
A factor of colours with 4 values and 3 levels
> x=factor(c("Green","Blue","Red","Green"))
> x
[1] Green Blue Red Green
Levels: Blue Green Red
A factor is composed of labels (the levels) and an integer vector of codes (2 1 3 2), as the str function shows:
> str(x)
Factor w/ 3 levels "Blue","Green",..: 2 1 3 2
The unclass function shows the same integer codes together with the levels attribute:
> unclass(x)
[1] 2 1 3 2
attr(,"levels")
[1] "Blue" "Green" "Red"
Level
The same factor of colours as above, but with only two colours in the levels (the domain). Any value that is not in the levels becomes NA. To keep NA as a level, see the How to section.
> x=factor(c("Green","Blue","Red","Green"),levels=c("Green","Blue"))
> x
[1] Green Blue <NA> Green
Levels: Green Blue
You can get the levels with the levels function:
> levels(x)
[1] "Green" "Blue"
Label
A factor of colours with two levels and custom labels for those levels:
> x=factor(c("Green","Blue","Green"),levels=c("Green","Blue"),labels=c("LabelGreen","LabelBlue"))
> x
[1] LabelGreen LabelBlue LabelGreen
Levels: LabelGreen LabelBlue
Exclude
A factor of colours with a colour excluded:
> x=factor(c("Green","Blue","Green"),exclude="Green")
> x
[1] <NA> Blue <NA>
Levels: Blue
How to
Count the number of elements by level
With the table function:
> x=factor(c("Green","Blue","Red","Green"))
> table(x)
x
 Blue Green   Red
    1     2     1
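One detail worth illustrating: table counts every level, including levels that have no observation. The sketch below uses an illustrative factor with an unused level "Red" and the base function droplevels to discard it before counting.

```r
# "Red" is declared as a level but never observed
x <- factor(c("Green", "Blue", "Green"), levels = c("Blue", "Green", "Red"))

# table() reports a count for every level, even the empty one
counts <- table(x)
counts[["Red"]]
# [1] 0

# droplevels() removes the unused levels before counting
used <- table(droplevels(x))
names(used)
# [1] "Blue"  "Green"
```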
Have NA as level
If you want NA as a level (i.e. to keep missing values as their own category), set exclude to NULL:
> x = factor(c("Blue", NA), exclude = NULL)
> x
[1] Blue <NA>
Levels: Blue <NA>
Transform it back to a vector
as.character(x) # the labels
as.numeric(x) # the integer codes, not the original values
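The difference between the two conversions matters most when the labels look like numbers. The sketch below, with illustrative values, shows that as.numeric returns the integer codes, and that going through as.character (or indexing the converted levels) recovers the original numbers.

```r
# A factor whose labels are number-like strings
f <- factor(c("10", "2", "10", "30"))

levels(f)
# [1] "10" "2"  "30"     (character strings, sorted in dictionary order)

as.character(f)
# [1] "10" "2"  "10" "30"

# as.numeric() returns the integer codes, not the values
as.numeric(f)
# [1] 1 2 1 3

# Convert the labels to recover the original numbers
as.numeric(as.character(f))
# [1] 10  2 10 30

# Equivalent: convert the levels once, then index them with the codes
as.numeric(levels(f))[f]
# [1] 10  2 10 30
```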
Order
By default, the levels are sorted (alphabetically for a character vector).
- The reorder function reorders the levels of a factor
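A short sketch of the three ways to control the order, using illustrative data (size and value are hypothetical names): the default sorted order, an explicit order given with levels and ordered = TRUE, and an order derived from another variable with reorder, whose default summary function is the mean.

```r
size <- c("medium", "small", "large", "small")

# Default: levels are sorted alphabetically
f_default <- factor(size)
levels(f_default)
# [1] "large"  "medium" "small"

# Explicit order, declared as an ordered factor
f_ordered <- factor(size, levels = c("small", "medium", "large"), ordered = TRUE)
levels(f_ordered)
# [1] "small"  "medium" "large"

# Ordered factors support comparisons against a level
f_ordered < "large"
# [1]  TRUE  TRUE FALSE  TRUE

# reorder(): order the levels by the mean of another variable per level
value <- c(2, 1, 3, 1)
f_by_value <- reorder(f_default, value)
levels(f_by_value)
# [1] "small"  "medium" "large"
```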
Continuous to Factor
Date to weekday
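Example: creating a weekday factor with the days in calendar order (weekdays returns names in the current locale; English is assumed here)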
data_frame$CREATED_ON_WEEKDAY <- factor(weekdays(data_frame$CREATED_ON),levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
Number to bin
df$ageFactor <- cut(df$age, breaks=c(0, 15, 45, 56, Inf))
The levels returned by cut follow the order of the breaks. By contrast, when numbers are converted to character strings before being encoded as a factor, the levels are sorted in dictionary order (rather than by magnitude).
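To make the binning concrete, here is a minimal sketch with a made-up age vector and the same breaks as above. The labels passed to cut in the second call are hypothetical names chosen for illustration.

```r
# Illustrative ages; the breaks are the ones used above
age <- c(3, 15, 16, 45, 50, 70)

# By default each interval is open on the left and closed on the right
age_bin <- cut(age, breaks = c(0, 15, 45, 56, Inf))
levels(age_bin)
# [1] "(0,15]"   "(15,45]"  "(45,56]"  "(56,Inf]"

as.character(age_bin)
# [1] "(0,15]"   "(0,15]"   "(15,45]"  "(15,45]"  "(45,56]"  "(56,Inf]"

# Hypothetical labels, one per interval, in the order of the breaks
age_lab <- cut(age, breaks = c(0, 15, 45, 56, Inf),
               labels = c("child", "adult", "middle", "senior"))
table(age_lab)
```

Note that a value equal to the lowest break (here 0) falls outside the first interval and becomes NA unless include.lowest = TRUE is passed.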
Documentation / Reference
?factor