About
A data frame is R's counterpart of a table in a relational database.
Like any R object, it has a class (data.frame) and attributes.
It is a list of variables (columns) that all have the same number of rows, with unique row names.
It is a matrix-like structure whose columns may be of differing classes (numeric, logical, factor, character and so on).
It is used as the fundamental data structure by most of R's modeling software.
A matrix implementation (an array of two dimensions) also exists.
Data frames share many of the properties of matrices and lists.
They can be seen as a list where every element has the same length.
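The list-like and matrix-like sides described above can be checked in a short session. This is a minimal sketch; the data frame example_df and its column names are made up for illustration.

```r
# A small data frame with columns of different types (hypothetical data)
example_df <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  ok = c(TRUE, FALSE, TRUE)
)

# List-like behaviour: a data frame is a list of equal-length columns
is.list(example_df)         # TRUE
length(example_df)          # 3, the number of columns
sapply(example_df, length)  # length of each column

# Matrix-like behaviour: two dimensions and [row, column] indexing
dim(example_df)             # 3 3
example_df[2, "id"]         # 2
```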
Creation
Constructor
data.frame(...,
row.names = NULL,
check.rows = FALSE,
check.names = TRUE,
stringsAsFactors = default.stringsAsFactors()
)
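where:
- ... are the columns, given as value or as name = value
- row.names is NULL, a single column (by name or number) or a character or integer vector that gives the row names
- check.rows: if TRUE, the rows are checked for consistency of length and names
- check.names: if TRUE, the column names are made syntactically valid and unique
- stringsAsFactors: whether character vectors are converted to factors. Since R 4.0.0 the default is FALSE.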
Persistence
Import (By reading a file)
- file with a table format (read.table)
read.table()
- file with a table format and a comma separator (CSV)
read.csv()
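As a rough illustration of the two readers, the sketch below writes a small file first so that it is self-contained. The data frame sample_df, its columns and the temporary file are assumptions made for the example.

```r
# Hypothetical sample data, written to a temporary file
sample_df <- data.frame(id = 1:3, city = c("Paris", "Oslo", "Rome"))
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_df, csv_path, row.names = FALSE)

# read.csv() is read.table() with header = TRUE and sep = "," among its defaults
from_csv <- read.csv(csv_path)
from_table <- read.table(csv_path, header = TRUE, sep = ",")

str(from_csv)     # structure of the imported data frame
unlink(csv_path)  # remove the temporary file
```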
A file can also be imported from the RStudio GUI.
Export
Copy to the clipboard as tab-separated text, which can be pasted into Excel (writing to "clipboard" works on Windows):
writeToClipboard <- function(x,row.names=FALSE,col.names=TRUE,...) {
write.table(x,"clipboard",sep="\t",row.names=row.names,col.names=col.names,...)
}
writeToClipboard(data_frame)
Construction Example
Simple
colA=c(8,3,6,5,5)
colB=c("Nico","Klaas","Santa","Klaus","Piet")
colC=1:5
df = data.frame(colA,colB,colC)
df
colA colB colC
1 8 Nico 1
2 3 Klaas 2
3 6 Santa 3
4 5 Klaus 4
5 5 Piet 5
row.names
By default, if the arguments are all named and simple objects (not lists, matrices or data frames), then the argument names give the column names. The row names are set with the row.names argument.
- the row names are defined by the column B.
data.frame(colA,colB,colC,row.names=colB)
colA colB colC
Nico 8 Nico 1
Klaas 3 Klaas 2
Santa 6 Santa 3
Klaus 5 Klaus 4
Piet 5 Piet 5
- the row names are defined by letters
data.frame(colA,colB,colC,row.names=letters[1:5])
colA colB colC
a 8 Nico 1
b 3 Klaas 2
c 6 Santa 3
d 5 Klaus 4
e 5 Piet 5
check.rows
check.rows checks that the row names are consistent when two matrix-like structures are given as arguments.
> df1 = data.frame(A=1:2,B=2:1, row.names=letters[1:2])
> df1
A B
a 1 2
b 2 1
> df2 = df1[2:1,]
> df2
A B
b 2 1
a 1 2
> data.frame(df1,df2,check.rows=TRUE)
Error in data.row.names(row.names, rowsi, i) :
mismatch of row names in arguments of 'data.frame', item 2
The check fails because the row names of df1 (a, b) are not in the same order as those of df2 (b, a).
check.names
Duplicate column names are allowed, but only with check.names = FALSE. With the default check.names = TRUE, duplicate names are made unique.
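A small sketch of both settings follows. The column name a is arbitrary and the variable names checked and unchecked are made up for the example.

```r
# With the default check.names = TRUE, duplicate names are made unique
checked <- data.frame(a = 1:2, a = 3:4)
names(checked)    # "a" "a.1"

# With check.names = FALSE, the duplicate names are kept as given
unchecked <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
names(unchecked)  # "a" "a"
```

With duplicate names, unchecked$a and unchecked[["a"]] return only the first matching column, which is one reason to keep the default.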
Transformation
Selection, Modification
See the subset operators (extract or replace parts of an object).
Example:
- Select all records with a success_flg equal to 3
res[res$SUCCESS_FLG==3,]
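The selection above can be tried on a made-up result set. In this sketch only the column name SUCCESS_FLG comes from the text; the data and the ID column are hypothetical. It also shows how a missing value changes the result of plain logical indexing.

```r
# Hypothetical result set with one missing flag
res <- data.frame(ID = 1:5, SUCCESS_FLG = c(3, 1, 3, NA, 2))

# Logical indexing: a row whose condition is NA comes back as an all-NA row
with_na <- res[res$SUCCESS_FLG == 3, ]

# which() drops the NA comparisons, so only true matches are returned
matches <- res[which(res$SUCCESS_FLG == 3), ]

# subset() also treats NA as "no match"
matches_subset <- subset(res, SUCCESS_FLG == 3)

nrow(with_na)         # 3
nrow(matches)         # 2
nrow(matches_subset)  # 2
```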
Adding a column
data_frame$newColName <- a.vector
data_frame[, "newColName"] <- a.vector
data_frame["newColName"] <- a.vector
Join
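One way to join two data frames in base R is merge(). The sketch below assumes two hypothetical tables, customers and orders, that share a key column named id.

```r
# Hypothetical tables sharing the key column "id"
customers <- data.frame(id = 1:3, name = c("Nico", "Klaas", "Piet"))
orders <- data.frame(id = c(1, 1, 3, 4), amount = c(10, 20, 5, 7))

# Inner join: only the ids present in both tables
inner_rows <- merge(customers, orders, by = "id")

# Left join: keep every customer, with NA where there is no order
left_rows <- merge(customers, orders, by = "id", all.x = TRUE)

# Full join: keep every row of both tables
full_rows <- merge(customers, orders, by = "id", all = TRUE)

nrow(inner_rows)  # 3
nrow(left_rows)   # 4
nrow(full_rows)   # 5
```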
Apply a function
- lapply: apply a function over a list or vector
- by: apply a function to a data frame split by factors
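The two functions listed above can be sketched as follows. The data frame scores and its columns are hypothetical.

```r
# Hypothetical data: a grouping factor and two numeric columns
scores <- data.frame(
  team = factor(c("a", "a", "b", "b")),
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40)
)

# lapply() iterates over the columns, because a data frame is a list of columns
col_means <- lapply(scores[c("x", "y")], mean)
col_means$x  # 2.5

# by() splits the rows by the factor and applies the function to each sub data frame
team_means <- by(scores[c("x", "y")], scores$team, colMeans)
team_means
```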
Sort
See dplyr arrange
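As a base R alternative that needs no extra package, rows can be reordered with order(). This is a minimal sketch on a hypothetical data frame named people.

```r
# Hypothetical data
people <- data.frame(name = c("Nico", "Klaas", "Piet"), age = c(8, 3, 5))

# order() returns the row positions that sort the column; use them as a row index
by_age <- people[order(people$age), ]

# Descending order
by_age_desc <- people[order(people$age, decreasing = TRUE), ]

by_age$age       # 3 5 8
by_age_desc$age  # 8 5 3
```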
Update
How to
Get the number of rows and columns
> # Number of rows
> nrow(df)
[1] 5
> # Number of columns
> ncol(df)
[1] 3
Check the attributes
> attributes(df)
$names
[1] "colA" "colB" "colC"
$row.names
[1] 1 2 3 4 5
$class
[1] "data.frame"
Get the value of a cell
- With indexing:
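> df[2,1]
[1] 3
> df[1,2]
[1] Nico
Levels: Klaas Klaus Nico Piet Santa
The Levels line is printed because colB was converted to a factor, which was the default before R 4.0.0.
- With row and column name (df2 is here the data frame built with row.names=letters[1:5]):
> df2 = data.frame(colA,colB,colC,row.names=letters[1:5])
> df2["d","colA"]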
[1] 5
Convert it to a matrix
data.matrix()
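The sketch below compares data.matrix() with as.matrix() on a data frame that holds a factor. The data frame mixed and its columns are made up for illustration.

```r
# Hypothetical data frame with a numeric column and a factor column
mixed <- data.frame(n = c(1.5, 2.5), f = factor(c("low", "high")))

# data.matrix() always returns a numeric matrix: factors become their integer codes
num_mat <- data.matrix(mixed)
num_mat[, "f"]  # 2 1, because the levels are sorted as "high", "low"

# as.matrix() returns a character matrix when a column is not numeric
chr_mat <- as.matrix(mixed)
is.character(chr_mat)  # TRUE
```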
Get the number of rows and columns
> df <- data.frame(A=1:2,B=1:2,C=letters[1:2])
> nrow(df)
[1] 2
> ncol(df)
[1] 3
See the header and the tail
The first two rows:
head(df,2)
The last two rows:
tail(df,2)
Attach the variable names
attach() puts the variables (columns) of a data frame on the search path, so that they can be accessed directly by name. detach() removes them again.
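A short sketch of the attach and detach cycle follows, on a hypothetical data frame named people. It ends with with(), which gives the same convenience for a single expression without changing the search path.

```r
# Hypothetical data
people <- data.frame(name = c("Nico", "Klaas"), age = c(8, 3))

attach(people)         # put the columns on the search path
mean_age <- mean(age)  # 'age' now resolves to people$age
detach(people)         # remove them again to avoid name clashes

# with() evaluates one expression inside the data frame
mean_age_with <- with(people, mean(age))

mean_age       # 5.5
mean_age_with  # 5.5
```

Note that an object named age in the global environment would take precedence over the attached column, which is a common reason to prefer with().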