A data frame is R's implementation of a table: logically, it is the equivalent of a table in a relational database.
A data frame is an R object (internally a list with the class "data.frame") and inherits the properties and functions of that object.
It is a list of variables (columns) that all have the same number of rows, with unique row names.
It is a matrix-like structure whose columns may be of differing classes (data types): numeric, logical, factor, character and so on.
It's used as the fundamental data structure by most of R's modeling software.
A matrix implementation (a two-dimensional array) also exists, but all elements of a matrix must have the same type.
Data frames share many of the properties of matrices and lists.
They can be seen as a list in which every element (column) has the same length.
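To see the list structure directly, here is a small sketch using the colA/colB/colC values from the example further down. stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Build a small data frame; stringsAsFactors = FALSE keeps text as character
# in every R version
df <- data.frame(colA = c(8, 3, 6, 5, 5),
                 colB = c("Nico", "Klaas", "Santa", "Klaus", "Piet"),
                 colC = 1:5,
                 stringsAsFactors = FALSE)

is.list(df)          # TRUE: a data frame is a list underneath
length(df)           # 3: one list element per column
sapply(df, length)   # every column has the same length (5)
sapply(df, class)    # columns can have different classes
```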
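data.frame(...,
row.names = NULL,
check.rows = FALSE,
check.names = TRUE,
fix.empty.names = TRUE,
stringsAsFactors = FALSE
)
where:
... : the columns, given as vectors (named or not), matrices, lists or other data frames
row.names: NULL, a single column name or number to use as the row names, or a vector of row names
check.rows: if TRUE, the rows of the arguments are checked for consistent length and names
check.names: if TRUE, column names are made syntactically valid and unique (with make.names())
fix.empty.names: if TRUE, unnamed arguments get an automatically constructed column name
stringsAsFactors: whether character vectors are converted to factors (FALSE by default since R 4.0.0; earlier versions used default.stringsAsFactors(), which is TRUE unless the option was changed)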
A data frame can also be created by importing a file, typically with:
read.table()
read.csv()
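A minimal read.csv() sketch. To keep it self-contained, it reads from an inline string through the text argument (which read.csv() passes on to read.table()) instead of from a file; csv_text and df_csv are illustrative names. For a real file, pass its path as the first argument instead.

```r
# Inline CSV content with a header row (stands in for a file on disk)
csv_text <- "colA,colB,colC
8,Nico,1
3,Klaas,2
6,Santa,3"

# header = TRUE and sep = "," are already the read.csv() defaults
df_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df_csv)
```

The same call with read.table() would need header = TRUE and sep = "," spelled out.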
Or interactively from the RStudio GUI, via Import Dataset in the Environment pane.
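Export to the clipboard as tab-separated text, which can be pasted into Excel:
# "clipboard" as a file name refers to the system clipboard on Windows;
# other platforms need a different connection
writeToClipboard <- function(x, row.names = FALSE, col.names = TRUE, ...) {
  write.table(x, "clipboard", sep = "\t", row.names = row.names, col.names = col.names, ...)
}
writeToClipboard(data_frame)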
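Because the "clipboard" connection is platform dependent, a portable way to check the tab-separated text before copying it is to write to the console (file = "") and capture the result. This sketch sets quote = FALSE, whereas writeToClipboard() keeps write.table()'s default of quoting strings; tsv_lines is an illustrative name. On macOS a commonly used alternative to "clipboard" is the connection pipe("pbcopy"), which is an assumption worth checking on your system.

```r
df <- data.frame(colA = c(8, 3), colB = c("Nico", "Klaas"),
                 stringsAsFactors = FALSE)

# file = "" sends the output to the console; capture.output() collects it
# as a character vector, one element per line
tsv_lines <- capture.output(
  write.table(df, file = "", sep = "\t", row.names = FALSE, quote = FALSE)
)
tsv_lines
```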
colA=c(8,3,6,5,5)
colB=c("Nico","Klaas","Santa","Klaus","Piet")
colC=1:5
df = data.frame(colA,colB,colC)
df
colA colB colC
1 8 Nico 1
2 3 Klaas 2
3 6 Santa 3
4 5 Klaus 4
5 5 Piet 5
By default, if the arguments are all named and simple objects (not lists, matrices or data frames), the argument names give the column names.
Use row.names to set the row names, here from the colB vector:
data.frame(colA,colB,colC,row.names=colB)
colA colB colC
Nico 8 Nico 1
Klaas 3 Klaas 2
Santa 6 Santa 3
Klaus 5 Klaus 4
Piet 5 Piet 5
data.frame(colA,colB,colC,row.names=letters[1:5])
colA colB colC
a 8 Nico 1
b 3 Klaas 2
c 6 Santa 3
d 5 Klaus 4
e 5 Piet 5
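Row names can also be assigned after the data frame is created, and rows can then be selected by name. A sketch, assuming the same columns as above (stringsAsFactors = FALSE set explicitly):

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5),
                 colB = c("Nico", "Klaas", "Santa", "Klaus", "Piet"),
                 colC = 1:5,
                 stringsAsFactors = FALSE)

# Row names can be set after creation; they must be unique
rownames(df) <- df$colB
rownames(df)

# Rows can then be selected by name
df["Santa", "colA"]   # 6
```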
With check.rows = TRUE, the rows of the arguments are checked for consistent length and names when several matrix-like structures are given as arguments.
> df1 = data.frame(A=1:2,B=2:1, row.names=letters[1:2])
> df1
A B
a 1 2
b 2 1
> df2 = df1[2:1,]
> df2
A B
b 2 1
a 1 2
> data.frame(df1,df2,check.rows=TRUE)
Error in data.row.names(row.names, rowsi, i) :
mismatch of row names in arguments of 'data.frame', item 2
The call fails because the row names of df1 (a, b) and df2 (b, a) are not in the same order.
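One way to make the call succeed is to reorder the rows of df2 to match df1 before combining. A sketch (combined is an illustrative name); note that the duplicated column names A and B are made unique because check.names defaults to TRUE:

```r
df1 <- data.frame(A = 1:2, B = 2:1, row.names = letters[1:2])
df2 <- df1[2:1, ]

# Index df2 by df1's row names so both arguments list rows as a, b;
# check.rows = TRUE then finds no mismatch
combined <- data.frame(df1, df2[rownames(df1), ], check.rows = TRUE)
combined
```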
Duplicate column names are allowed, but only with check.names = FALSE; with the default check.names = TRUE they are made unique.
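A short sketch of both settings. With duplicated names, selecting a column by name (for example with $) returns only the first matching column, which is one reason the default makes names unique; dup is an illustrative name.

```r
df1 <- data.frame(A = 1:2, B = 2:1)

# Default check.names = TRUE: duplicated names are made unique
names(data.frame(df1, df1))                        # "A" "B" "A.1" "B.1"

# check.names = FALSE keeps the duplicated names as given
names(data.frame(df1, df1, check.names = FALSE))   # "A" "B" "A" "B"

# Selecting by a duplicated name returns the first match only
dup <- data.frame(df1, data.frame(A = 3:4), check.names = FALSE)
dup$A                                              # 1 2
```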
Subsetting: see R - Subset Operators (Extract or Replace Parts of an Object).
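Example: select the rows of a data frame res whose SUCCESS_FLG column equals 3 (the empty index after the comma keeps all columns):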
res[res$SUCCESS_FLG==3,]
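The same pattern on the example data frame, since res is not defined here. A sketch showing a logical row index with all columns, with a single column, and the equivalent subset() call:

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5),
                 colB = c("Nico", "Klaas", "Santa", "Klaus", "Piet"),
                 stringsAsFactors = FALSE)

# Logical row index, all columns (empty column index after the comma)
df[df$colA == 5, ]

# Same rows, keeping only colB (returned as a plain vector)
df[df$colA == 5, "colB"]

# subset() does the same without repeating the data frame name
subset(df, colA == 5, select = colB)
```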
Add a column (the three forms below are equivalent):
data_frame$newColName <- a.vector
data_frame[, "newColName"] <- a.vector
data_frame["newColName"] <- a.vector
Sorting: see dplyr arrange().
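With dplyr, arrange(df, colA, colB) and arrange(df, desc(colA)) sort the rows. Without extra packages, base R's order() gives the same result; the sketch below is a base R substitute, not the dplyr function.

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5),
                 colB = c("Nico", "Klaas", "Santa", "Klaus", "Piet"),
                 stringsAsFactors = FALSE)

# order() returns the row positions that sort the key(s):
# ascending colA, ties broken by colB
df[order(df$colA, df$colB), ]

# Descending on a numeric column: negate it
df[order(-df$colA), ]
```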
> # Number of rows
> nrow(df)
[1] 5
> # Number of columns
> ncol(df)
[1] 3
> attributes(df)
$names
[1] "colA" "colB" "colC"
$row.names
[1] 1 2 3 4 5
$class
[1] "data.frame"
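> df[2,1]
[1] 3
> df[1,2]
[1] Nico
Levels: Klaas Klaus Nico Piet Santa
(The factor output comes from R versions before 4.0.0, where colB was converted to a factor; in R 4.0.0 and later this prints [1] "Nico".)
> df3 <- data.frame(colA,colB,colC,row.names=letters[1:5])
> df3["d","colA"]
[1] 5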
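Several indexing forms reach the same cell or column; the difference is mostly in what type comes back. A sketch on the example columns:

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5),
                 colB = c("Nico", "Klaas", "Santa", "Klaus", "Piet"),
                 stringsAsFactors = FALSE)

df[2, 1]                      # row 2, column 1 -> 3
df[2, "colA"]                 # same cell, column given by name
df$colA[2]                    # same cell via the column vector
df[["colA"]]                  # whole column as a vector
df["colA"]                    # whole column, still a one-column data frame
df[, "colA", drop = FALSE]    # same: drop = FALSE keeps the data frame
```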
data.matrix() converts a data frame to a numeric matrix.
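A sketch of the difference between data.matrix() and as.matrix() on a data frame with a character column (the column names A, B, C and the variable names m_num, m_chr are illustrative):

```r
df <- data.frame(A = 1:2, B = c(1.5, 2.5), C = c("b", "a"),
                 stringsAsFactors = FALSE)

# data.matrix(): every column becomes numeric; the character column C is
# converted to a factor and then to its integer codes ("a" = 1, "b" = 2)
m_num <- data.matrix(df)
m_num

# as.matrix(): a matrix holds a single type, so everything becomes character
m_chr <- as.matrix(df)
typeof(m_chr)
```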
> df <- data.frame(A=1:2,B=1:2,C=letters[1:2])
> nrow(df)
[1] 2
> ncol(df)
[1] 3
The first two rows:
head(df,2)
The last two rows:
tail(df,2)
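head() and tail() also accept a negative n, which drops rows from the other end instead. A sketch:

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5), colC = 1:5)

head(df, 2)    # rows 1-2
tail(df, 2)    # rows 4-5, original row names kept
head(df, -2)   # negative n: all rows except the last 2
tail(df, -2)   # all rows except the first 2
```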
attach() adds a data frame to the search path so that its variables (columns) can be accessed directly by name, e.g. colA instead of df$colA.
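A sketch of attach() next to with(), which evaluates an expression using the columns without modifying the search path. attach() works on a copy, so later changes to df are not visible through the attached names, and it is usually recommended to detach() when done.

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5), colC = 1:5)

attach(df)      # put a copy of df's columns on the search path
mean(colA)      # colA found without df$ -> 5.4
detach(df)      # remove it again to avoid masking other objects

# with() evaluates the expression using df's columns, without attach()
with(df, mean(colA))
```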