R - Data frame Object

1 - About

A data frame is the logical implementation of a table in a relational database.

A data frame inherits all the properties and functions of an object.

It is a list of variables (columns) that all have the same number of rows, with unique row names.

A data frame is a matrix-like structure whose columns may be of differing classes (data types): numeric, logical, factor, character and so on.

It is used as the fundamental data structure by most of R's modeling software.

A matrix implementation (an array of two dimensions) also exists.

Data frames share many of the properties of matrices and lists.

They can be seen as a list where every element has the same length.
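
The list-like and matrix-like nature can be checked directly. This is a minimal sketch; the column names and values are only illustrative.

```r
# A small data frame whose columns have different classes
df <- data.frame(colA = c(8, 3, 6),
                 colB = c("Nico", "Klaas", "Santa"),
                 colC = 1:3)

is.list(df)         # TRUE: a data frame is stored as a list of columns
length(df)          # 3: one list element per column
sapply(df, length)  # every column has the same length (the number of rows)
sapply(df, class)   # the class of each column
dim(df)             # matrix-like view: number of rows and columns
```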

3 - Creation

3.1 - Constructor


data.frame(..., 
     row.names = NULL, 
     check.rows = FALSE,
     check.names = TRUE,
     stringsAsFactors = default.stringsAsFactors()
     )

where:

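  • … is a list of objects that have the same number of rows.
  • row.names is NULL, a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
  • check.rows, if TRUE, checks the rows for consistency of length and names.
  • check.names, if TRUE, checks that the column names are syntactically valid and not duplicated, and adjusts them if necessary.
  • stringsAsFactors controls whether character vectors are converted to factors. Since R 4.0.0 the default is FALSE.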

4 - Persistence

4.1 - Import (By reading a file)


  • file with a table format:

read.table()

  • file with a comma separator (R - Csv):

read.csv()

A file can also be imported from the RStudio GUI.
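
As an illustration, the sketch below writes a small CSV file to a temporary location and reads it back with both functions. The file content and the variable names (path, df_csv, df_tab) are assumptions made for the example.

```r
# Write a small CSV file to a temporary location so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("colA,colB", "8,Nico", "3,Klaas"), path)

# read.csv() has header = TRUE and sep = "," among its defaults
df_csv <- read.csv(path)

# The same file read with read.table(), spelling these options out
df_tab <- read.table(path, header = TRUE, sep = ",")

str(df_csv)
```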

4.2 - Export

Export to the clipboard with tab separators, so that the content can be pasted into Excel:


writeToClipboard <- function(x,row.names=FALSE,col.names=TRUE,...) {
  write.table(x,"clipboard",sep="\t",row.names=row.names,col.names=col.names,...)
}
writeToClipboard(data_frame)
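
The "clipboard" target is, as far as I know, platform-dependent. The sketch below uses the same write.table() call with a temporary file as the target, which should work on any platform. The data frame and the file name are hypothetical.

```r
# Hypothetical data frame, only for illustration
data_frame <- data.frame(A = 1:2, B = c("x", "y"))

# Same call as above, but writing to a temporary file instead of the clipboard
out <- tempfile(fileext = ".tsv")
write.table(data_frame, out, sep = "\t", row.names = FALSE, col.names = TRUE)

readLines(out)  # inspect the written lines: one header line, then one line per row
```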

5 - Construction Example

5.1 - Simple


colA=c(8,3,6,5,5)
colB=c("Nico","Klaas","Santa","Klaus","Piet")
colC=1:5
df = data.frame(colA,colB,colC)
df


  colA  colB colC
1    8  Nico    1
2    3 Klaas    2
3    6 Santa    3
4    5 Klaus    4
5    5  Piet    5

5.2 - row.names

By default, if the arguments are all named and simple objects (not lists, matrices or data frames), the argument names give the column names. The row.names argument sets the row names.

  • The row names are taken from the column colB:

data.frame(colA,colB,colC,row.names=colB)


      colA  colB colC
Nico     8  Nico    1
Klaas    3 Klaas    2
Santa    6 Santa    3
Klaus    5 Klaus    4
Piet     5  Piet    5

  • The row names are given by letters:

data.frame(colA,colB,colC,row.names=letters[1:5])


  colA  colB colC
a    8  Nico    1
b    3 Klaas    2
c    6 Santa    3
d    5 Klaus    4
e    5  Piet    5

5.3 - check.rows

check.rows checks that the row names are consistent when two matrix-like structures are given as arguments.


> df1 = data.frame(A=1:2,B=2:1, row.names=letters[1:2])
> df1


  A B
a 1 2
b 2 1


> df2 = df1[2:1,]
> df2


  A B
b 2 1
a 1 2


> data.frame(df1,df2,check.rows=TRUE)


Error in data.row.names(row.names, rowsi, i) : 
  mismatch of row names in arguments of 'data.frame', item 2

The call fails because the row names a,b of df1 do not match the row names b,a of df2.

5.4 - check.names

Duplicate column names are allowed, but you need to set check.names = FALSE. Otherwise the names are made unique.
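
A short sketch of both behaviours. The names d1 and d2 are chosen for the example.

```r
# Default (check.names = TRUE): duplicated names are made unique
d1 <- data.frame(A = 1:2, A = 3:4)
names(d1)  # "A" "A.1"

# check.names = FALSE keeps the duplicated names as given
d2 <- data.frame(A = 1:2, A = 3:4, check.names = FALSE)
names(d2)  # "A" "A"
```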

6 - Transformation

6.1 - Selection, Modification

R - Subset Operators (Extract or Replace Parts of an Object)

Example:

  • Select all records where SUCCESS_FLG equals 3:

res[res$SUCCESS_FLG==3,]
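
The selection above can be tried on a small data frame. The content of res below is hypothetical and only stands in for the real data.

```r
# Hypothetical data frame standing in for res
res <- data.frame(ID = 1:4, SUCCESS_FLG = c(3, 1, 3, 2))

# Rows where SUCCESS_FLG equals 3, all columns (note the trailing comma)
sel <- res[res$SUCCESS_FLG == 3, ]

# Same rows, restricted to one column
ids <- res[res$SUCCESS_FLG == 3, "ID"]

# subset() expresses the same filter
sel2 <- subset(res, SUCCESS_FLG == 3)
```

The logical vector before the comma picks the rows; leaving the column position empty keeps all columns.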

6.2 - Adding a column


data_frame$newColName <- a.vector
data_frame[, "newColName"] <- a.vector
data_frame["newColName"] <- a.vector
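
The three forms are equivalent. A minimal sketch with hypothetical column names, so that each form adds its own column:

```r
data_frame <- data.frame(colA = c(8, 3, 6))
a.vector <- c("x", "y", "z")  # one value per row

data_frame$viaDollar <- a.vector             # list-style access with $
data_frame[, "viaMatrixIndex"] <- a.vector   # matrix-style [row, column] indexing
data_frame["viaListIndex"] <- a.vector       # list-style [ indexing

names(data_frame)
```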

6.3 - Join
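
One common way to join two data frames in base R is merge(). The sketch below is an illustration only; the tables persons and scores and their key column id are hypothetical.

```r
# Hypothetical tables sharing the key column "id"
persons <- data.frame(id = c(1, 2, 3), name = c("Nico", "Klaas", "Piet"))
scores  <- data.frame(id = c(1, 2, 4), score = c(8, 3, 6))

# Inner join: only the ids present in both tables
inner <- merge(persons, scores, by = "id")

# Left outer join: keep all rows of persons, score is NA where no match exists
left <- merge(persons, scores, by = "id", all.x = TRUE)
```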

6.4 - Apply a function

  • lapply: Apply a Function over a List or Vector
  • by: Apply a Function to a Data Frame Split by Factors
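
A minimal sketch of both functions on a hypothetical data frame with a grouping column:

```r
df <- data.frame(group = c("a", "a", "b"),
                 x = c(1, 2, 3),
                 y = c(10, 20, 30))

# lapply treats the data frame as a list of columns and returns a list
col_means <- lapply(df[c("x", "y")], mean)

# by splits the rows by group and applies the function to each sub data frame
sums <- by(df[c("x", "y")], df$group, function(d) sum(d$x))
```

lapply works column by column, whereas by works on groups of rows.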

6.5 - Sort
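
A data frame is commonly sorted by indexing its rows with order(). A minimal sketch, reusing values from the construction example:

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5),
                 colB = c("Nico", "Klaas", "Santa", "Klaus", "Piet"))

# order() returns the row permutation; use it as a row index
asc <- df[order(df$colA), ]

# Descending on colA, ties broken by colB ascending
desc <- df[order(-df$colA, df$colB), ]
```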

6.6 - Update
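
Values can be updated by assigning to a subset. A minimal sketch with illustrative values:

```r
df <- data.frame(colA = c(8, 3, 6),
                 colB = c("Nico", "Klaas", "Santa"))

# Update one cell by row position and column name
df[2, "colA"] <- 30

# Update every row that matches a condition
df$colA[df$colB == "Santa"] <- 60
```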

7 - How to

7.1 - Get the number of rows and columns


> # Number of rows
> nrow(df)
[1] 5
> # Number of columns
> ncol(df)
[1] 3

7.2 - Check the attributes


> attributes(df)
$names
[1] "colA" "colB" "colC"

$row.names
[1] 1 2 3 4 5

$class
[1] "data.frame"

7.3 - Get the value of a cell

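  • With indexing:

> df[2,1]
[1] 3
> df[1,2]
[1] Nico
Levels: Klaas Klaus Nico Piet Santa

The Levels line appears because colB is stored as a factor, which was the default for character columns before R 4.0.0. With a character column, the result prints as [1] "Nico".

  • With row and column name:

> df2 <- data.frame(colA,colB,colC,row.names=letters[1:5])
> df2["d","colA"]
[1] 5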

7.4 - Convert it to a matrix

data.matrix() converts all the columns of a data frame to numeric and binds them together as a numeric matrix.
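
A minimal sketch with illustrative values. The as.matrix() call is added for comparison and is not part of the original text.

```r
df <- data.frame(A = 1:2, B = c(1.5, 2.5))

m <- data.matrix(df)  # numeric matrix, the column names are kept
is.matrix(m)
dim(m)

# as.matrix() also converts; with mixed column types it yields a character matrix
m2 <- as.matrix(data.frame(A = 1:2, C = letters[1:2]))
typeof(m2)
```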

7.5 - Get the number of rows and columns


> df <- data.frame(A=1:2,B=1:2,C=letters[1:2])
> nrow(df)
[1] 2
> ncol(df)
[1] 3

7.6 - See the header and the tail

The first two rows:


head(df,2)

The last two rows:


tail(df,2)

7.7 - Attach and detach the variable names

attach() lets a user access the variables (columns) of a data frame directly by name. detach() removes them from the search path again.
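
A minimal sketch with illustrative values. The with() call at the end is an alternative added for comparison and is not part of the original text.

```r
df <- data.frame(colA = c(8, 3, 6), colC = 1:3)

attach(df)          # put the columns of df on the search path
total <- sum(colA)  # colA is found without the df$ prefix
detach(df)          # remove df from the search path again

# with() gives the same direct access without changing the search path
total_with <- with(df, sum(colA))
```

Note that a variable of the same name in the global environment takes precedence over an attached column, which is one reason with() is often preferred.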

8 - Documentation / Reference

