About

Data Mining - (Missing Value|Not Available) NA in R.

NA (Not Available|Missing Value) is a logical constant of length 1 that contains a missing value indicator.

See the help page:

> ?NA
> str(NA)
 logi NA

NA values are typed: besides the logical NA, there are the constants NA_integer_, NA_real_, NA_complex_ and NA_character_ for the other atomic vector types. A plain NA is coerced to the type of the vector that holds it.
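As a small illustration of the typed missing values, the sketch below inspects each constant with typeof. The variable names na_types and mixed are made up for this example.

```r
# Collect the type of each missing-value constant provided by base R.
na_types <- vapply(
  list(NA, NA_integer_, NA_real_, NA_character_),
  typeof,
  character(1)
)
na_types  # "logical" "integer" "double" "character"

# A plain NA takes on the type of the vector it is placed in.
mixed <- c("a", NA)
typeof(mixed)  # "character"
```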

Function

is.na

  • Is NA NA ?
> is.na(NA)
[1] TRUE
  • Where is NA ?
> v =  c(1, NA, 2)
> is.na(v)
[1] FALSE  TRUE FALSE
  • Is NaN NA ? is.na also returns TRUE for NaN (Not a Number).
> is.na(NaN)
[1] TRUE
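Because is.na reports both NA and NaN as missing, it can help to combine it with the base function is.nan when the two must be told apart. The following is a minimal sketch; the names x and na_only are illustrative.

```r
# A numeric vector holding a regular value, a missing value and a NaN.
x <- c(1, NA, NaN)

is.na(x)   # FALSE TRUE TRUE  : NA and NaN both count as missing
is.nan(x)  # FALSE FALSE TRUE : only NaN

# Keep only the "true" NA positions.
na_only <- is.na(x) & !is.nan(x)

# Comparing with == does not work: the result is itself NA,
# which is why is.na is the function to use.
comparison <- NA == NA
```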

Management

analytics

Proportion of available (non-missing) values in each column of a data frame:

vapply(myDataFrame, function(x) mean(!is.na(x)), numeric(1))
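To make the per-column measure concrete, here is a minimal sketch on a small made-up data frame. The names df, a, b and available are assumptions for this example only.

```r
# Hypothetical data: column a has 1 missing value out of 4, column b has 2.
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, "x", "y")
)

# Share of non-missing values per column, returned as a named numeric vector.
available <- vapply(df, function(x) mean(!is.na(x)), numeric(1))
available  # a = 0.75, b = 0.5
```

Passing numeric(1) as the template makes vapply fail loudly if the function ever returns anything other than a single number per column.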

remove

vector

> v =  c(1, NA, 2)
> v[!is.na(v)]
[1] 1 2

matrix

> colA = c(1, 2, 3, 4, NA, 6)
> colB = c(1, 2, NA, 4, 5, 6)
> m = cbind(colA, colB)
> m
     colA colB
[1,]    1    1
[2,]    2    2
[3,]    3   NA
[4,]    4    4
[5,]   NA    5
[6,]    6    6

> rowWithoutNa = complete.cases(m)
> rowWithoutNa
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE

> m[rowWithoutNa,]
     colA colB
[1,]    1    1
[2,]    2    2
[3,]    4    4
[4,]    6    6

eliminate the row

To remove every row of a data frame that contains at least one missing value, use the na.omit function:

myDataFrame=na.omit(myDataFrame)
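A minimal sketch of na.omit on a small data frame follows. The data and the names df, cleaned and dropped are made up for illustration.

```r
# Hypothetical data frame with one missing value in each column.
df <- data.frame(
  colA = c(1, 2, 3, 4, NA, 6),
  colB = c(1, 2, NA, 4, 5, 6)
)

# Drop every row that contains at least one NA.
cleaned <- na.omit(df)
nrow(cleaned)  # 4

# na.omit records which rows were removed; na.action retrieves them.
dropped <- as.integer(na.action(cleaned))
dropped  # 3 5
```

The original row names are kept in the result, so the removed rows remain easy to trace.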

Count

in a variable

with(myDataFrame, sum(is.na(myVariableName)))
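The same idea extends to counting missing values in every column at once, or counting the rows affected. This is a sketch on made-up data; df, na_in_colB, na_per_column and rows_with_na are hypothetical names.

```r
# Hypothetical data frame: colA has 1 NA, colB has 2.
df <- data.frame(
  colA = c(1, 2, 3, 4, NA, 6),
  colB = c(1, NA, NA, 4, 5, 6)
)

# Count in a single variable, as in the expression above.
na_in_colB <- with(df, sum(is.na(colB)))  # 2

# Count per column: is.na on a data frame returns a logical matrix.
na_per_column <- colSums(is.na(df))       # colA = 1, colB = 2

# Number of rows with at least one missing value.
rows_with_na <- sum(!complete.cases(df))  # 3
```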

Create them

myNaVector=rep(NA,10)
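Note that rep(NA, 10) creates a logical vector. When a specific type is wanted from the start, the typed constants can be repeated instead, and existing elements can be marked as missing afterwards. The sketch below uses illustrative names (logical_na, numeric_na, character_na, scores).

```r
# rep(NA, n) is logical; the typed constants give other vector types.
logical_na   <- rep(NA, 3)
numeric_na   <- rep(NA_real_, 3)
character_na <- rep(NA_character_, 3)

# Mark an element of an existing vector as missing.
scores <- c(10, 20, 30)
is.na(scores) <- 2   # same effect as scores[2] <- NA
scores               # 10 NA 30
```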