Table of Contents

R - Bootstrap

About

Statistics - Bootstrap Resampling in R.

Bootstrap lets you get a look at the sampling distribution of statistics, for which it's really hard to develop theoretical versions. Bootstrap gives us a really easy way of doing statistics when the theory is very hard.

For example, if we calculate a statistics <math>\alpha</math> through a non-linear formula, the theory in this case is very hard and it's therefore difficult to answer the following questions:

This is a case where the bootstrap really helps out in order to answer this questions.

Bootstrap is very handy way of getting very good reliable estimates of standard error for nasty statistics.

The Standard Bootstrap

The non-linear function

alpha=function(x,y){
  vx=var(x)
  vy=var(y)
  cxy=cov(x,y)
  (vy-cxy)/(vx+vy-2*cxy)
}

The <math>\alpha</math> function will return the last line that was evaluated.

Bootstrap Wrapper Function

alpha.fn=function(data, index){
  with(data[index,],alpha(X,Y))
}

where:

Bootstrap

Since a bootstrap involves random sampling and that we want reproducible results we set the random number seed.

require(boot)
set.seed(1)

#  1,000 bootstraps.
boot.out=boot(myDataFrame,alpha.fn,R=1000)
boot.out
ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = myDataFrame, statistic = alpha.fn, R = 1000)


Bootstrap Statistics :
     original        bias    std. error
t1* 0.5758321 -7.315422e-05  0.08861826

This is really a fast function.

Plot

plot(boot.out)

R Bootstrap Plot

The block bootstrap

Correlation

A block bootstrap must be applied when there is a strong correlation between consecutive rows of the data matrix.

To see this correlation, you can plot the matrix.

matplot(dataFrame,type="l")

Effect of correlation on the standard error

When there is a strong correlation and that the data points are repeating, the sample size is in reality much smaller than the number of rows.

Which means that the standard error is underestimate because standard errors is divided by the sample size.

The wrapper function

getBetaX1=function(myData){
  fit=lm(y~X1+X2,data=myData)
  fit$coefficients["X1"]
}

From a data frame, it fits a linear model and return the regression coefficient of the variable X1.

The time series Bootstrapping function

Block bootstrap with a block size of 100

require(boot)
set.seed(1)

tsboot(myDataFrame,getBetaX1,R=50,sim="fixed",l=100)
BLOCK BOOTSTRAP FOR TIME SERIES

Fixed Block Length of 100 

Call:
tsboot(tseries = Xy, statistic = getBetaX1, R = 50, l = 100, 
    sim = "fixed")


Bootstrap Statistics :
     original     bias    std. error
t1* 0.1453263 0.06305497   0.1955884