Statistics - Bootstrap Resampling in R.
Bootstrap lets you get a look at the sampling distribution of statistics, for which it's really hard to develop theoretical versions. Bootstrap gives us a really easy way of doing statistics when the theory is very hard.
For example, if we calculate a statistics <math>\alpha</math> through a non-linear formula, the theory in this case is very hard and it's therefore difficult to answer the following questions:
This is a case where the bootstrap really helps out in order to answer this questions.
Bootstrap is very handy way of getting very good reliable estimates of standard error for nasty statistics.
alpha=function(x,y){
vx=var(x)
vy=var(y)
cxy=cov(x,y)
(vy-cxy)/(vx+vy-2*cxy)
}
The <math>\alpha</math> function will return the last line that was evaluated.
alpha.fn=function(data, index){
with(data[index,],alpha(X,Y))
}
where:
Since a bootstrap involves random sampling and that we want reproducible results we set the random number seed.
require(boot)
set.seed(1)
# 1,000 bootstraps.
boot.out=boot(myDataFrame,alpha.fn,R=1000)
boot.out
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = myDataFrame, statistic = alpha.fn, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 0.5758321 -7.315422e-05 0.08861826
This is really a fast function.
plot(boot.out)
A block bootstrap must be applied when there is a strong correlation between consecutive rows of the data matrix.
To see this correlation, you can plot the matrix.
matplot(dataFrame,type="l")
When there is a strong correlation and that the data points are repeating, the sample size is in reality much smaller than the number of rows.
Which means that the standard error is underestimate because standard errors is divided by the sample size.
getBetaX1=function(myData){
fit=lm(y~X1+X2,data=myData)
fit$coefficients["X1"]
}
From a data frame, it fits a linear model and return the regression coefficient of the variable X1.
Block bootstrap with a block size of 100
require(boot)
set.seed(1)
tsboot(myDataFrame,getBetaX1,R=50,sim="fixed",l=100)
BLOCK BOOTSTRAP FOR TIME SERIES
Fixed Block Length of 100
Call:
tsboot(tseries = Xy, statistic = getBetaX1, R = 50, l = 100,
sim = "fixed")
Bootstrap Statistics :
original bias std. error
t1* 0.1453263 0.06305497 0.1955884