Statistics - Bias (Sampling error)



Bias in stats

Bias is a systematic error in sampling or measurement: an error that pushes results in the same direction every time instead of scattering them at random.

It affects the entire distribution, shifting it to the right or to the left.
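The shift can be illustrated with a small simulation. This is only a sketch under assumed values: the names `TRUE_VALUE`, `OFFSET` and `NOISE_SD` are hypothetical and chosen for illustration. The same random noise is applied with and without a constant offset, so the only difference between the two sets of readings is the systematic error.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

TRUE_VALUE = 100.0  # hypothetical true quantity being measured
OFFSET = 3.0        # hypothetical systematic error (the bias)
NOISE_SD = 2.0      # hypothetical standard deviation of the random error

# Random error alone scatters readings around the true value.
noise = [rng.gauss(0.0, NOISE_SD) for _ in range(10_000)]
unbiased = [TRUE_VALUE + e for e in noise]

# A systematic error adds the same offset to every single reading.
biased = [TRUE_VALUE + OFFSET + e for e in noise]

# The whole distribution moves by the offset; its spread is unchanged.
shift = statistics.fmean(biased) - statistics.fmean(unbiased)
spread_unbiased = statistics.stdev(unbiased)
spread_biased = statistics.stdev(biased)

print(f"shift of the mean: {shift:.3f}")
print(f"spread without bias: {spread_unbiased:.3f}")
print(f"spread with bias:    {spread_biased:.3f}")
```

Taking more readings would shrink the effect of the random noise on the mean, but it would not remove the offset.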

The bias of a model is how far off, on average, its predictions are from the truth.

The more complex (flexible) the model, the lower its bias tends to be, typically at the cost of higher variance.
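A rough way to see this is to fit polynomials of increasing degree to many noisy training sets and compare the average prediction with the truth. The sketch below makes several assumptions that are not in the original text: the true function is taken to be sin(pi * x), the noise is Gaussian, and the helper names (`fit_polynomial`, `predict`, `estimate_bias`) are hypothetical. The fit uses plain least squares solved through the normal equations, written in pure Python to stay self-contained.

```python
import math
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x^i.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def true_function(x):
    # Assumed "truth" for this illustration only.
    return math.sin(math.pi * x)

def estimate_bias(degree, n_trials=200, noise_sd=0.3, seed=0):
    """Mean absolute gap between the average fitted curve and the truth."""
    rng = random.Random(seed)
    xs = [i / 10 - 1.0 for i in range(21)]  # 21 points on [-1, 1]
    mean_pred = [0.0] * len(xs)
    for _ in range(n_trials):
        ys = [true_function(x) + rng.gauss(0.0, noise_sd) for x in xs]
        coeffs = fit_polynomial(xs, ys, degree)
        for k, x in enumerate(xs):
            mean_pred[k] += predict(coeffs, x) / n_trials
    gaps = [abs(m - true_function(x)) for m, x in zip(mean_pred, xs)]
    return sum(gaps) / len(gaps)

bias_by_degree = {d: estimate_bias(d) for d in (1, 3, 5)}
for d, b in bias_by_degree.items():
    print(f"degree {d}: mean absolute bias ~ {b:.3f}")
```

Under these assumptions the estimated bias shrinks as the degree grows, because a more flexible polynomial can follow the curve more closely on average. The sketch measures bias only; it says nothing about how much the individual fits vary from one training set to the next.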

See also: Statistics - Bias-variance trade-off (between overfitting and underfitting)



  • Measurement bias: an instrument whose readings are always off by the same amount.
  • Sampling bias: if we draw our sample in an area where a large number of people tend to congregate (such as a city), the resulting population estimate may be too high.
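The sampling case can be sketched with a toy region. Everything here is an assumption made for illustration: the region is split into 1,000 equal-sized areas, 50 of them urban and densely populated, and the population is estimated by averaging the sampled areas and extrapolating to the whole region. The helper `mean_estimate` is hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical region: 50 dense urban areas and 950 sparse rural areas.
urban = [rng.randint(800, 1200) for _ in range(50)]
rural = [rng.randint(0, 40) for _ in range(950)]
region = urban + rural
true_population = sum(region)

def mean_estimate(pool, sample_size, n_areas, n_repeats=500):
    """Average the extrapolated population estimate over repeated samples."""
    estimates = [
        statistics.fmean(rng.sample(pool, sample_size)) * n_areas
        for _ in range(n_repeats)
    ]
    return statistics.fmean(estimates)

# Sampling areas at random from the whole region.
random_estimate = mean_estimate(region, 100, len(region))

# Sampling only where people congregate (the city).
city_estimate = mean_estimate(urban, 30, len(region))

print(f"true population:        {true_population}")
print(f"random-sample estimate: {random_estimate:.0f}")
print(f"city-only estimate:     {city_estimate:.0f}")
```

In this toy setup the random-sample estimate lands close to the true population on average, while the city-only estimate is far too high however many times the sampling is repeated. That persistence across repeats is what makes the error a bias and not noise.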
