Information from all past experience can be divided into two groups:
- information that is relevant for the future (“signal”)
- information that is irrelevant (“noise”).
In many cases the factors causing the unwanted variation are unknown and must be inferred from the data.
Noise can be seen as the result of:
- bad data quality.
- rapid phenomena.
Noise is usually estimated from the prediction error, that is, the difference between the observed values and the values a model predicts.
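As a minimal sketch of this idea, the snippet below fits a straight line to synthetic data and treats the residuals (observed minus predicted) as an estimate of the noise. The data, the linear model, and the helper names `fit_line` and `prediction_errors` are illustrative assumptions, not part of any particular library.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def prediction_errors(xs, ys, slope, intercept):
    """Residuals: observed value minus the value predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Synthetic data (assumption): a linear signal plus Gaussian noise.
rng = random.Random(0)
true_noise_sd = 2.0
xs = [float(i) for i in range(200)]
ys = [3.0 * x + 5.0 + rng.gauss(0.0, true_noise_sd) for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = prediction_errors(xs, ys, slope, intercept)

# The spread of the residuals approximates the spread of the noise.
noise_sd_estimate = statistics.stdev(residuals)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"estimated noise sd={noise_sd_estimate:.3f}")
```

The estimate is only as good as the model: if the signal is not actually linear, part of the signal ends up in the residuals and is mistaken for noise.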
All collected data contains some random noise introduced by the data collection process.
Example: GPS readings 'jump around', though they typically remain within a few meters of the real position.
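The GPS example can be simulated with a short sketch. It assumes a stationary receiver, a local coordinate frame in meters, and independent Gaussian noise with a 2 m standard deviation on each axis; these numbers are made up for illustration and are not measured values.

```python
import math
import random

rng = random.Random(42)

# Assumptions: stationary receiver at the origin of a local (east, north)
# frame in meters, with independent Gaussian noise on each axis.
true_position = (0.0, 0.0)
noise_sd_m = 2.0

readings = [
    (
        true_position[0] + rng.gauss(0.0, noise_sd_m),
        true_position[1] + rng.gauss(0.0, noise_sd_m),
    )
    for _ in range(500)
]


def distance(a, b):
    """Euclidean distance between two 2D points, in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Each individual reading is off by a few meters.
single_errors = [distance(r, true_position) for r in readings]
mean_single_error = sum(single_errors) / len(single_errors)

# Averaging many readings cancels much of the random noise.
averaged = (
    sum(r[0] for r in readings) / len(readings),
    sum(r[1] for r in readings) / len(readings),
)
averaged_error = distance(averaged, true_position)

print(f"mean error of a single reading: {mean_single_error:.2f} m")
print(f"error of the averaged position: {averaged_error:.2f} m")
```

Averaging helps here only because the simulated noise is random and centered on the true position; a systematic bias in the readings would not average out.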
Documentation / Reference
- Removing Unwanted Variation from High Dimensional Data with Negative Controls. Negative controls may be used to identify the unwanted variation and separate it from the wanted variation.