Statistics - (Data Set|Sample)


Because of the difficulties of obtaining information about all units in a population, it is common to use a small, random and representative subset of the population called a sample.

A sample is a smaller, random and representative (group|subset|data set) of the population.

Whenever a sample is used instead of the entire population, we have to accept that our results are only estimates and may therefore differ from the true population values. The difference between a sample estimate and the true population value is called sampling error.

Because a random sample contains only part of the population, no single sample will reproduce the population exactly.

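A larger sample should not systematically change the estimated mean, but it reduces the standard error of the mean, that is, the variability of the sample mean from one sample to the next. The sample standard deviation itself does not shrink with sample size; it only becomes a more stable estimate of the population standard deviation.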
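
The effect of sample size on the sample mean can be checked with a small simulation. The sketch below is illustrative only: the population (100,000 normally distributed values with mean 50 and standard deviation 10), the sample sizes and the helper name `sample_mean_spread` are all assumptions made for this example, not part of the original article. It draws many random samples of a given size and reports the average of the sample means together with how much those means vary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population, assumed for illustration: mean 50, standard deviation 10.
population = [random.gauss(50, 10) for _ in range(100_000)]


def sample_mean_spread(n, repeats=500):
    """Draw `repeats` random samples of size `n` (without replacement).

    Returns (average of the sample means, standard deviation of the sample means).
    The second value is an empirical estimate of the standard error of the mean.
    """
    means = [statistics.fmean(random.sample(population, n)) for _ in range(repeats)]
    return statistics.fmean(means), statistics.stdev(means)


small = sample_mean_spread(25)    # small samples
large = sample_mean_spread(400)   # larger samples

print("n=25 : average mean %.2f, spread of means %.2f" % small)
print("n=400: average mean %.2f, spread of means %.2f" % large)
```

Under these assumptions both sample sizes give an average mean close to 50, while the spread of the sample means is clearly smaller for the larger samples, in line with the usual rule that the standard error is roughly the population standard deviation divided by the square root of the sample size.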


Importance of Sampling

While data mining can be used to uncover patterns in data samples, it is important to be aware that:

  • the use of non-representative samples of data may produce results that are not indicative of the domain.
  • data mining will not find patterns that may be present in the domain if those patterns are not present in the sample being “mined”. Data mining only works with indicative and representative data.
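
The two points above can be illustrated with a minimal sketch. Everything in it is hypothetical: the domain is assumed to consist of two equally sized groups (`group_a` and `group_b`) with different typical values, and the "non-representative" sample is assumed to be drawn from the first group only. It compares how far each sample's mean lands from the true mean of the whole domain.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical domain: two equally sized groups with different typical values.
group_a = [random.gauss(20, 5) for _ in range(5_000)]
group_b = [random.gauss(60, 5) for _ in range(5_000)]
population = group_a + group_b

true_mean = statistics.fmean(population)

# Representative sample: drawn at random from the whole population.
random_sample = random.sample(population, 200)

# Non-representative sample: drawn from group A only, so group B is never seen.
biased_sample = random.sample(group_a, 200)

random_error = abs(statistics.fmean(random_sample) - true_mean)
biased_error = abs(statistics.fmean(biased_sample) - true_mean)

print("error of random sample mean: %.2f" % random_error)
print("error of biased sample mean: %.2f" % biased_error)
```

In this setup the random sample is off only by ordinary sampling error, whereas the biased sample misses group B entirely, so no analysis of it could recover the two-group structure of the domain.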

See Statistics - Sampling
