A (statistician|data miner) studying a population would be interested in collecting information about different characteristics of the subjects in a sample (like their length, weight, or age). Those characteristics are called variables.
In a database, they are just columns.
They can take on multiple values. In contrast, a constant has only one value.
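As a minimal sketch of the difference between a variable and a constant, the snippet below treats each key of a small sample as a column and checks how many distinct values it takes. The sample data and the helper names `distinct_values` and `is_constant` are hypothetical, chosen only for illustration.

```python
# Hypothetical sample: each dict is one subject, each key is a variable (column).
sample = [
    {"species": "tortoise", "length_cm": 41.0, "age": 35},
    {"species": "tortoise", "length_cm": 55.5, "age": 60},
    {"species": "tortoise", "length_cm": 47.2, "age": 35},
]


def distinct_values(rows, column):
    """Return the set of distinct values observed for one column."""
    return {row[column] for row in rows}


def is_constant(rows, column):
    """A column behaves as a constant in this sample if it has one distinct value."""
    return len(distinct_values(rows, column)) == 1


for column in sample[0]:
    kind = "constant" if is_constant(sample, column) else "variable"
    print(column, kind)
```

In this sample, `species` takes a single value, so it carries no information that distinguishes one subject from another, whereas `length_cm` and `age` vary.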
A variable has:
In mathematics, variables are listed among the arguments that the function takes.
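The same idea carries over to code: a function's variables appear as its parameters, while constants are fixed inside it. The function name `circle_area` is a hypothetical example, not something taken from this page.

```python
import math


def circle_area(radius):
    """radius is the variable (an argument); math.pi is a constant."""
    return math.pi * radius ** 2


# The variable can take many values, the constant stays the same.
for r in (1.0, 2.0, 3.0):
    print(r, circle_area(r))
```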
Variables (of an instance) are of two types, categorical and numerical, and have 4 levels of measurement.
When a characteristic can be neatly placed into well-defined groups, or categories that do not depend on order, it is called a categorical variable (some statisticians use the word qualitative).
When we are interested in a count or a measurement, such as the total number of each species of tortoise or how many individuals there are per square kilometre, the variable is called numerical (or quantitative).
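The distinction can be sketched in code. The rule below is a deliberate simplification: it calls a variable numerical when every value is a number and categorical otherwise. It would misclassify categories stored as numeric codes (postal codes, for example). The data and the function name `variable_type` are hypothetical.

```python
from collections import Counter
from numbers import Number


def variable_type(values):
    """Guess 'numerical' if every value is a number (booleans excluded),
    otherwise 'categorical'. A rough heuristic, not a general rule."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Hypothetical observations
species = ["Galapagos", "Aldabra", "Galapagos"]   # groups without order
density_per_km2 = [12.5, 3.0, 8.25]               # measured quantity

print(variable_type(species))           # categorical
print(variable_type(density_per_km2))   # numerical

# Counting the individuals in each category yields a numerical value per group.
species_counts = Counter(species)
print(species_counts["Galapagos"])      # 2
```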
Type of attribute | Type of model | Description |
---|---|---|
independent variable (predictor|feature) | supervised | Predictors that affect a given outcome |
dependent variable (outcome,…) | supervised | Outcome that is affected by predictors |
descriptors | (unsupervised|descriptive) | Items of information being analysed for natural groupings or associations. |
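For the supervised case, the roles in the table above can be sketched as a split of each record into predictors and an outcome. The records, the column names, and the helper `split_predictors_outcome` are hypothetical. Which column plays the outcome role depends on the question being asked, not on the data itself.

```python
# Hypothetical records: weight_kg is chosen as the outcome to predict.
records = [
    {"length_cm": 41.0, "age": 35, "weight_kg": 80.0},
    {"length_cm": 55.5, "age": 60, "weight_kg": 140.0},
    {"length_cm": 47.2, "age": 35, "weight_kg": 105.0},
]


def split_predictors_outcome(rows, outcome):
    """Separate each row into its predictors (all other columns) and its outcome."""
    predictors = [{k: v for k, v in row.items() if k != outcome} for row in rows]
    outcomes = [row[outcome] for row in rows]
    return predictors, outcomes


X, y = split_predictors_outcome(records, outcome="weight_kg")
print(X[0])  # predictors of the first record
print(y)     # outcome values
```

In an unsupervised (descriptive) setting there is no such split: all columns are used as descriptors.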
Dependent variable | Independent variable |
---|---|
Dependent | Independent |
Outcome | Predictor |
… |
See Data Mining - (Missing Value|Not Available) NA
A Case Id uniquely identifies each record, which helps with model repeatability.
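A minimal sketch of how a Case Id might be checked and used, assuming a column named `case_id` (a hypothetical name): the check reports ids that occur more than once, and sorting by the id gives the records a stable order from one run to the next.

```python
from collections import Counter


def duplicate_case_ids(rows, id_column="case_id"):
    """Return the sorted list of case ids that appear more than once."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(cid for cid, n in counts.items() if n > 1)


# Hypothetical records, deliberately out of order
cases = [
    {"case_id": 3, "age": 35},
    {"case_id": 1, "age": 60},
    {"case_id": 2, "age": 35},
]

print(duplicate_case_ids(cases))  # [] when every record is uniquely identified

# A stable ordering by case id, independent of how the rows were loaded
ordered = sorted(cases, key=lambda row: row["case_id"])
print([row["case_id"] for row in ordered])  # [1, 2, 3]
```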