About
A page about how a problem is defined in data mining.
Characteristics
- Type of target: nominal or quantitative
- Number of parameters: the dimensionality (number of variables or parameters), P
- Type of predictors (features): nominal or numeric.
- Mixed feature vectors (qualitative and quantitative)
- Model Complexity (Linear, Quadratic, …)
- Size of the data set: large or not (big N)
- Whether N (the number of observations) is larger or smaller than P
- Whether the target is consistent or random
- Environment: noisy or not.
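Several of these characteristics can be read directly off a data set. The following is a minimal sketch, assuming the data is a list of dictionaries with one key per variable; the function name `describe_problem`, the column names, and the sample rows are all hypothetical and only serve as illustration. It does not try to assess noise, target consistency, or model complexity, which need more than a simple inspection.

```python
from numbers import Number


def describe_problem(rows, target):
    """Summarize a tabular data set given as a list of dicts.

    `rows` and `target` are illustrative names, not part of any library.
    """
    n = len(rows)                                   # N: number of observations
    features = [k for k in rows[0] if k != target]  # predictor names
    p = len(features)                               # P: number of predictors

    def kind(column):
        values = [row[column] for row in rows]
        # bool is a subclass of int, so it is treated as nominal explicitly
        numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        return "numeric" if numeric else "nominal"

    feature_kinds = {f: kind(f) for f in features}
    return {
        "N": n,
        "P": p,
        "N_greater_than_P": n > p,
        "target_type": kind(target),
        "feature_types": feature_kinds,
        "mixed_features": len(set(feature_kinds.values())) > 1,
    }


# Made-up sample data, for illustration only
rows = [
    {"age": 34, "city": "Paris", "income": 52000.0, "churn": "no"},
    {"age": 51, "city": "Lyon", "income": 61000.0, "churn": "yes"},
    {"age": 27, "city": "Paris", "income": 38000.0, "churn": "no"},
    {"age": 45, "city": "Nice", "income": 47000.0, "churn": "yes"},
]
summary = describe_problem(rows, target="churn")
print(summary)
```

Here the target is nominal and the feature vector is mixed (two numeric columns, one nominal column), with N = 4 and P = 3.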
Type of problem
- classification, which attempts to assign each input value to one of a given set of classes (for example, determining whether a given email is “spam” or “non-spam”);
- regression, which assigns a real-valued output to each input;
- sequence labeling, which assigns a class to each member of a sequence of values (for example, part-of-speech tagging, which assigns a part of speech to each word in an input sentence);
- parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.
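The four problem types differ mainly in the shape of their output. The sketch below makes that concrete with hand-written toy rules standing in for trained models; every function, the keyword rule, the price formula, and the tag dictionary are assumptions made up for illustration, not real learning algorithms.

```python
def classify(email: str) -> str:
    """Classification: one input, one class from a fixed set."""
    return "spam" if "free money" in email.lower() else "non-spam"


def regress(surface_m2: float) -> float:
    """Regression: one input, one real-valued output."""
    return 1500.0 * surface_m2 + 20000.0  # arbitrary toy coefficients


# Toy lookup table; unknown words get the placeholder tag "X"
TOY_TAGS = {"the": "DET", "cat": "NOUN", "sleeps": "VERB"}


def label_sequence(words: list) -> list:
    """Sequence labeling: one class per member of the input sequence."""
    return [TOY_TAGS.get(w.lower(), "X") for w in words]


def parse(words: list) -> tuple:
    """Parsing: one tree for the whole sentence (nested tuples here)."""
    det, noun, verb = words  # only handles a three-word toy sentence
    return ("S", ("NP", ("DET", det), ("NOUN", noun)), ("VP", ("VERB", verb)))


sentence = ["The", "cat", "sleeps"]
label = classify("Claim your FREE MONEY now")   # a single class
price = regress(50.0)                           # a single real number
tags = label_sequence(sentence)                 # one class per word
tree = parse(sentence)                          # one tree per sentence
print(label, price, tags, tree)
```

Note that sequence labeling returns an output of the same length as its input, while parsing returns a single nested structure covering the whole sentence.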