Text Mining - (Corpus|Corpora) - Structured Set of Text Documents



In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts.

See also: What is a bag of words model? (also known as a bag of tokens in NLP)
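
As a rough illustration of how a corpus relates to the bag-of-words model, the sketch below stores a tiny corpus as a mapping from document id to text and reduces each document to word counts. The three documents, the `bag_of_words` helper, and the whitespace tokenization are assumptions made for this example only; real corpora are far larger and usually need a proper tokenizer.

```python
from collections import Counter

# Hypothetical three-document corpus: document id -> raw text
corpus = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog sat on the log",
    "doc3": "cats and dogs",
}

def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens (word order is discarded)."""
    return Counter(text.lower().split())

# One bag of words per document in the corpus
bags = {doc_id: bag_of_words(text) for doc_id, text in corpus.items()}

for doc_id, bag in bags.items():
    print(doc_id, dict(bag))
```

A `Counter` returns 0 for a token it has never seen, so looking up a word that is absent from a document does not raise an error.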

Documentation / Reference




The XML schema for each dump is defined at the top of the file and is also described in the MediaWiki export help page.

MediaWiki API (For Wiki bot)

    &titles=SQL                    # titles of the pages to query, separated by |
    &format=xml                    # output format
    &prop=description|categories   # properties to export, separated by |
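
The parameters above can be assembled into a full request URL. The following is a minimal sketch, not a definitive client: it assumes the English Wikipedia endpoint `https://en.wikipedia.org/w/api.php` and the `action=query` module, neither of which appears in the fragment above, and `build_query_url` is a hypothetical helper name. It only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia endpoint; other MediaWiki sites expose their own api.php
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_query_url(titles, props, fmt="xml"):
    """Build a MediaWiki API query URL (hypothetical helper for illustration)."""
    params = {
        "action": "query",           # assumption: the query module is the one being used
        "titles": "|".join(titles),  # page titles, separated by |
        "format": fmt,               # output format
        "prop": "|".join(props),     # properties to export, separated by |
    }
    # urlencode percent-encodes the | separator as %7C
    return API_ENDPOINT + "?" + urlencode(params)

url = build_query_url(["SQL"], ["description", "categories"])
print(url)
```

Joining the lists inside the helper keeps the `|` separator in one place, so callers pass ordinary Python lists rather than pre-joined strings.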
