Text Mining - (Corpus|Corpora) - Structured set of Text Documents

About

In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts.

See also: What is a bag of words model? (also known as a bag of tokens in NLP)

Documentation / Reference

Dictionary

Wiktionary

MediaWiki

The XML schema for each dump is declared at the top of the dump file and is also described in the MediaWiki export help page.
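
As a minimal sketch of reading that schema declaration, the snippet below parses only the root element of a dump and returns the namespace and XSD URL from its `xsi:schemaLocation` attribute. The sample header is an illustrative assumption modeled on the export format (the version number in a real dump may differ), and `read_schema_location` is a hypothetical helper name, not part of MediaWiki.

```python
import io
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace used by the xsi: prefix.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Illustrative dump header (assumption): a real dump may declare another version.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd"
  version="0.10" xml:lang="en">
  <page><title>SQL</title></page>
</mediawiki>"""


def read_schema_location(stream):
    """Return (namespace, xsd_url) declared on the root element of a dump."""
    for _event, elem in ET.iterparse(stream, events=("start",)):
        # The first "start" event is the root element; stop right after it
        # so the rest of a large dump is never parsed.
        location = elem.get(f"{{{XSI}}}schemaLocation", "")
        parts = location.split()
        return (parts[0], parts[1]) if len(parts) >= 2 else (None, None)
    return (None, None)


namespace, xsd_url = read_schema_location(io.BytesIO(SAMPLE_DUMP))
print(namespace)
print(xsd_url)
```

For a real dump, the same function could be given an open binary file handle instead of the in-memory sample.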

MediaWiki API (For Wiki bot)

https://en.wikipedia.org/w/api.php?action=query
    &titles=SQL   # the page titles as they appear in the page URL, separated by |
    &format=xml   # the export format
    &prop=description|categories # the properties to export, separated by |
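
The query above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_query_url` is a hypothetical helper name, and the snippet only builds the URL without sending a request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles, props, fmt="xml"):
    """Build a MediaWiki action=query URL for the given titles and properties."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # several titles are separated by |
        "format": fmt,               # the export format
        "prop": "|".join(props),     # several properties are separated by |
    }
    # urlencode percent-encodes the | separator as %7C.
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_query_url(["SQL"], ["description", "categories"])
print(url)
```

A bot would then fetch this URL with an HTTP client of its choice and parse the returned XML.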