Text Mining - (Corpus|Corpora) - Structured Set of Text Documents

About

In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts.

See also: What is a bag of words model? (also known as a bag of tokens in NLP)

Documentation / Reference

Wikipedia: Text corpus (wiki/Text_corpus)

Dictionary

English: the open Carnegie Mellon University Pronouncing Dictionary
EU: http://www.dictionaryportal.eu/en/
NL: http://www.opentaal.org/, http://anw.inl.nl/search, Wiktionary
See also: Lexical Database

Wiktionary

MediaWiki

The XML schema for each dump is defined at the top of the file and is also described on the MediaWiki export help page.
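As an illustration of reading such a dump, here is a minimal sketch in Python that streams the pages of an export file and yields each title with its text. It assumes the usual export layout (a mediawiki root element containing page elements, each with a title and a revision/text). The names local_name and iter_pages are hypothetical helpers, the sample is hand-written rather than real dump data, and the namespace URI in it is only an example, since the schema version differs between dumps.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> element of a MediaWiki XML dump.

    `source` is a file name or a binary file object. If a page holds
    several revisions, the text of the last one is returned.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the children of the page just processed


# Hand-written sample in the shape of an export file (assumption, not real data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>corpus</title>
    <revision><text>A structured set of texts.</text></revision>
  </page>
  <page>
    <title>token</title>
    <revision><text>A unit of text.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, "->", text)
```

Matching on the local tag name rather than on a fixed namespace keeps the sketch independent of the schema version declared at the top of the file.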

MediaWiki API (for wiki bots)

Search pages - returns a list of matching items: https://en.wikipedia.org/w/api.php?action=opensearch&search=MediaWiki&format=xml
Parse - returns the parsed content of a page: https://en.wikipedia.org/w/api.php?action=parse&page=MediaWiki&format=xml
Query - returns data about one or more pages. Example:

https://en.wikipedia.org/w/api.php?action=query
    &titles=SQL   # the titles of the pages to query, separated by |
    &format=xml   # the output format
    &prop=description|categories # the properties to return, separated by |
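The query URL above can also be assembled programmatically. The following is a minimal sketch in Python using only the standard library; build_query_url is a hypothetical helper name, and the parameters are the ones from the example (action, titles, format, prop). It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Endpoint taken from the example above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles, props, fmt="xml"):
    """Build an action=query URL for the given page titles and properties.

    Multiple titles or properties are joined with '|', which urlencode
    percent-encodes as %7C.
    """
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "format": fmt,
        "prop": "|".join(props),
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_query_url(["SQL"], ["description", "categories"])
print(url)
```

The resulting string can be passed to any HTTP client. Building the query string with urlencode rather than by hand keeps titles that contain spaces or other special characters correctly escaped.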