About
In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts.
See also: What is a bag of words model? (in NLP, also known as a bag of tokens)
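As a rough illustration of how a corpus feeds a bag-of-words model, here is a minimal sketch. It assumes a hypothetical two-document toy corpus and a simple regex tokenizer. Each document becomes a count of its tokens, and word order is discarded.

```python
import re
from collections import Counter

# Hypothetical toy corpus: each string is one document (text).
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of word characters; punctuation is dropped.
    return re.findall(r"\w+", text.lower())

def bag_of_words(doc: str) -> Counter:
    # Only token counts are kept; the order of the words is lost.
    return Counter(tokenize(doc))

bags = [bag_of_words(doc) for doc in corpus]
# Vocabulary of the whole corpus, sorted so the vector columns are stable.
vocabulary = sorted(set().union(*bags))
# One count vector per document, aligned with the vocabulary.
vectors = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 1, 1, 1, 2], [0, 1, 1, 0, 1, 1, 2]]
```

Both documents end up with the same count for "the" and "sat", which shows how the model sees only which tokens occur and how often, not where they occur.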
Documentation / Reference
Dictionary
- See also: Lexical Database
Wiktionary
MediaWiki
The XML schema of each dump is declared at the top of the dump file and is also described on the MediaWiki export help page.
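A dump can be large, so it is usually read as a stream. The sketch below is one way to do that with Python's standard `xml.etree.ElementTree.iterparse`. The sample document, the `export-0.10` namespace version, and the helper names `local_name` and `iter_pages` are assumptions for illustration; a real dump declares its own schema version, so the code matches tags by local name instead of hard-coding the namespace.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample shaped like a MediaWiki export dump (assumed, not real data).
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>SQL</title>
    <revision>
      <text>SQL is a query language.</text>
    </revision>
  </page>
  <page>
    <title>Corpus</title>
    <revision>
      <text>A corpus is a set of texts.</text>
    </revision>
  </page>
</mediawiki>"""

def local_name(tag: str) -> str:
    # "{namespace}page" -> "page"; works whatever the schema version is.
    return tag.rsplit("}", 1)[-1]

def iter_pages(source):
    """Yield (title, text) for each <page>, parsing the input incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                # With several revisions per page, the last <text> wins.
                text = child.text
        yield title, text
        elem.clear()  # drop the page's children once they have been read

pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
print(pages)
# [('SQL', 'SQL is a query language.'), ('Corpus', 'A corpus is a set of texts.')]
```

For a real dump, pass an open file object (or a `bz2.open(...)` handle for a compressed dump) instead of the `BytesIO` sample.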
- Search: returns a list of matching pages: https://en.wikipedia.org/w/api.php?action=opensearch&search=MediaWiki&format=xml
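The search request is just a URL with query parameters, so it can be built programmatically. A minimal sketch, assuming a hypothetical helper `opensearch_url` and using only the parameters that appear in the example URL (`action`, `search`, `format`):

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def opensearch_url(search: str, fmt: str = "xml") -> str:
    # Build an OpenSearch request; urlencode escapes spaces and special characters.
    params = {"action": "opensearch", "search": search, "format": fmt}
    return f"{API_ENDPOINT}?{urlencode(params)}"

print(opensearch_url("MediaWiki"))
# https://en.wikipedia.org/w/api.php?action=opensearch&search=MediaWiki&format=xml
print(opensearch_url("text corpus"))
# https://en.wikipedia.org/w/api.php?action=opensearch&search=text+corpus&format=xml
```

Letting `urlencode` do the escaping avoids broken requests when the search term contains spaces, `&`, or non-ASCII characters.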
- Query: retrieves data about pages. Example (a single URL, shown here with one parameter per line):
https://en.wikipedia.org/w/api.php?action=query
&titles=SQL # the titles of the pages to query, separated by |
&format=xml # the output format
&prop=description|categories # the properties to return, separated by |
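The same query can be assembled from Python, joining multi-valued parameters such as `titles` and `prop` with `|`. This is a sketch under assumptions: `query_url` and `fetch` are hypothetical helpers, and the `User-Agent` string is a placeholder you would replace with your own tool name and contact, since Wikimedia asks API clients to identify themselves. Only the URL building is exercised here; `fetch` performs a real network call when you invoke it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def query_url(titles: list[str], props: list[str], fmt: str = "xml") -> str:
    # Multi-valued parameters are joined with "|".
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "format": fmt,
        "prop": "|".join(props),
    }
    # urlencode percent-encodes "|" as %7C; the API decodes it back to "|".
    return f"{API_ENDPOINT}?{urlencode(params)}"

def fetch(url: str) -> bytes:
    # Placeholder User-Agent: replace with a real tool name and contact address.
    req = Request(url, headers={"User-Agent": "corpus-example/0.1 (placeholder contact)"})
    with urlopen(req, timeout=10) as resp:
        return resp.read()

url = query_url(["SQL"], ["description", "categories"])
print(url)
# https://en.wikipedia.org/w/api.php?action=query&titles=SQL&format=xml&prop=description%7Ccategories
```

Building the parameters as a dictionary keeps each one readable, which mirrors the one-parameter-per-line layout of the example URL.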