Natural Language - Crawler

About

A crawler is an application (bot) that reads documents (such as web pages or Word files) and parses them to extract meaningful information.

More generally, it is software that scans large bodies of text, such as collections of web pages, to find occurrences of words, phrases, or other patterns.

The pattern matching in such software is commonly implemented with finite automata.
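
As a minimal sketch of the finite-automaton idea, the following Python code builds a deterministic automaton for a single fixed word and runs it over a small collection of texts. The function names (build_dfa, find_occurrences, scan_documents) and the sample documents are hypothetical and chosen only for illustration; real text scanners usually handle many patterns at once and are considerably more elaborate.

```python
def build_dfa(pattern):
    """Build a transition table for `pattern`.

    State k means "the last k characters read equal pattern[:k]".
    State len(pattern) is the accepting state.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    m = len(pattern)
    alphabet = set(pattern)
    dfa = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            # Next state = length of the longest prefix of the pattern
            # that is a suffix of what has been matched so far plus ch.
            candidate = pattern[:state] + ch
            k = min(m, len(candidate))
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            dfa[state][ch] = k
    return dfa


def find_occurrences(text, pattern):
    """Return the start index of every (possibly overlapping) match."""
    dfa = build_dfa(pattern)
    m = len(pattern)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Characters that never occur in the pattern reset the automaton.
        state = dfa[state].get(ch, 0)
        if state == m:
            hits.append(i - m + 1)
    return hits


def scan_documents(documents, pattern):
    """Scan a {name: text} mapping (hypothetical corpus) for `pattern`."""
    return {name: find_occurrences(text, pattern)
            for name, text in documents.items()}


if __name__ == "__main__":
    docs = {
        "a.txt": "the crawler crawls",
        "b.txt": "no match here",
    }
    print(scan_documents(docs, "crawl"))
```

Each character of the input is read exactly once, which is why automaton-based scanning suits large bodies of text.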

Type

The best-known crawlers are web crawlers.
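
The core loop of a web crawler can be sketched as follows: fetch a page, extract its links, and queue the links that have not been seen yet. This is only an illustration under stated assumptions. The PAGES dictionary stands in for the web, the fetch argument is a hypothetical callable that returns the HTML for a URL (or None), and the names LinkExtractor and crawl are invented here. A real crawler would also resolve relative URLs, respect robots.txt, and limit its request rate.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": URL -> HTML content.
PAGES = {
    "/index": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/b">B</a> <a href="/index">home</a>',
    "/b": "<p>no links</p>",
}


def crawl(start, fetch):
    """Visit pages breadth-first from `start`; return the visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            # Unknown or unreachable page: skip it.
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


if __name__ == "__main__":
    print(crawl("/index", PAGES.get))
```

The seen set is what keeps the crawl from looping forever when pages link back to each other.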

Documentation / Reference

http://wiki.apache.org/nutch/Nutch2Crawling