Data Mining - Content Analysis and Acquisition
Table of Contents
List
Software
Apache Tika (content analysis toolkit) - The Apache Tika™ toolkit detects and extracts metadata and structured text content from a variety of document formats (PDF, OpenOffice, Word, …) using existing parser libraries.
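To make the "detects" step more concrete, here is a minimal sketch of type detection by leading "magic" bytes, using only the Python standard library. It is an illustration of the general idea, not Tika's implementation or API: the names `MAGIC_SIGNATURES` and `detect_type` are hypothetical, the signature table is deliberately tiny, and the media type labels are chosen for the example. A real toolkit would combine many more signatures with file name hints and container inspection before handing the document to a format-specific parser.

```python
# Illustrative only: a tiny magic-byte detector, NOT Apache Tika's code or API.
# MAGIC_SIGNATURES and detect_type are hypothetical names for this sketch.

MAGIC_SIGNATURES = [
    # (leading bytes, media type label used in this example)
    (b"%PDF-", "application/pdf"),
    # ZIP container; OpenDocument and .docx files are ZIP archives, so telling
    # them apart would require looking inside the archive (not done here).
    (b"PK\x03\x04", "application/zip"),
    # OLE2 compound file, the container used by legacy .doc/.xls/.ppt files.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
]


def detect_type(data: bytes) -> str:
    """Guess a media type from the first bytes of a document."""
    for prefix, media_type in MAGIC_SIGNATURES:
        if data.startswith(prefix):
            return media_type
    # Fallback for anything the table does not recognize.
    return "application/octet-stream"


if __name__ == "__main__":
    print(detect_type(b"%PDF-1.7\n..."))
    print(detect_type(b"plain text"))
```

Once the type is known, a dispatcher would typically pick a matching parser library to extract text and metadata, which is the role the entry above attributes to Tika's use of existing parsers.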