Optical character recognition (OCR)
Table of Contents
About
Optical character recognition, usually abbreviated to OCR, is the translation of:
- into text.
Software
Tesseract
All the open source software are based on the tesseract OCR engine.
Command line
tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
VietOCR (GUI for Tesseract )
VietOCR is the only one that I found which:
- simply works
- is easy to use.
- gives great result for screen shot. You have still to check the “ScreenShot Mode” from the Image menu.
Free OCR
For windows, you have also FreeOCR 2.6. It work well but you can't process more than one page at a time. May be it's the good way because you always need to clean the result.
Library
- Javascript: https://tesseract.projectnaptha.com/