pytesseract-cli
A command-line wrapper for pytesseract, a Python wrapper for tesseract.
Description
This is a command-line wrapper that makes it easier to use the Tesseract OCR engine with multiple files and/or directories. The project itself is written in Python and uses pytesseract to interact with tesseract.
Benefits of this interface include the ability to process multiple images and files in a single run, as well as to recurse into directories.
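The directory handling described above could look roughly like the following minimal sketch. It is not the project's actual code: the helper name `collect_inputs` and the set of accepted image extensions are assumptions made for illustration, and it relies only on the standard library `pathlib`.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of image extensions; the real tool may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}


def collect_inputs(
    files: Iterable[str], directories: Iterable[str], recurse: bool = False
) -> List[Path]:
    """Hypothetical helper: gather image paths from files and directories."""
    # Files named explicitly by the user are kept as-is, without a suffix check.
    found: List[Path] = [Path(f) for f in files]
    for d in directories:
        root = Path(d)
        # rglob("*") walks every subdirectory; glob("*") only lists the top level.
        candidates = root.rglob("*") if recurse else root.glob("*")
        # Keep regular files whose (case-insensitive) suffix looks like an image.
        found.extend(
            p
            for p in sorted(candidates)
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return found
```

For example, `collect_inputs(["page1.png"], ["scans"], recurse=True)` would return `page1.png` followed by every matching image under `scans/` and its subdirectories. Switching between `rglob` and `glob` is what the `-r` flag toggles in this sketch.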
Requirements
The basic requirements are up-to-date installations of Python 3 and Tesseract OCR.
Tesseract
Both the Tesseract OCR engine and the training data for each desired language must be installed.
Both are available, for example, from the Arch User Repository (AUR):
tesseract
tesseract-data
Installation
This project is available on PyPI as pytesseract-cli.
Using pip:
pip install pytesseract-cli
To upgrade:
pip install -U pytesseract-cli
Usage
In a terminal:
$ pytesseract-cli
usage: pytesseract-cli [-h] [-f [FILES ...]] [-d [DIRECTORIES ...]] [-r] [-t {pdf,txt}] [-l LANG] [--list-languages]
optional arguments:
-h, --help show this help message and exit
-f [FILES ...] name(s) of file(s) to process
-d [DIRECTORIES ...] directory(s) to process
-r recurse on all directories listed
-t {pdf,txt} desired output filetype
-l LANG language of the text in any image(s)
--list-languages list all languages available
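For readers curious how an interface like this is typically built, the following is a minimal `argparse` sketch that reproduces the documented usage line. It is an assumption-based illustration, not the project's source: the function name `build_parser` and the `dest` names (`files`, `directories`, `recurse`, `filetype`, `lang`, `list_languages`) are hypothetical, and no defaults for `-t` or `-l` are assumed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented pytesseract-cli usage."""
    parser = argparse.ArgumentParser(prog="pytesseract-cli")
    # nargs="*" renders as [-f [FILES ...]] in the usage line (Python 3.9+).
    parser.add_argument("-f", dest="files", nargs="*", default=[],
                        help="name(s) of file(s) to process")
    parser.add_argument("-d", dest="directories", nargs="*", default=[],
                        help="directory(s) to process")
    parser.add_argument("-r", dest="recurse", action="store_true",
                        help="recurse on all directories listed")
    # choices= makes argparse reject anything other than pdf or txt.
    parser.add_argument("-t", dest="filetype", choices=["pdf", "txt"],
                        help="desired output filetype")
    parser.add_argument("-l", dest="lang", metavar="LANG",
                        help="language of the text in any image(s)")
    parser.add_argument("--list-languages", action="store_true",
                        help="list all languages available")
    return parser
```

With this sketch, `build_parser().parse_args(["-d", "scans", "-r", "-t", "pdf"])` would yield a namespace with `directories=["scans"]`, `recurse=True`, and `filetype="pdf"`.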
Acknowledgements
- pytesseract for providing an easy-to-use wrapper for tesseract.
- tesseract for providing a free and open-source OCR engine.