tightocr
Release 0.4.4

Thin and pleasant wrapper for Tesseract OCR.

Keywords: ocr, tesseract, ctesseract
License: GPL-2.0
Install: pip install tightocr==0.4.4

Documentation

Introduction
============

TightOCR is a thin library that provides an efficient, pleasant, Pythonic
interface to Tesseract. Tesseract (https://code.google.com/p/tesseract-ocr/)
is a widely used open-source OCR engine whose development has been sponsored
by Google.

The primary goals of this implementation are to provide
the following functionality to Python:

> OCR a document and return a block of text.
> OCR a document, and identify the various parts of a document to allow an
  application to take advantage of Tesseract's layout analysis.

Secondary functions that are available:

> Confidence of recognition
> HTML-formatted output
> Slope and margin of text

Though I have tried to provide access to as many of the API methods as 
possible, there is a very limited amount of documentation available, so many
of the more exotic functions haven't been properly tested. IF YOU WANT TO HELP
WITH THIS, JUST DO IT AND REGISTER ISSUES OR SUBMIT PULL-REQUESTS.


This library was built as an alternative to python-tesseract 
(http://code.google.com/p/python-tesseract) for the following reasons:

> The usage of SWIG produces an implementation that is excessive and 
  burdensome.
> python-tesseract is, ironically, incomplete. You are unable to enumerate the
  parts of the document (getIterator() is broken: 
  http://code.google.com/p/python-tesseract/issues/detail?id=50&can=4&sort=-id)


Requirements
============

CTesseract (https://github.com/dsoprea/CTesseract)
Leptonica (http://code.google.com/p/leptonica/)


Installation
============

The Leptonica and CTesseract shared libraries (liblept.so,
libctesseract.so) must be installed where the system's dynamic loader can
find them.
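
A quick way to check whether the two libraries are visible is to ask Python's
standard ctypes.util.find_library(). This is only a sketch: the short names
"lept" and "ctesseract" are inferred from the file names above, the helper
name is made up for illustration, and how TightOCR itself locates the
libraries is not stated here, so treat a miss as a hint rather than proof.

```python
from ctypes.util import find_library

# Short names inferred from liblept.so and libctesseract.so (an assumption).
REQUIRED_LIBRARIES = ("lept", "ctesseract")


def locate_required_libraries(names=REQUIRED_LIBRARIES):
    """Map each short library name to what find_library() reports, or None."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_required_libraries().items():
        print("%-12s %s" % (name, found or "NOT FOUND"))
```

If a library is reported as missing, installing it into a standard library
directory (or adjusting the loader's search path) is the usual remedy.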


Usage
=====

Return the whole document as text:

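    from tightocr.adapters.api_adapter import TessApi
    from tightocr.adapters.lept_adapter import pix_read

    # Create an API instance for English ('eng').
    t = TessApi(None, 'eng')

    # Load the image with Leptonica and hand it to Tesseract.
    p = pix_read('receipt.png')
    t.set_image_pix(p)

    # Run recognition and reject low-confidence results.
    t.recognize()
    if t.mean_text_confidence() < 85:
        raise Exception("Too much error.")

    print(t.get_utf8_text())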
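
If the recognize-then-check pattern is needed in several places, it can be
wrapped in a small helper. The following is a sketch, not part of TightOCR:
recognize_text() and LowConfidenceError are hypothetical names, and the
helper only relies on the three methods used in the example above
(recognize(), mean_text_confidence(), get_utf8_text()).

```python
class LowConfidenceError(Exception):
    """Hypothetical error: mean confidence fell below the threshold."""


def recognize_text(api, min_confidence=85):
    """Run OCR on an already-loaded image and return the text.

    `api` is expected to behave like the TessApi object shown above,
    with an image already set via set_image_pix().
    """
    api.recognize()
    confidence = api.mean_text_confidence()
    if confidence < min_confidence:
        raise LowConfidenceError(
            "mean confidence %s is below %s" % (confidence, min_confidence))
    return api.get_utf8_text()
```

With the objects from the example above, this would be called as
recognize_text(t) after t.set_image_pix(p).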

Enumerate individual blocks of text (referred to as "paragraphs"), driven by 
the document's layout:

    from tightocr.adapters.api_adapter import TessApi
    from tightocr.adapters.lept_adapter import pix_read
    from tightocr.constants import RIL_PARA

    t = TessApi(None, 'eng')
    p = pix_read('receipt.png')
    t.set_image_pix(p)
    t.recognize()

    if t.mean_text_confidence() < 85:
        raise Exception("Too much error.")

    # Walk the layout one paragraph at a time.
    for block in t.iterate(RIL_PARA):
        print(block)
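
To keep the paragraphs for later processing rather than printing them, the
loop above can be folded into a helper. This is a sketch under one
assumption: that iterate() yields plain strings, which the print call above
suggests but which is not confirmed here. collect_blocks() is a hypothetical
name, not a TightOCR function.

```python
def collect_blocks(api, level):
    """Return the non-empty text blocks yielded by api.iterate(level).

    Assumes each yielded item is a string; surrounding whitespace is
    stripped and blocks that end up empty are skipped.
    """
    blocks = []
    for block in api.iterate(level):
        text = block.strip()
        if text:
            blocks.append(text)
    return blocks
```

With the objects from the example above, collect_blocks(t, RIL_PARA) would
return the paragraphs as a list, in the order Tesseract yields them.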

Dependencies: 0
Dependent packages: 0
Dependent repositories: 3
Total releases: 5
Latest release: Apr 25, 2014
First release: Dec 15, 2013
Stars: 25
Forks: 3
Watchers: 4
Contributors: 1
Repository size: 832 KB
SourceRank: 8

Releases

0.4.4: Apr 25, 2014
0.4.3: Jan 26, 2014
0.4.2: Jan 26, 2014
0.4.1: Jan 26, 2014
0.4.0: Dec 15, 2013


Last synced: 2021-12-15 15:05:52 UTC
