tleyden/open-ocr


Run your own OCR-as-a-Service using Tesseract and Docker

License: Apache-2.0

Language: Go


Join the chat at https://gitter.im/tleyden/open-ocr

OpenOCR makes it simple to host your own OCR REST API.

The heavy lifting of the OCR work is handled by Tesseract OCR.

Docker is used to containerize the various components of the service.


Features

  • Scalable message passing architecture via RabbitMQ.
  • Platform independence via Docker containers.
  • Kubernetes support: workers can run in a Kubernetes Replication Controller.
  • Supports 31 languages in addition to English.
  • Ability to use an image pre-processing chain. An example using Stroke Width Transform is provided.
  • Pass arguments to Tesseract such as character whitelist and page segment mode.
  • REST API docs
  • A Go REST client is available.

Launching OpenOCR on a Docker PAAS

OpenOCR can easily run on any PAAS that supports Docker containers. Here are the instructions for a few that have already been tested:

If your preferred PAAS isn't listed, please open a GitHub issue to request instructions.

Launching OpenOCR on Ubuntu 14.04

OpenOCR can be launched on anything that supports Docker, such as Ubuntu 14.04.

Here's how to install it from scratch and verify that it's working correctly.

Install Docker

See Installing Docker on Ubuntu instructions.

Find out your host address

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:43:40:c7
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
          ...

The IP address 10.0.2.15 will be used as the RABBITMQ_HOST env variable below.
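If the address has to be picked out by a script rather than by eye, the following is a minimal sketch that extracts IPv4 addresses from ifconfig-style text. The function name ipv4_addresses and the regular expression are illustrative assumptions, not part of OpenOCR; the sample text is the ifconfig output shown above.

```python
import re

# Matches both the older "inet addr:10.0.2.15" layout and the newer "inet 10.0.2.15" one.
# "inet6" lines are not matched because the pattern requires a space after "inet".
_INET_RE = re.compile(r"inet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def ipv4_addresses(ifconfig_output, skip_loopback=True):
    """Return the IPv4 addresses found in ifconfig-style text, in order of appearance."""
    found = _INET_RE.findall(ifconfig_output)
    if skip_loopback:
        found = [ip for ip in found if not ip.startswith("127.")]
    return found


sample = """eth0      Link encap:Ethernet  HWaddr 08:00:27:43:40:c7
          inet addr:10.0.2.15  Bcast:10.0.2.255  Mask:255.255.255.0
"""
print(ipv4_addresses(sample))  # ['10.0.2.15']
```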

Launching OpenOCR with Docker Compose on Linux

  • Install Docker
  • Install docker-compose
  • Check out the OpenOCR repository, or at least copy all files and subdirectories from its docker-compose directory
  • cd into the docker-compose directory
  • Run docker-compose up to see the log in the console, or docker-compose up -d to run the containers as daemons

Docker Compose will start four docker instances

You are now ready to decode images → text via your REST API.
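Containers started with docker-compose up -d may take a moment before the HTTP endpoint accepts connections. The following is a minimal sketch of a readiness check that polls a TCP port until it opens. The helper name wait_for_port, the timeout values, and the host and port you pass in are assumptions; the port is whatever the .yml file in the docker-compose directory maps.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable, timed out) on failure.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("10.0.2.15", 9292) would block for up to 30 seconds, assuming the service is published on port 9292 of that host. An open port only shows that something is listening; it does not prove the OCR workers are connected to RabbitMQ.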

Launching OpenOCR with Docker Compose on OSX

  • Install Docker
  • Install Docker Toolbox
  • Check out the OpenOCR repository
  • cd into the docker-compose directory
  • Run docker-machine start default
  • Run docker-machine env
  • Note the Docker host IP address in the output
  • Run docker-compose up -d to run the containers as daemons, or docker-compose up to see the log in the console

How to test the REST API after running docker-compose up

In the request below, IP_ADDRESS_OF_DOCKER_HOST is the address you saw when you ran docker-machine env (e.g. 192.168.99.100), and HTTP_PORT is the port number set in the .yml file in the docker-compose directory, presumably 9292.

Request

$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' http://IP_ADDRESS_OF_DOCKER_HOST:HTTP_PORT/ocr

Assuming the values are 192.168.99.100 and 9292 respectively:

$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' http://192.168.99.100:9292/ocr

Response

It will return the decoded text for the test image:

< HTTP/1.1 200 OK
< Date: Tue, 13 May 2014 16:18:50 GMT
< Content-Length: 283
< Content-Type: text/plain; charset=utf-8
<
You can create local variables for the pipelines within the template by
prefixing the variable name with a “$" sign. Variable names have to be
composed of alphanumeric characters and the underscore. In the example
below I have used a few variations that work for variable names.
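The same request can be issued from code. The following is a minimal sketch using only the Python standard library; it mirrors the curl command above (JSON body with img_url and engine, POST to /ocr, plain-text response). The function names build_ocr_request and ocr, and the 60-second timeout, are illustrative assumptions rather than part of any OpenOCR client.

```python
import json
import urllib.request


def build_ocr_request(base_url, img_url, engine="tesseract"):
    """Build (but do not send) the POST request that the curl example issues."""
    payload = json.dumps({"img_url": img_url, "engine": engine}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ocr",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ocr(base_url, img_url, timeout=60):
    """Send the request and return the decoded text from the plain-text response body."""
    req = build_ocr_request(base_url, img_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the example values above, ocr("http://192.168.99.100:9292", "http://bit.ly/ocrimage") would return the decoded text, provided the service is reachable at that address. urlopen raises urllib.error.HTTPError for non-2xx responses, so callers may want to catch it.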

Test the REST API

Request

$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' http://10.0.2.15:$HTTP_PORT/ocr

Response

It will return the decoded text for the test image:

< HTTP/1.1 200 OK
< Date: Tue, 13 May 2014 16:18:50 GMT
< Content-Length: 283
< Content-Type: text/plain; charset=utf-8
<
You can create local variables for the pipelines within the template by
prefixing the variable name with a “$" sign. Variable names have to be
composed of alphanumeric characters and the underscore. In the example
below I have used a few variations that work for variable names.

The REST API also supports:

  • Uploading the image content via multipart/related, rather than passing an image URL (example client code is provided in the Go REST client).
  • Tesseract config vars (e.g. the equivalent of -c arguments when using Tesseract via the command line) and Page Seg Mode.
  • Ability to use an image pre-processing chain, e.g. Stroke Width Transform.
  • Non-English languages.

See the REST API docs and the Go REST client for details.
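As a rough illustration of how these options might be combined into one request body, here is a minimal sketch. The field names engine_args, config_vars, psm, lang and preprocessors, and the value "stroke-width-transform", are assumptions that are not confirmed anywhere in this README; check them against the REST API docs before relying on them. The helper build_ocr_payload is hypothetical.

```python
import json


def build_ocr_payload(img_url, config_vars=None, psm=None, lang=None, preprocessors=None):
    """Assemble a JSON-serializable request body; optional keys are added only when set.

    NOTE: every key other than "img_url" and "engine" is an assumed name.
    """
    payload = {"img_url": img_url, "engine": "tesseract"}
    engine_args = {}
    if config_vars:
        engine_args["config_vars"] = dict(config_vars)  # e.g. a character whitelist
    if psm is not None:
        engine_args["psm"] = str(psm)  # page segmentation mode, sent as a string here
    if lang:
        engine_args["lang"] = lang
    if engine_args:
        payload["engine_args"] = engine_args
    if preprocessors:
        payload["preprocessors"] = list(preprocessors)
    return payload


body = build_ocr_payload(
    "http://bit.ly/ocrimage",
    config_vars={"tessedit_char_whitelist": "0123456789"},
    psm=3,
    preprocessors=["stroke-width-transform"],
)
print(json.dumps(body, indent=2))
```

tessedit_char_whitelist is a real Tesseract config variable; how OpenOCR expects it to be nested in the request is the part to verify.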

Uploading local files using curl

The supplied docs/upload-local-file.sh provides an example of how to upload a local file using curl with multipart/related encoding of the json and image data:

  • Usage: docs/upload-local-file.sh <urlendpoint> <file> [mimetype]
  • Download the example OCR image: wget http://bit.ly/ocrimage
  • Example: docs/upload-local-file.sh http://10.0.2.15:$HTTP_PORT/ocr-file-upload ocrimage
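To show what "multipart/related encoding of the json and image data" looks like on the wire, here is a minimal sketch that assembles such a body by hand: one application/json part followed by one image part. The helper name build_multipart_related, the part headers, and the attachment filename are assumptions; docs/upload-local-file.sh remains the reference for what the server actually accepts.

```python
import json
import uuid


def build_multipart_related(image_bytes, mimetype="image/png", engine="tesseract", boundary=None):
    """Return (content_type_header, body_bytes) for a two-part multipart/related request."""
    boundary = boundary or uuid.uuid4().hex
    delim = b"--" + boundary.encode("ascii")
    crlf = b"\r\n"
    json_part = json.dumps({"engine": engine}).encode("utf-8")
    body = b"".join([
        # Part 1: the JSON request parameters.
        delim + crlf,
        b"Content-Type: application/json" + crlf + crlf,
        json_part + crlf,
        # Part 2: the raw image bytes (header names below are assumed).
        delim + crlf,
        b"Content-Type: " + mimetype.encode("ascii") + crlf,
        b'Content-Disposition: attachment; filename="attachment"' + crlf + crlf,
        image_bytes + crlf,
        # Closing delimiter.
        delim + b"--" + crlf,
    ])
    content_type = 'multipart/related; boundary="%s"' % boundary
    return content_type, body
```

The returned header and body could then be POSTed to the /ocr-file-upload endpoint with urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST"). A random boundary is used by default so that it is very unlikely to collide with bytes in the image.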

Community

Client Libraries

License

OpenOCR is Open Source and available under the Apache 2 License.

Top Contributors

Traun Leyden Rednut Infomatics Michael Overmeyer Chau Thai Alex Proca Arpit Goyal simkimsia The Gitter Badger

Recent Tags

release/1.0.2 June 22, 2014
release/1.0.0 May 16, 2014
