page-segmentation module for OCR-d
Introduction
This module implements a page segmentation algorithm based on a Fully Convolutional Network (FCN). The FCN creates a classification for each pixel in a binary image. This result is then segmented per class using XY cuts.
Requirements
- For GPU-Support: CUDA and CUDNN
- other requirements are installed via Makefile / pip, see
requirements.txt
in repository root.
Installation
If you want to use GPU support, set the environment variable TENSORFLOW_GPU
to a nonempty value, otherwise leave it unset. Then:
make deps
to install dependencies and
make install
to install the package.
Both are python packages installed via pip, so you may want to activate a virtalenv before installing.
Usage
ocrd-pc-segmentation
follows the ocrd CLI.
It expects a binary page image and produces region entries in the PageXML file.
Configuration
The following parameters are recognized in the JSON parameter file:
-
overwrite_regions
: remove previously existing text regions -
xheight
: height of character "x" in pixels used during training. -
model
: pixel-classifier model path. The special values__DEFAULT__
and__LEGACY__
load the bundled default model or previous default model respectively. -
gpu_allow_growth
: required for GPU use with some graphic cards (set to true, if you get CUDNN_INTERNAL_ERROR) -
resize_height
: scale down pixelclassifier output to this height before postprocessing. Independent of training / used model. (performance / quality tradeoff, defaults to 300)
Testing
There is a simple CLI test, that will run the tool on a single image from the assets repository.
make test-cli
Training
To train models for the pixel classifier, see its README