leaf-focus

Extract structured text from PDF files.

Install

Install from PyPI using pip:

pip install leaf-focus

Download the Xpdf command line tools and extract the executable files.

Pass the directory containing the executable files to leaf-focus using the --exe-dir option.
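
Before running leaf-focus, it can help to confirm that the extracted directory really contains the Xpdf executables. The following is a minimal sketch, not part of leaf-focus: the function name missing_xpdf_tools is hypothetical, and the list of tool names (pdfinfo, pdftotext, pdftopng) is an assumption about which Xpdf programs are needed, so adjust it to your setup.

```python
from pathlib import Path

# Assumption: these are the Xpdf tools of interest. Adjust the names to
# whichever executables your leaf-focus version actually calls.
ASSUMED_TOOLS = ("pdfinfo", "pdftotext", "pdftopng")


def missing_xpdf_tools(exe_dir, tools=ASSUMED_TOOLS):
    """Return the tool names that have no matching file in exe_dir."""
    directory = Path(exe_dir)
    if not directory.is_dir():
        # A missing directory means every tool is missing.
        return list(tools)
    # Compare on the file stem so both "pdfinfo" and "pdfinfo.exe" match.
    present = {p.stem.lower() for p in directory.iterdir() if p.is_file()}
    return [tool for tool in tools if tool not in present]
```

An empty list means every assumed tool was found, and the directory can then be passed as --exe-dir.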

Usage

usage: leaf-focus [-h] [--version] --exe-dir EXE_DIR [--page-images] [--ocr]
                  [--first FIRST] [--last LAST]
                  [--log-level {debug,info,warning,error,critical}]
                  input_pdf output_dir

Extract structured text from a pdf file.

positional arguments:
  input_pdf             path to the pdf file to read
  output_dir            path to the directory to save the extracted text files

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --exe-dir EXE_DIR     path to the directory containing xpdf executable files
  --page-images         save each page of the pdf as a separate image
  --ocr                 run optical character recognition on each page of the
                        pdf
  --first FIRST         the first pdf page to process
  --last LAST           the last pdf page to process
  --log-level {debug,info,warning,error,critical}
                        the log level: debug, info, warning, error, critical

Examples

# Extract the PDF information and embedded text.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages

# Extract the PDF information, embedded text, an image of each page, and optical character recognition (OCR) results for each page.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages --ocr
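
The same commands can be driven from a Python script, which is convenient when processing many files or restricting the page range. This is a rough sketch that only assembles the command line options listed in the usage text and runs the installed leaf-focus program with subprocess. The helper names build_leaf_focus_command and run_leaf_focus are hypothetical and are not part of the leaf-focus package.

```python
import subprocess


def build_leaf_focus_command(exe_dir, input_pdf, output_dir, *, ocr=False,
                             page_images=False, first=None, last=None,
                             log_level=None):
    """Build the leaf-focus argument list from the options in the usage text."""
    command = ["leaf-focus", "--exe-dir", str(exe_dir)]
    if page_images:
        command.append("--page-images")
    if ocr:
        command.append("--ocr")
    if first is not None:
        command += ["--first", str(first)]
    if last is not None:
        command += ["--last", str(last)]
    if log_level is not None:
        command += ["--log-level", log_level]
    # The two positional arguments come last, as in the usage text.
    command += [str(input_pdf), str(output_dir)]
    return command


def run_leaf_focus(*args, **kwargs):
    """Run leaf-focus and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_leaf_focus_command(*args, **kwargs), check=True)
```

For example, run_leaf_focus("[path-to-xpdf-exe-dir]", "file.pdf", "file-pages", ocr=True, first=2, last=5) would request OCR with --first 2 and --last 5, assuming leaf-focus is installed and on the PATH.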

Dependencies

Xpdf
keras-ocr
TensorFlow (can optionally run more efficiently on one or more GPUs)

leaf-focus
Release 0.6.2
