
pyxpdf
Release 0.2.3

Powerful and Pythonic PDF processing library based on xpdf-4.02

Language: Cython

Keywords: pdf, parser, converter, text, mining, xpdf, bindings, cython, pdf-converter, pdf-parser, pdfparser, pdftohtml, pdftopng, pdftotext, python, xpdf-reader
License: GPL-2.0+
Install: pip install pyxpdf==0.2.3
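To check whether the pinned release is actually installed, you could use a small helper like the one below. It is a sketch built only on the standard library's `importlib.metadata` (Python 3.8+). It does not use pyxpdf's own API, and the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper: it only reads package metadata and never imports pyxpdf.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyxpdf")
    if version is None:
        print("pyxpdf is not installed; run: pip install pyxpdf==0.2.3")
    elif version != "0.2.3":
        print(f"pyxpdf {version} is installed, but 0.2.3 was expected")
    else:
        print("pyxpdf 0.2.3 is installed")
```

Because the helper reads distribution metadata rather than importing the module, it also works when the compiled extension fails to load.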

Documentation

pyxpdf

pyxpdf is a fast and memory-efficient Python module for parsing PDF documents, based on the xpdf reader source code.


Features

Almost 20x faster than pure-Python PDF parsers (see Speed Comparison)
Extract text while preserving the original document layout as closely as possible
Support almost all PDF encodings, CMaps and predefined CMaps
Extract LZW-, RLE-, CCITTFax-, DCT-, JBIG2- and JPX-compressed images and image masks, along with their bounding boxes (BBox)
Render PDF pages as images in '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes
No explicit dependencies (except optional ones, see Installation)
Thread-safe
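The rendering color modes listed above use the same names as Pillow's image modes. As a rough guide to how much memory a rendered page occupies, the sketch below computes the size of an uncompressed pixel buffer for each mode. This is an assumption, not pyxpdf's documented behavior: it presumes 8 bits per channel and, for mode '1', one bit per pixel with each row padded to a whole byte (Pillow's raw '1' packing). The function `raw_buffer_size` is hypothetical.

```python
# Channels per pixel for the 8-bit modes (names follow Pillow's image modes).
CHANNELS = {"L": 1, "LA": 2, "RGB": 3, "RGBA": 4, "CMYK": 4}


def raw_buffer_size(mode: str, width: int, height: int) -> int:
    """Bytes needed for an uncompressed width x height image in `mode`.

    Assumption: 8 bits per channel; mode '1' stores 1 bit per pixel with
    every row padded up to a whole byte.
    """
    if width < 0 or height < 0:
        raise ValueError("dimensions must be non-negative")
    if mode == "1":
        # Round each row up to the next full byte, then multiply by rows.
        return ((width + 7) // 8) * height
    try:
        return CHANNELS[mode] * width * height
    except KeyError:
        raise ValueError(f"unsupported mode: {mode!r}") from None


if __name__ == "__main__":
    # Roughly an A4 page rendered at 150 dpi.
    for mode in ("1", "L", "LA", "RGB", "RGBA", "CMYK"):
        print(mode, raw_buffer_size(mode, 1240, 1754))
```

For a 1240 x 1754 page, an 'RGB' buffer needs 6,524,880 bytes, while a bilevel '1' buffer needs only 271,870. Choosing the narrowest mode that suits the task keeps memory use down when many pages are rendered.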

More Information

License

pyxpdf is licensed under the GNU General Public License (GPL), version 3. See the LICENSE file for details.

Credits

xpdf reader by Derek Noonburg
lxml - project structure and build adapted from lxml
poppler project

Dependencies: 4
Dependent packages: 1
Dependent repositories: 0
Total releases: 6
Latest release: Aug 31, 2020
First release: May 2, 2020
Stars: 31
Forks: 13
Watchers: 3
Contributors: 3
Repository size: 12.2 MB
SourceRank: 7


Releases

0.2.3: Aug 31, 2020
0.2.2: Jul 3, 2020
0.2.1: Jun 12, 2020
0.2.0: Jun 11, 2020
0.1.1: May 11, 2020
0.1: May 2, 2020


Used by

brianmhunt/knockout-modal: 8
