pdfdata
Release 0.1.3.2

Extracting text and data from PDFs

License: MIT
Install: pip install pdfdata==0.1.3.2

Documentation

{pdfdata}

Python package for extracting text and data from PDFs.

Installation

pip install pdfdata

Usage

from pprint import pprint

# Explicit imports replace the original wildcard import
from pdfdata import (
    parse_pdf,
    pdf_doc_extract_span_df,
    pdf_doc_extract_span_list,
    pdf_text_to_jsonnl,
)

# Parse the PDF as a dictionary
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_list(pdf_parsed)

pprint(res, depth=3)

# Parse the PDF as a list of spans
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_df(pdf_parsed)

pprint(res[0])

# Transform the PDF text to jsonnl
pdf_text_to_jsonnl('pdfs/0641-20.pdf', '0641-20.jsonnl')
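
Reading the exported file back might look like the sketch below. It assumes the .jsonnl output holds one JSON object per line (newline-delimited JSON), which the package description does not confirm. The helper read_json_lines and the "page" and "text" fields are hypothetical stand-ins, not part of pdfdata.

```python
import json
import tempfile
from pathlib import Path


def read_json_lines(path):
    """Return a list with one parsed JSON value per non-empty line of `path`."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Stand-in file with hypothetical fields; the real pdfdata output may differ.
sample_path = Path(tempfile.mkdtemp()) / "sample.jsonnl"
sample_path.write_text(
    '{"page": 1, "text": "first span"}\n'
    '{"page": 1, "text": "second span"}\n',
    encoding="utf-8",
)

records = read_json_lines(sample_path)
print(len(records), records[0]["text"])
```

If the assumption holds, pointing read_json_lines at '0641-20.jsonnl' would load the exported records the same way.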

DevNotes

build

python -m build

TestPyPI upload

python -m twine upload --repository testpypi dist/* --skip-existing

Stats

Dependencies: 1
Dependent packages: 0
Dependent repositories: 0
Total releases: 3
Latest release: Feb 8, 2021
First release: Feb 4, 2021
Stars: 0
Forks: 0
Watchers: 1
Contributors: 1
Repository size: 1.22 MB
SourceRank: 6

Releases

0.1.3.2: Feb 8, 2021
0.1.2: Feb 7, 2021
0.1.1: Feb 4, 2021
