pdfdata

Extracting text and data from PDFs


License
MIT
Install
pip install pdfdata==0.1.3.2

Python 3.6, 3.7, 3.8, 3.9

{pdfdata}

Python package for extracting text and data from PDFs.

Installation

pip install pdfdata

Usage

from pprint import pprint

from pdfdata import (
    parse_pdf,
    pdf_doc_extract_span_df,
    pdf_doc_extract_span_list,
    pdf_text_to_jsonnl,
)

# parse pdf as dictionary
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_list(pdf_parsed)

pprint(res, depth=3)

# parse pdf as list of spans
pdf_parsed = parse_pdf('pdfs/0641-20.pdf')
res = pdf_doc_extract_span_df(pdf_parsed)

pprint(res[0])

# transform pdf text to jsonnl
pdf_text_to_jsonnl('pdfs/0641-20.pdf', '0641-20.jsonnl')
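Reading the output back

The usage above writes a .jsonnl file but does not show how to read it back. The sketch below is one way to do that, assuming the file follows the usual JSON Lines layout (one JSON object per line). That layout is an assumption, not something this README states. The helper name read_jsonl is hypothetical and is not part of pdfdata. It uses only the standard library.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file.

    Assumes one JSON object per line; this helper is illustrative and
    is not provided by pdfdata.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


if __name__ == "__main__":
    # Path taken from the usage example; adjust to wherever the file was written.
    out = Path("0641-20.jsonnl")
    if out.exists():
        records = list(read_jsonl(out))
        print(f"{len(records)} records read")
```

Because read_jsonl is a generator, large files can be processed one record at a time instead of being loaded fully into memory.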

DevNotes

build

python -m build

upload to TestPyPI

python -m twine upload --repository testpypi dist/* --skip-existing