redstork
PDF Parsing library, based on PDFium.
Requirements
- Python 3
Platform support:
- A fairly recent Linux (Ubuntu 18.04 or later). Older systems are not supported.
- macOS 10.6 or later
- Windows support is in the works
Installation
pip install redstork
Features
- Convert to an image - page or arbitrary rectangle - using configurable scale
- Update document metadata
- Update font encoding (for some PDF documents)
- Save document to a file
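To make the "configurable scale" feature concrete, here is a minimal sketch of how a rectangle and a scale factor might map to output image dimensions. It assumes, without confirmation from the library, that `scale` multiplies PDF points into pixels and that fractional sizes round up. `rendered_size` is a hypothetical helper and not part of redstork.

```python
import math


def rendered_size(rect, scale=1.0):
    """Estimate the pixel size of a rendered rectangle.

    rect is (x0, y0, x1, y1) in PDF points, the same layout as a MediaBox.
    ASSUMPTION: one point maps to `scale` pixels and partial pixels round up.
    The library itself may round differently.
    """
    x0, y0, x1, y1 = rect
    width = math.ceil(abs(x1 - x0) * scale)
    height = math.ceil(abs(y1 - y0) * scale)
    return width, height


# A US Letter MediaBox rendered at scale=2
print(rendered_size((0.0, 0.0, 612.0, 792.0), scale=2))
```

Under these assumptions, doubling the scale doubles each dimension and so quadruples the pixel count, which is worth keeping in mind when rendering many pages.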
Quick start
Download a sample PDF file from here and save it as sample.pdf in the working directory.
from redstork import Document, PageObject, Glyph
doc = Document('sample.pdf')
print('Number of pages:', len(doc))
>> Number of pages: 15
print('MediaBox of the first page is:', doc[0].media_box)
>> MediaBox of the first page is: (0.0, 0.0, 612.0, 792.0)
print('Rotation of the first page is:', doc[0].rotation)
>> Rotation of the first page is: 0
print('Document title:', doc.meta['Title'])
>> Document title: Red Stork
print('First page has', len(doc[0]), 'objects')
>> First page has 4 objects
doc[0].render('page-0.ppm', scale=2) # render page #1 as image
page = doc[0]
for o in page:
    if o.type == PageObject.OBJ_TYPE_TEXT:
        for code, _, _ in o:
            print(o.font[code], end='')
        print()
>> RedStork
>> Release0.0.1
>> Apr02,2020
for fid, font in doc.fonts.items():
    print(font.short_name, fid)
>> NimbusSanL-Bold (36, 0)
>> NimbusSanL-BoldItal (37, 0)
# let's generate SVG markup for the first letter on page 1
text_object = [o for o in page if o.type == PageObject.OBJ_TYPE_TEXT][0] # first text object
charcode, _, _ = text_object[0] # first character of the first text object
glyph = text_object.font.load_glyph(charcode) # use the text object's own font, not the leftover loop variable
path, delayed_c = [], []
for x, y, op, close in glyph:
    x, y = round(x, 3), round(y, 3)
    if op == Glyph.MOVETO:
        path.append(f'M {x} {y}')
    elif op == Glyph.LINETO:
        path.append(f'L {x} {y}')
    elif op == Glyph.CURVETO:
        delayed_c.append(f'{x} {y}')
        if len(delayed_c) == 3:
            path.append('C ' + ', '.join(delayed_c))
            delayed_c.clear()
    if close:
        path.append('Z')
path = ' '.join(path)
print('<svg><g fill="gray" transform="scale(100,-100)"><path d="' + path + '" /></g></svg>')
>> <svg><g fill="gray" transform="scale(100,-100)"><path d="M 0.291 0.289 L 0.463 0.289 C 0.52 0.289, ... L 0.318 0.414 Z" /></g></svg>
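The glyph-to-SVG loop above can be factored into a reusable function. The sketch below runs without redstork installed: the opcode constants are stand-ins invented for illustration, and `segments_to_svg_path` and `wrap_svg` are hypothetical helper names. Real code would compare against `Glyph.MOVETO`, `Glyph.LINETO` and `Glyph.CURVETO` and pass in a loaded glyph.

```python
# Stand-in opcodes for illustration only (ASSUMPTION); with redstork, use
# Glyph.MOVETO, Glyph.LINETO and Glyph.CURVETO instead.
MOVETO, LINETO, CURVETO = 'move', 'line', 'curve'


def segments_to_svg_path(segments, ndigits=3):
    """Build an SVG path 'd' string from (x, y, op, close) tuples."""
    parts = []
    pending = []  # points collected for the cubic curve in progress
    for x, y, op, close in segments:
        x, y = round(x, ndigits), round(y, ndigits)
        if op == MOVETO:
            parts.append(f'M {x} {y}')
        elif op == LINETO:
            parts.append(f'L {x} {y}')
        elif op == CURVETO:
            pending.append(f'{x} {y}')
            if len(pending) == 3:  # two control points plus the end point
                parts.append('C ' + ', '.join(pending))
                pending.clear()
        if close:
            parts.append('Z')
    return ' '.join(parts)


def wrap_svg(path_d, scale=100):
    """Wrap path data in a minimal SVG document, flipping the y axis."""
    return (f'<svg><g fill="gray" transform="scale({scale},-{scale})">'
            f'<path d="{path_d}" /></g></svg>')


# Hand-written sample outline: a line followed by one closed cubic curve
sample = [
    (0, 0, MOVETO, False),
    (1, 0, LINETO, False),
    (1, 1, CURVETO, False),
    (0.5, 1.5, CURVETO, False),
    (0, 1, CURVETO, True),
]
print(wrap_svg(segments_to_svg_path(sample)))
```

An SVG cubic `C` command takes three points at once, which is why curve points are buffered until three have arrived. The negative y scale flips the outline because glyph coordinates grow upward while SVG coordinates grow downward.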