ReSurrecT

Toolkit for HTML to reStructuredText conversion


License
Other
Install
pip install ReSurrecT==0.1

Documentation

ReSurrecT

I made this tool to massage the output of Adobe Acrobat's HTML export into a valid Docutils doctree.

Why did I create this? I wanted to convert public domain US Army Field Manuals into an easily editable format.

Requirements

  • lxml
  • docutils

Usage

  1. Export the desired PDF to HTML in Adobe Acrobat. [1]
  2. Remove nasty output, such as headers, footers, and horribly formatted tables.
  3. Use the standard transforms to convert common styles to Doctree elements.
  4. Use xml2rst to get the RST output.
  5. Delete your script and this, so there's a chance you'll forget you ever had to do this.
[1] Perhaps an open source package, like pdftohtml, could be used instead.
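
Steps 2 and 3 above can be sketched roughly as follows. This is not ReSurrecT's own API, only a minimal illustration of the idea. It assumes the exported HTML has already been tidied into well-formed XHTML without namespaces, so the standard library's ElementTree can parse it (lxml's etree offers a largely compatible interface and copes better with real-world HTML). The class names in DROP_CLASSES, the TAG_MAP table, and all function names are hypothetical; real Acrobat output will need its own rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical class names for page furniture; real exports will differ.
DROP_CLASSES = {"header", "footer"}

# HTML tag -> Docutils doctree element name (a deliberately tiny mapping).
TAG_MAP = {"p": "paragraph", "h1": "title", "b": "strong", "i": "emphasis"}


def strip_furniture(root):
    """Step 2: remove elements whose class marks them as headers or footers."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("class") in DROP_CLASSES:
                # Note: remove() also discards the child's tail text.
                parent.remove(child)


def to_doctree(root):
    """Step 3: rename mapped tags in place and wrap the body in <document>."""
    body = root.find("body")
    if body is None:
        raise ValueError("expected an <html> root with a <body> child")
    for el in body.iter():
        if el.tag in TAG_MAP:
            el.tag = TAG_MAP[el.tag]
            el.attrib.clear()  # drop HTML styling attributes
    document = ET.Element("document")
    document.extend(list(body))
    return document


def convert(xhtml):
    """Parse XHTML text and return Docutils-style XML as a string."""
    root = ET.fromstring(xhtml)
    strip_furniture(root)
    return ET.tostring(to_doctree(root), encoding="unicode")
```

The string returned by convert() could then be saved to a file and handed to xml2rst for step 4. Tags that are not in TAG_MAP pass through unchanged, so a real script would need to handle or strip them before the result counts as a valid doctree.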