ReSurrecT
I made this tool to massage the output of Adobe Acrobat's HTML export into a valid Docutils Doctree.
Why did I create this? I wanted to convert public domain US Army Field Manuals into an easily editable format.
Requirements
- lxml
- docutils
Usage
- Export the desired PDF to HTML in Adobe Acrobat [1]
- Remove nasty output, such as headers, footers, and horribly formatted tables.
- Use the standard transforms to convert common styles to Doctree elements.
- Use xml2rst to get the RST output.
- Delete your script and this, so there's a chance you'll forget you ever had to do this.
[1] Perhaps an open source package could be utilised, like PDFTOHTML.
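
The cleanup and transform steps in Usage can be sketched roughly as below. This is not the tool's actual code, only a minimal illustration of the idea. It assumes the export has already been tidied into well-formed XHTML. The class names in STYLE_MAP (Heading1, Body, Italic, Bold) and the function name to_doctree are hypothetical; the real export will use whatever class names Acrobat generated for your document.

```python
try:
    from lxml import etree  # listed under Requirements
except ImportError:
    # Fallback so this sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree

# Hypothetical mapping from (HTML tag, CSS class) to Doctree element names.
STYLE_MAP = {
    ("p", "Heading1"): "title",
    ("p", "Body"): "paragraph",
    ("span", "Italic"): "emphasis",
    ("span", "Bold"): "strong",
}


def to_doctree(xhtml):
    """Return Doctree-style XML built from the mapped parts of an XHTML string."""
    body = etree.fromstring(xhtml).find("body")
    document = etree.Element("document")
    for block in list(body):
        # Drop top-level blocks with no mapping (headers, footers, tables).
        if (block.tag, block.get("class")) not in STYLE_MAP:
            continue
        # Rename the block and any mapped inline children, dropping attributes.
        for el in list(block.iter()):
            new_tag = STYLE_MAP.get((el.tag, el.get("class")))
            if new_tag is not None:
                el.tag = new_tag
                el.attrib.clear()
        block.tail = None  # discard whitespace that followed the block
        document.append(block)
    return etree.tostring(document, encoding="unicode")


SAMPLE = (
    "<html><body>"
    '<p class="Header">page header</p>'
    '<p class="Heading1">Chapter 1</p>'
    '<p class="Body">Keep <span class="Italic">this</span> text.</p>'
    "</body></html>"
)

if __name__ == "__main__":
    print(to_doctree(SAMPLE))
```

The resulting XML is the kind of input xml2rst expects. Keeping the style mapping in a plain dictionary means each new manual only needs its own table of class names, not new code.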