transmogrify.webcrawler

Crawls HTML content and feeds it into a transmogrifier pipeline


Keywords
transmogrifier, blueprint, funnelweb, source, plone, import, conversion, microsoft, office
License
GPL-2.0+
Install
pip install transmogrify.webcrawler==1.0b5

Documentation

Crawling - HTML to import

transmogrify.webcrawler crawls HTML to extract pages and files, which it yields as the source items for your transmogrifier pipeline. transmogrify.webcrawler.typerecognitor helps set '_type' on each item based on the MIME type of the crawled content. transmogrify.webcrawler.cache helps speed up crawling and reduce memory usage by storing crawled items locally.

These blueprints are designed to work with the funnelweb pipeline but can be used independently.
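
To make the type recognition step described above more concrete, here is a minimal Python sketch of the general idea: a generator that passes items along a pipeline and fills in '_type' from a MIME type. It is not the package's implementation. The '_mimetype' and '_path' keys, the function names, and the mapping to the type names "Document", "Image" and "File" are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical lookup tables; the real blueprint may use different rules.
EXACT_TYPES = {"text/html": "Document"}
MAJOR_TYPES = {"image": "Image"}


def guess_type(mimetype: str, default: str = "File") -> str:
    """Map a MIME type string to a content type name (illustrative only)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = mimetype.split(";", 1)[0].strip().lower()
    if base in EXACT_TYPES:
        return EXACT_TYPES[base]
    # Fall back to the major type, e.g. "image" for "image/png".
    major = base.split("/", 1)[0]
    return MAJOR_TYPES.get(major, default)


def recognize_types(items: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield every item, setting '_type' where a MIME type is known.

    Assumes the MIME type is stored under '_mimetype' (an assumption,
    not confirmed by this document). Items that already have '_type'
    or have no MIME type are passed through unchanged.
    """
    for item in items:
        mimetype = item.get("_mimetype")
        if mimetype and "_type" not in item:
            item["_type"] = guess_type(mimetype)
        yield item


if __name__ == "__main__":
    crawled = [
        {"_path": "index.html", "_mimetype": "text/html; charset=utf-8"},
        {"_path": "logo.png", "_mimetype": "image/png"},
        {"_path": "report.doc", "_mimetype": "application/msword"},
    ]
    for item in recognize_types(crawled):
        print(item["_path"], item["_type"])
```

Writing the step as a generator mirrors how transmogrifier sections consume and yield items one at a time, so the whole crawl never has to be held in memory at once.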