mwdump

MediaWiki dump file reader


License
MIT
Install
pip install mwdump==0.3

Documentation

A Python package similar to MWDumper.

This package tries to handle the different versions of the dump file format (0.1 through 0.7), following the XML schemas (XSD) published at http://www.mediawiki.org/xml/export-0.{1-7}.xsd
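To illustrate what "different versions" means in practice, here is a minimal sketch of detecting the export schema version from a dump's root element. It is not part of mwdump's API: the function name detect_export_version is hypothetical, it uses the standard library's xml.etree.ElementTree rather than lxml to stay self-contained, and it assumes the root element carries a namespace of the form http://www.mediawiki.org/xml/export-0.N/.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed namespace shape for MediaWiki export files, e.g.
# http://www.mediawiki.org/xml/export-0.7/ (trailing slash optional here).
_EXPORT_NS = re.compile(r"^\{http://www\.mediawiki\.org/xml/export-(\d+\.\d+)/?\}")


def detect_export_version(source):
    """Return the schema version string (e.g. '0.7'), or None if not found.

    `source` is a filename or a binary file-like object.
    """
    # The first 'start' event is the root element; nothing else is parsed.
    for _event, elem in ET.iterparse(source, events=("start",)):
        match = _EXPORT_NS.match(elem.tag)
        return match.group(1) if match else None
    return None


# A tiny in-memory stand-in for a real dump file.
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.7/" version="0.7">'
    b"<page><title>Example</title><id>1</id></page></mediawiki>"
)
print(detect_export_version(sample))
```

Reading only the first parse event keeps the check cheap even for multi-gigabyte dumps, since the rest of the file is never touched.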

Usage

Install the package with pip install mwdump, then use it as follows:

import sys

from mwdump import MWDump


def process_dump_file(filename):
    # Open the dump and print the first 1000 pages.
    with MWDump(filename) as mw:
        count = 0
        for page in mw.iterpages():
            # Not every page has a 'redirect' key, so fall back to a marker.
            print(page['id'], page['title'], page['redirect'] if 'redirect' in page else 'NOREDIRECT')

            count += 1
            if count >= 1000:
                break


if __name__ == '__main__':
    process_dump_file(sys.argv[1])
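
The manual counter in the example above can also be written with itertools.islice. The sketch below is an illustration, not part of the package: count_redirects is a hypothetical helper, and it assumes only what the example shows, namely that each page supports the 'redirect' in page test. Plain dictionaries stand in for real pages so it runs without a dump file.

```python
from itertools import islice


def count_redirects(pages, limit=1000):
    """Inspect at most `limit` pages; return (pages_seen, redirects_seen)."""
    seen = redirects = 0
    # islice stops the iteration after `limit` items without a manual counter.
    for page in islice(pages, limit):
        seen += 1
        if 'redirect' in page:
            redirects += 1
    return seen, redirects


# Stand-in data shaped like the pages in the usage example.
fake_pages = [
    {'id': 1, 'title': 'Main Page'},
    {'id': 2, 'title': 'Home', 'redirect': 'Main Page'},
    {'id': 3, 'title': 'Sandbox'},
]
print(count_redirects(fake_pages, limit=2))
```

With a real dump, the same helper would presumably be called as count_redirects(mw.iterpages()) inside the with MWDump(...) block.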

Dependencies

  • lxml

Related Projects