Namu Wiki Extractor

This library strips all namu marks from a namu wiki document and extracts its plain text only.

Requirement

Python 3

Installation

pip install namu-wiki-extractor

Usage

Basic

import json
from namuwiki.extractor import extract_text

with open('namu_wiki.json', 'r', encoding='utf-8') as input_file:
    namu_wiki = json.load(input_file)

item = namu_wiki[1]
plain_text = extract_text(item['text'])
print(plain_text)
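Indexing the dump by position (namu_wiki[1]) works for a quick test, but usually you want a specific document. The sketch below looks one up by title. It assumes the dump is a list of objects with 'title' and 'text' keys, which is how the examples here use it. find_by_title is a hypothetical helper and is not part of the library.

```python
from typing import Any, Dict, Iterable, Optional


def find_by_title(items: Iterable[Dict[str, Any]], title: str) -> Optional[Dict[str, Any]]:
    """Return the first dump item whose 'title' equals title, or None.

    Hypothetical helper: assumes each item is a dict with a 'title' key,
    as in the JSON dump used by the examples in this README.
    """
    for item in items:
        # .get() tolerates items that lack a 'title' key instead of raising
        if item.get('title') == title:
            return item
    return None
```

With namu_wiki loaded as in the Basic example, extract_text(find_by_title(namu_wiki, title)['text']) extracts a single document by name, after first checking that the result is not None. Each call scans the whole list. For many lookups it is cheaper to build a dict once, e.g. {item['title']: item for item in namu_wiki}, assuming titles are unique.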

Extract deletions and footnotes separately

import json
from namuwiki.extractor import extract_text

with open('namu_wiki.json', 'r', encoding='utf-8') as input_file:
    namu_wiki = json.load(input_file)

item = namu_wiki[1]
document = extract_text(item['text'], separate_deletions=True, separate_footnotes=True)
print(document.text)
print(document.deletions)
print(document.footnotes)

Multiprocessing

import json
from multiprocessing import Pool

from namuwiki.extractor import extract_text

def work(document):
    return {
        'title': document['title'],
        'content': extract_text(document['text'])
    }

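if __name__ == '__main__':
    # The guard keeps worker processes from re-running this block when they import
    # the module (required with the "spawn" start method, e.g. on Windows and macOS)
    with open('namu_wiki.json', 'r', encoding='utf-8') as input_file:
        namu_wiki = json.load(input_file)

    with Pool() as pool:
        items = pool.map(work, namu_wiki)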
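pool.map returns the list of {'title', 'content'} dicts in input order. The sketch below saves that list as JSON Lines (one JSON object per line), so the results can be streamed later without loading the whole file at once. to_jsonl_lines, write_jsonl and the output format are assumptions and are not part of the library.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def to_jsonl_lines(items: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield each item as one line of JSON Lines text, newline included."""
    for item in items:
        # ensure_ascii=False keeps Korean text readable instead of \uXXXX escapes
        yield json.dumps(item, ensure_ascii=False) + '\n'


def write_jsonl(items: Iterable[Dict[str, Any]], path: str) -> None:
    """Write items to path as UTF-8 encoded JSON Lines."""
    with open(path, 'w', encoding='utf-8') as output_file:
        output_file.writelines(to_jsonl_lines(items))
```

After the pool.map call above, write_jsonl(items, 'namu_wiki_text.jsonl') saves the results. The file name is only an example. write_jsonl accepts any iterable, so it can also consume pool.imap(work, namu_wiki) and write results as they arrive. In that case, call it inside the with Pool() block, because the pool is terminated when that block exits.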

API

namuwiki.extractor.extract_text(source: str, separate_deletions: bool = False, separate_footnotes: bool = False) -> Union[str, Document]

This function strips all namu marks from source and extracts its plain text. If both separate_deletions and separate_footnotes are False (the default), it returns the extracted plain text as a str. Otherwise, it returns the extracted plain text, deletions and footnotes as a Document.

Parameters

source: Text of a namu wiki document, including its namu marks (e.g. the 'text' field of an item in the JSON dump)
separate_deletions: Whether deletions should be separately extracted from the source
separate_footnotes: Whether footnotes should be separately extracted from the source
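The return type of extract_text depends on the two flags. Code that may receive either form can therefore normalize the result to plain text. The sketch below does this. as_plain_text is a hypothetical helper, and it relies only on the text attribute of Document.

```python
from typing import Any


def as_plain_text(result: Any) -> str:
    """Return the plain text from an extract_text() result.

    Hypothetical helper: extract_text() returns a str with the default flags
    and a Document when deletions or footnotes are extracted separately; a
    Document keeps its plain text in the .text attribute.
    """
    if isinstance(result, str):
        return result
    # Only .text is read; deletions and footnotes are discarded here
    return result.text
```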

namuwiki.extractor.Document(text: str, deletions: List[str], footnotes: List[str])

text: Plain text with all namu marks removed from the given source
deletions: Separately extracted deletions from the given source
footnotes: Separately extracted footnotes from the given source
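A Document bundles three fields. To store results in the same JSON form as the dump itself, you can first convert each Document to a plain dict. The sketch below does this. document_to_dict is hypothetical and reads only the text, deletions and footnotes attributes listed above.

```python
from typing import Any, Dict


def document_to_dict(document: Any) -> Dict[str, Any]:
    """Copy a Document's fields into a JSON-serializable dict."""
    return {
        'text': document.text,
        # list() makes copies so later edits to the dict do not touch the Document
        'deletions': list(document.deletions),
        'footnotes': list(document.footnotes),
    }
```

The resulting dict can be passed straight to json.dumps, e.g. json.dumps(document_to_dict(document), ensure_ascii=False).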

Note

A JSON dump file of namu wiki can be downloaded from here

namu-wiki-extractor
Release 0.3.0