markovtextgen

Simple module for text generation with a Markov model


Keywords
ngram, markov, generator, nlp
License
MIT
Install
pip install markovtextgen==0.4.2

Documentation

Markov Text

Simple module for text generation with a Markov model. It tokenises input text by word rather than by sentence, so transitions between sentences are included in both analysis and generation. Whether this results in better 'flow' of the generated text remains to be seen, but it does allow sentences shorter than the n-gram depth to be included.

Installation

Note: this module is only compatible with Python 3.

pip3 install markovtextgen

Usage

    import markovtextgen as markov

    # get tokenised list of words from txt file
    words = markov.get_word_list('./text.txt')
    # get dictionary of ngram frequencies
    counts = markov.get_ngram_counts(words)
    # use ngram frequencies to generate at least 20 words of text
    text = markov.generate_text(counts, 20)

    print(text)
    # >> Having supposed that his business is to be so, do you any conversation with Miss Pross, the Doctor, in a very slight one, forced upon the trees.

Functions

get_word_list(file_path, transform_function[optional])

Tokenises input text, returning a list of single words.

  • file_path: Path to the input text file.
  • transform_function: Function to be applied to each line before it is processed. Should take a single string as a parameter and return the transformed string.
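
As an illustration of the kind of function transform_function expects, here is a minimal sketch. The name strip_emphasis and what it does are hypothetical. The only requirement taken from the description above is that it takes one string (a line) and returns the transformed string.

```python
def strip_emphasis(line: str) -> str:
    """Hypothetical transform: drop underscore emphasis marks and lower-case the line."""
    return line.replace('_', '').lower()


# Passed as the second argument, following the signature above (not run here):
# words = markov.get_word_list('./text.txt', strip_emphasis)
```

Lower-casing merges tokens such as "The" and "the", which may give denser n-gram counts on small inputs at the cost of losing capitalisation in the output.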

get_ngram_counts(tokens, ngram_depth=3)

Returns a dictionary containing ngrams and their frequencies.

  • tokens: List of ordered tokens.
  • ngram_depth: Integer representing ngram depth. Defaults to 3.
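
To make the idea of n-gram counting concrete, the following is a rough sketch and not the module's own implementation. The layout of the dictionary returned by get_ngram_counts is not documented here, so the tuple keys used below are an assumption, as is the helper name count_ngrams.

```python
from collections import Counter


def count_ngrams(tokens, ngram_depth=3):
    """Count every run of ngram_depth consecutive tokens (illustrative only)."""
    return Counter(
        tuple(tokens[i:i + ngram_depth])
        for i in range(len(tokens) - ngram_depth + 1)
    )


# Words are split on whitespace only, so sentence boundaries are not treated specially.
tokens = "The cat sat. The cat ran.".split()
bigrams = count_ngrams(tokens, ngram_depth=2)

print(bigrams[('The', 'cat')])   # 2
print(bigrams[('sat.', 'The')])  # 1, a transition across a sentence boundary
```

Because the token list is never split into sentences, the pair ('sat.', 'The') is counted like any other n-gram, which is the behaviour described in the introduction.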

generate_text(counts, min_words, end_characters=['.', '?', '!'])

Returns generated text as a string.

  • counts: Dictionary of ngram counts.
  • min_words: Integer. Minimum number of words to generate. Fewer words may be returned if the input text is small.
  • end_characters: List of characters on which text generation will end, after min_words has been exceeded. Defaults to ['.', '?', '!'].
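
The interaction between min_words and end_characters can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's actual code: the function name generate, the tuple-keyed counts, the seed parameter, and the max_words safety cap are all hypothetical, and the sketch assumes an n-gram depth of at least 2.

```python
import random
from collections import Counter


def generate(counts, min_words, end_characters=('.', '?', '!'),
             seed=None, max_words=1000):
    """Walk n-gram counts, stopping at a sentence end once min_words is reached."""
    rng = random.Random(seed)
    ngrams = list(counts)
    depth = len(ngrams[0])            # assumed >= 2
    output = list(rng.choice(ngrams))
    while len(output) < max_words:    # hypothetical cap to guarantee termination
        if len(output) >= min_words and output[-1][-1] in end_characters:
            break
        context = tuple(output[-(depth - 1):])
        candidates = {g: c for g, c in counts.items() if g[:-1] == context}
        if not candidates:
            break  # dead end: a small input can yield fewer than min_words
        grams, weights = zip(*candidates.items())
        # Pick the next word in proportion to how often it followed the context.
        output.append(rng.choices(grams, weights=weights)[0][-1])
    return ' '.join(output)


tokens = "The cat sat. The cat ran.".split()
counts = Counter(zip(tokens, tokens[1:]))  # bigram counts with tuple keys
text = generate(counts, min_words=3, seed=0)
print(text)
```

The dead-end branch shows one way a short input can return fewer than min_words: if the walk reaches a context that never appears as the start of another n-gram, there is nothing left to sample.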