Markov Text
Simple module for text generation with a Markov model. Input text is tokenised into words but not split into sentences, so transitions across sentence boundaries are included in both analysis and generation. Whether this improves the 'flow' of the generated text remains to be seen, but it does allow sentences shorter than the n-gram depth to be included.
Installation
Note: this module is only compatible with Python 3.
pip3 install markovtextgen
Usage
import markovtextgen as markov
# get tokenised list of words from txt file
words = markov.get_word_list('./text.txt')
# get dictionary of ngram frequencies
counts = markov.get_ngram_counts(words)
# use ngram frequencies to generate at least 20 words of text
text = markov.generate_text(counts, 20)
print(text)
# >> Having supposed that his business is to be so, do you any conversation with Miss Pross, the Doctor, in a very slight one, forced upon the trees.
Functions
get_word_list(file_path, transform_function[optional])
Tokenises input text, returning a list of single words.
- file_path: Path to the input text file.
- transform_function: Function to be applied to each line before it is processed. Should take a single string as a parameter and return the transformed string.
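As an illustration of the kind of function transform_function expects, here is a minimal sketch of a line transform. The name normalise_line and its behaviour (lowercasing and dropping double quotes) are hypothetical choices for this example, not part of the module.

```python
def normalise_line(line: str) -> str:
    """Hypothetical transform: lowercase the line and drop double quotes."""
    return line.lower().replace('"', '')


# Quick check of the transform on its own.
print(normalise_line('"It was the BEST of times," he said.'))

# Per the signature above, it would be passed as the second argument:
# words = markov.get_word_list('./text.txt', normalise_line)
```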
get_ngram_counts(tokens, ngram_depth=3)
Returns a dictionary containing ngrams and their frequencies.
- tokens: List of ordered tokens.
- ngram_depth: Integer representing ngram depth. Defaults to 3.
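The counting step can be pictured roughly as follows. This is a sketch only: count_ngrams_sketch is a hypothetical stand-in, it assumes n-grams are stored as tuples mapped to integer counts, and it splits on whitespace for simplicity. The dictionary layout and tokenisation actually used by get_ngram_counts may differ.

```python
from collections import Counter


def count_ngrams_sketch(tokens, ngram_depth=3):
    """Count every run of ngram_depth consecutive tokens (hypothetical helper)."""
    counts = Counter()
    # Slide a window of ngram_depth tokens over the whole token stream,
    # ignoring sentence boundaries.
    for i in range(len(tokens) - ngram_depth + 1):
        counts[tuple(tokens[i:i + ngram_depth])] += 1
    return dict(counts)


# Whitespace splitting is an assumption made for this example.
tokens = "Stop. The cat sat. The cat sat.".split()
counts = count_ngrams_sketch(tokens)
for ngram, freq in counts.items():
    print(ngram, freq)
```

Because the window runs across sentence boundaries, the one-word sentence "Stop." still contributes to the trigram ('Stop.', 'The', 'cat'), even though it is shorter than the n-gram depth.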
generate_text(counts, min_words, end_characters=['.', '?', '!'])
Returns generated text as a string.
- counts: Dictionary of ngram counts.
- min_words: Integer. Minimum number of words to generate. May return fewer if the input text is small.
- end_characters: List of characters on which text generation will end, after min_words has been exceeded. Defaults to ['.', '?', '!'].
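One way generation from n-gram counts could work is sketched below: pick each next word at random, weighted by how often it followed the current prefix, and stop at an end character once min_words has been reached. This is not the module's implementation. generate_sketch, the tuple-keyed counts layout, the rng parameter, and the max_words safety cap are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def generate_sketch(counts, min_words, end_characters=('.', '?', '!'),
                    rng=None, max_words=1000):
    """Hypothetical generator; rng and max_words are illustrative extras."""
    rng = rng or random.Random()
    # Index each n-gram's final word by its (n-1)-word prefix.
    successors = defaultdict(Counter)
    for ngram, freq in counts.items():
        successors[ngram[:-1]][ngram[-1]] += freq
    if not successors:
        return ''
    # Start from a randomly chosen prefix.
    state = rng.choice(sorted(successors))
    words = list(state)
    while len(words) < max_words:
        # Stop on an end character once enough words have been produced.
        if (words and len(words) >= min_words
                and words[-1].endswith(tuple(end_characters))):
            break
        options = successors.get(state)
        if not options:
            break  # dead end: small inputs can yield fewer than min_words
        choices, weights = zip(*sorted(options.items()))
        next_word = rng.choices(choices, weights=weights)[0]
        words.append(next_word)
        if state:
            state = state[1:] + (next_word,)
    return ' '.join(words)


# Toy trigram counts, assumed to be tuple-keyed for this sketch.
toy_counts = {
    ('the', 'cat', 'sat.'): 1,
    ('cat', 'sat.', 'the'): 1,
    ('sat.', 'the', 'cat'): 1,
    ('the', 'cat', 'ran.'): 1,
}
text = generate_sketch(toy_counts, 4, rng=random.Random(0))
print(text)
```

With such a tiny input, a run can hit the dead end after "ran." before reaching min_words, which mirrors the note above that fewer words may be returned when the input text is small.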