correcthorse

Secure but memorable passphrase generator


Keywords
password, passphrase
License
Apache-2.0
Install
pip install correcthorse==0.2.0

Documentation

Correct Horse: A memorable passphrase generator

Correct Horse is a tool for generating reasonably secure but reasonably memorable passphrases. It does this by picking a set of words from a dictionary and then sorting them into an order that plausibly makes sense as a phrase. It can be used either as a command line tool, to generate and print a passphrase that one might enter into another application, or through an API, to pick new passphrases for something like a rotating code on a WiFi network.

Usage

Command line

correcthorse [-h] [-n MAX_WORDS] [-N MAX_LETTERS] [-l] [-c] [-u] [-s] [-j] [-H]
                   [-S SEPARATOR] [-f FILENAME] [-L LOCALE]

Use the -n MAX_WORDS or --max_words MAX_WORDS option to set the maximum number of words in the passphrase. Use the -N MAX_LETTERS or --max_letters MAX_LETTERS option to set the maximum number of letters in the passphrase. You may set both.

The -l or --lower-case flag will cause all of the words in the passphrase to be printed in lower case, while the -c or --capitalise flag will cause the first letter of each word to be capitalised. The default is to capitalise the first letter.

When the words of the passphrase are printed, a separator string is placed between them. This can be set using the -S SEPARATOR or --separator SEPARATOR option. For convenience you can use -u or --underscore to join the words with an underscore, -s or --space to join the words with a space, -H or --hyphen to join the words with a hyphen, or -j or --join to simply run the words together without a separator.

By default correcthorse will detect your locale and pick a word set that matches your local language. You can override the detected locale using the -L LOCALE or --locale LOCALE option. While great in theory, unless your language is English the chances are that there is currently no built-in word file for your language. You can, however, specify a custom word file using the -f FILENAME or --word-file FILENAME option.

Use as a Python module

To use the correcthorse module through the API, create an instance of WordSet using WordSet(filename=None, locale=None, encoding='UTF-8') and then call its random_phrase(max_words=4) method.
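As a rough illustration of the rotating WiFi code use case, the two calls above might be combined as follows. This is only a sketch: the import path (WordSet at the top level of the correcthorse package) and the helper name rotating_wifi_passphrase are assumptions, and the return type of random_phrase is not specified here, so the result is simply passed through.

```python
def rotating_wifi_passphrase(max_words=4):
    """Return a fresh passphrase, or None if correcthorse is not importable.

    Hypothetical helper, not part of the correcthorse package.
    """
    try:
        # Assumption: WordSet is importable from the package top level.
        from correcthorse import WordSet
    except ImportError:
        return None
    # Documented defaults: detect the locale and use the matching word list.
    word_set = WordSet(filename=None, locale=None, encoding="UTF-8")
    return word_set.random_phrase(max_words=max_words)


if __name__ == "__main__":
    print(rotating_wifi_passphrase())
```

Passing filename or locale to WordSet should correspond to the -f and -L command line options described earlier.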

Dictionary and grammar file format

It is possible to provide correcthorse with new lists from which to draw the words of the passphrase. The structure of such a file is pretty simple: it is a sequence of blocks of text separated by one or more blank lines, with optional metadata lines starting with #* followed by a JSON object, and optional (ignored) comment lines which start with just a # character.
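To make the layout concrete, here is a minimal sketch of a reader for this kind of file. It is not the parser used by correcthorse; it assumes one entry per line inside a block, and the function name parse_word_file and the sample text are made up for illustration.

```python
import json


def parse_word_file(text):
    """Split word-file text into a list of (metadata, words) blocks."""
    blocks = []
    meta, words = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # One or more blank lines close the current block, if any.
            if meta or words:
                blocks.append((meta, words))
                meta, words = {}, []
        elif line.startswith("#*"):
            # Metadata line: "#*" followed by a JSON object.
            meta.update(json.loads(line[2:]))
        elif line.startswith("#"):
            # Plain comment line, ignored.
            continue
        else:
            # Assumption: every other line holds exactly one entry.
            words.append(line)
    if meta or words:
        blocks.append((meta, words))
    return blocks


SAMPLE = """\
# Opinion adjectives come first
#* {"name": "opinion", "max": 1}
lovely
silly

#* {"name": "colour"}
green
red

plant
m|ouse|ice
"""

blocks = parse_word_file(SAMPLE)
for meta, words in blocks:
    print(meta, words)
```

The last block in the sample has no metadata line at all, which the format allows.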

The key to the functioning of correcthorse is that once a set of words is drawn (uniformly, within the constraints in the metadata) from the whole list of words, the words are then sorted such that words drawn from earlier text blocks occur earlier in the resulting list and words drawn from later blocks occur later. This makes use of the fact that in English there is an implicit order in which adjectives are used, as well as a fixed position for adjectives relative to nouns. Thus one would normally say lovely little green plant rather than green little lovely plant (let alone lovely plant green little). By observing this ordering, the phrases generated by correcthorse tend to be more memorable than if the words were in a random order. Sorting the words loses a little of the entropy in the passphrase (strictly speaking O(n log n) bits for n words), but this is compensated for by using a fairly large word list.
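The draw-then-sort idea and its entropy cost can be sketched in a few lines. This is an illustration under simplifying assumptions, not the correcthorse implementation: metadata constraints are ignored, and the function names are hypothetical. The cost computed is an upper bound of log2(n!) bits, reached only when every chosen word comes from a different block; for four words that is log2(24), about 4.6 bits.

```python
import math
import secrets


def draw_and_order(blocks, count):
    """Draw `count` distinct words uniformly, then order them by source block."""
    pool = [(index, word) for index, block in enumerate(blocks) for word in block]
    chosen = secrets.SystemRandom().sample(pool, count)
    # Stable sort: words from the same block keep the order they were drawn in.
    chosen.sort(key=lambda pair: pair[0])
    return [word for _, word in chosen]


def ordering_cost_bits(vocabulary_size, count):
    """Upper bound on the entropy given up by fixing the word order."""
    ordered = math.log2(math.perm(vocabulary_size, count))    # order matters
    unordered = math.log2(math.comb(vocabulary_size, count))  # order fixed
    return ordered - unordered  # equals log2(count!)


blocks = [["lovely"], ["little"], ["green"], ["plant"]]
print(" ".join(draw_and_order(blocks, 4)))
print(ordering_cost_bits(5000, 4))  # 5000 is a made-up vocabulary size
```

Because log2(n!) grows as n log n while each extra word from a large list adds many more bits, the trade is a modest one for short phrases.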

The metadata for each block of words is provided as a JSON dictionary. The program will function correctly without any metadata being provided, but the words may read more pleasingly if it is. The following keys are recognised:

  • name: A string to use as the name for the group of words.
  • exclude: A space-separated list of names of other groups from which words should not be drawn if a word from this group is selected.
  • max: An integer indicating the maximum number of words to draw from this group.
  • min: An integer indicating the minimum number of words to draw from this group.
  • plural: A boolean indicating if this adjective implies that the subject nouns should be plural. If this key is present and set to false then it indicates that the subject should be singular. If no adjective with this key present is chosen then the plurality will be picked at random.
  • pluralise: A boolean indicating that the program should attempt to make words in this group singular or plural according to the plurality of the other words.
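One way to picture how the name, exclude, max and min keys could interact is rejection sampling: draw a uniform sample, and redraw whenever a constraint is violated. The sketch below is an assumption about the semantics, not the algorithm correcthorse uses; the function names, the retry limit and the sample groups are all hypothetical, and the plural and pluralise keys are left out.

```python
import secrets
from collections import Counter


def satisfies(groups, chosen):
    """Check the max, min and exclude keys for one candidate draw."""
    counts = Counter(index for index, _ in chosen)
    used_names = {groups[index][0].get("name") for index in counts}
    for index, (meta, _) in enumerate(groups):
        n = counts.get(index, 0)
        if n > meta.get("max", n) or n < meta.get("min", 0):
            return False
        # A group that was used must not coexist with groups it excludes.
        if n and used_names & set(meta.get("exclude", "").split()):
            return False
    return True


def draw_with_constraints(groups, count, max_attempts=1000):
    """Uniformly draw `count` words that satisfy the group metadata."""
    rng = secrets.SystemRandom()
    pool = [(i, word) for i, (_, words) in enumerate(groups) for word in words]
    for _ in range(max_attempts):
        chosen = rng.sample(pool, count)
        if satisfies(groups, chosen):
            chosen.sort(key=lambda pair: pair[0])  # earlier groups first
            return [word for _, word in chosen]
    raise ValueError("no draw satisfied the constraints")


groups = [
    ({"name": "opinion", "max": 1}, ["lovely", "silly", "grumpy"]),
    ({"name": "colour", "max": 1}, ["green", "red", "blue"]),
    ({"name": "noun", "min": 1, "max": 1}, ["plant", "horse", "piano"]),
]
print(" ".join(draw_with_constraints(groups, 3)))
```

Rejection sampling keeps the draw uniform over all valid combinations, at the cost of occasional retries when the constraints are tight.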

For words in groups with the pluralise key set to true in their metadata, a locale-specific set of rules will be applied to try to create a plural form where necessary. Unfortunately many languages have many irregular words and the rules don't always work. In order to cope with this, irregular words can be added to the list in the form stem|singular-suffix|plural-suffix. Thus the word list for English includes m|ouse|ice, because the word mouse changes completely when plural; piano||s, because the plural form of piano does not obey the usual rules for plurals ending in a vowel; and sheep||, because the plural of sheep is sheep.
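The stem|singular-suffix|plural-suffix notation can be expanded with a simple split. The following sketch is illustrative only: the function name word_forms is hypothetical, and the fallback rule for regular words (append an s) is a placeholder for the locale-specific rules, which are not reproduced here.

```python
def word_forms(entry, regular_plural=lambda word: word + "s"):
    """Return (singular, plural) for a word-list entry."""
    parts = entry.split("|")
    if len(parts) == 3:
        # Irregular entry: stem|singular-suffix|plural-suffix.
        stem, singular_suffix, plural_suffix = parts
        return stem + singular_suffix, stem + plural_suffix
    # Regular entry: fall back to a placeholder pluralisation rule.
    return entry, regular_plural(entry)


for entry in ["m|ouse|ice", "piano||s", "sheep||", "plant"]:
    print(entry, word_forms(entry))
```

An empty suffix leaves the stem unchanged, which is how sheep|| yields the same form for singular and plural.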

A warning about offensive phrases

Some care has been taken to try to remove words that might result in offensive phrases, in particular by removing words that are generally pejorative and words that are specific to groups of people. That said, there is still a small chance that some of the phrases may be considered offensive by some people, or may be thought to be risqué as a result of some double entendre. Without access to a fully sentient AI system it's not easy to filter these out, but hopefully they will be rare.

License

Correct Horse is released under the Apache License, Version 2.0.

What's with the weird name?

The name Correct Horse comes from the excellent XKCD cartoon by Randall Munroe about generating good passwords.