talkgenerator

Automatically generating presentation slide decks based on a given topic for improvised presentations


Keywords
art, evolutionary-computation, funny-experiments, python, python3
License
MIT
Install
pip install talkgenerator==2.1.4

Documentation

Talk Powerpoint Generator

Description

Software to automatically generate talks and presentations for PowerPoint and/or Keynote. Its main purpose is to support the improvisational comedy format "Improvised TED talk", where the actors have to present an unseen presentation. The software can be extended to other sorts of presentation, for example pecha kucha.

Demo

For a demo of this generator, please visit the online demo page, a platform created by Shaun Farrugia to give easier access to this talk generator.

Example

[Image: Automatically Generated]

Installation Instructions

# Run the setup script from the command line
source setup.sh

Setting up required authentication

Our program relies on certain APIs that require authentication. Create a file named .env (don't forget the period) in your project directory with the following contents:

# Reddit Authentication
REDDIT_CLIENT_ID=
REDDIT_CLIENT_SECRET=
REDDIT_USER_AGENT="" #use quotes here

# Wikihow Authentication
WIKIHOW_USERNAME=
WIKIHOW_PASSWORD=

# OPTIONAL: If you want to save to Amazon S3, define your params here
AWS_TALK_BUCKET_KEY=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=
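
Before running the generator, it can help to check that the required values are filled in. The following is a minimal sketch using only the standard library; parse_env and missing_keys are hypothetical helpers written for this example, not part of talkgenerator, and the project itself may load .env differently.

```python
from pathlib import Path

# Keys taken from the .env template above; the AWS keys are optional.
REQUIRED_KEYS = [
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
    "WIKIHOW_USERNAME",
    "WIKIHOW_PASSWORD",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep only what sits between the quotes.
            quote = value[0]
            end = value.find(quote, 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing inline comment.
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    content = env_path.read_text() if env_path.exists() else ""
    print("Missing keys:", missing_keys(content) or "none")
```

Run it from the project directory; any key it lists still needs a value.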

Reddit authentication: Getting your keys

Get your Reddit authentication keys by following these steps.

Set REDDIT_USER_AGENT to "python:https://github.com/korymath/talk-generator:v0.0.1 (by /u/REDDIT_USERNAME)", replacing REDDIT_USERNAME with your Reddit username.

Wikihow authentication: Getting your keys

Set up your Wikihow credentials by following these steps:

  • Create a Wikihow account.
  • Open .env
  • Fill in WIKIHOW_USERNAME with your username, and WIKIHOW_PASSWORD with your password.

Installing nltk

We require several NLTK packages, which can be downloaded by running the following code in the Python console:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

Common Errors/Warnings:

prawcore.exceptions.ResponseException: received 401 HTTP response

From the Reddit API documentation, it sounds like the 401 error is given when your client id/secret are incorrect. Are you using the correct values from the app page?

BeautifulSoup lxml warning

  • Add parser for BeautifulSoup
sublime venv/lib/python3.6/site-packages/PyDictionary/utils.py
# change:  return BeautifulSoup(requests.get(url).text)
# to: 	   return BeautifulSoup(requests.get(url).text, 'lxml')

Windows lxml warning

pip might complain when installing the python-pptx dependency due to a missing lxml dependency. If this is not resolved automatically, visit this page. On that page, select the right lxml version for your platform and Python version (e.g. cp37 = Python 3.7).

If installing the dependencies produces a complaint about the Visual C++ version while resolving the python-pptx dependency, consider installing a version of Visual Studio.

Missing mysql.h

If pip complains about a missing mysql.h, run pip install wheel, download the matching wheel from the mysql wheel download page, and then run pip install mysqlclient-1.3.8-cp36-cp36m-win_amd64.whl

Docker Instructions (optional)

slaps the hood of the container Yep this bad boy runs on Docker.

Building the Image

Build the image, and tag it as talkgen.

docker build -t talkgen .

Running the Image

Run the image tagged as talkgen. The container /output directory maps to your current working directory.

docker run --env-file .env -v `pwd`/output:/output talkgen run.py --open_ppt false

Reasonable defaults have been provided. To override one, pass the corresponding command-line parameter. Here we are overriding the topic and number of slides.

docker run --env-file .env -v `pwd`/output:/output talkgen run.py --topic 'climate change' --num_slides 12 --open_ppt false

  • Be sure that open_ppt is false when running as a Docker process.

Running the generator

python run.py --topic cat --num_slides 10

Available arguments

Argument       Description
topic          The topic of the generator. This works best if it is a common, well-known noun.
num_slides     The number of slides in the generated presentation (default: 10).
schema         The presentation schema to use when generating the presentation. Currently, only two modes are implemented: default and test (for testing during development).
presenter      The name that will be shown on the first slide. Leave blank for an automatically generated name.
output_folder  The folder in which to save the generated presentations (default: ./output/).
save_ppt       If this flag is true (default), the generated PowerPoint will be saved on the computer in the output_folder.
open_ppt       If this flag is true (default), the generated PowerPoint will automatically open after generating.
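
As an illustration of how such arguments could be parsed, here is a hedged sketch using argparse. It is not the actual contents of run.py: build_parser and str2bool are hypothetical names, and the defaults for topic and schema are assumptions (only the defaults for num_slides, output_folder, save_ppt and open_ppt are stated above).

```python
import argparse


def str2bool(value):
    """Interpret 'true'/'false' style strings as booleans."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the arguments documented above."""
    parser = argparse.ArgumentParser(description="Generate a talk")
    parser.add_argument("--topic", type=str, default="cat")  # assumed default
    parser.add_argument("--num_slides", type=int, default=10)
    parser.add_argument("--schema", type=str, default="default",
                        choices=["default", "test"])  # assumed default
    parser.add_argument("--presenter", type=str, default=None)
    parser.add_argument("--output_folder", type=str, default="./output/")
    parser.add_argument("--save_ppt", type=str2bool, default=True)
    parser.add_argument("--open_ppt", type=str2bool, default=True)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing the flags through a string-to-boolean converter is what allows a call such as --open_ppt false to work; a plain type=bool would treat any non-empty string as true.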

Running the generator as a microservice

Run the generator as a microservice at 0.0.0.0:5687.

python run_web.py

You can then hit http://0.0.0.0:5687?topic=sometopic. This will kick the main.py off.
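
Topics containing spaces or special characters need to be URL-encoded before being sent. A small sketch of building the request URL follows; build_request_url is a hypothetical helper, and topic is the only query parameter confirmed above.

```python
from urllib.parse import urlencode


def build_request_url(topic, host="0.0.0.0", port=5687):
    """Return the microservice URL for the given topic, URL-encoded."""
    query = urlencode({"topic": topic})
    return f"http://{host}:{port}?{query}"


if __name__ == "__main__":
    # With the microservice running, this URL could be opened in a browser
    # or fetched with urllib.request.urlopen.
    print(build_request_url("climate change"))
```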

Program structure

In this section, we discuss the many parts of this software.

Powerpoint Template

  • data/powerpoint/template.pptx: This PowerPoint file contains the presentation to start from. The interesting part of this file appears when you open the slide master view, where you can edit the slide templates and their placeholders.

  • slide_templates.py: This Python module is responsible for filling in the template.pptx with values. It also contains functions that take content-generating functions as arguments. Each content generator is called with a presentation_context dictionary; if the generated content satisfies certain conditions (e.g. originality), a slide is created and added to the presentation.
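
The "generate, check, then add" idea can be sketched in plain Python as follows. This is a simplified illustration rather than the module's real code: create_slide_if_original and the used_content key are hypothetical, and a dictionary stands in for an actual PowerPoint slide.

```python
def create_slide_if_original(presentation_context, title_generator, content_generator):
    """Generate slide contents and keep them only if the content is new."""
    title = title_generator(presentation_context)
    content = content_generator(presentation_context)
    if title is None or content is None:
        return None  # one of the generators failed to produce something
    used = presentation_context["used_content"]
    if content in used:
        return None  # not original: this content already appears in the talk
    used.add(content)
    return {"title": title, "content": content}


if __name__ == "__main__":
    context = {"seed": "cat", "used_content": set()}
    make_title = lambda ctx: ctx["seed"].title()
    make_content = lambda ctx: "About " + ctx["seed"]
    print(create_slide_if_original(context, make_title, make_content))
    # A second attempt with the same content is rejected.
    print(create_slide_if_original(context, make_title, make_content))
```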

Presentation Schema elements

  • PresentationSchema: This class controls the parameters of the powerpoint generator. It contains information about which slide generators to use, which slide topic generator and a dictionary max_allowed_tags that contains information about how many times slide generators with certain tags are allowed to generate in one presentation. We might add different presentation schemas for different types of presentation generators in the future. The PresentationSchema class can be found in presentation_schema.py.

  • SlideGenerator: This class contains information about how to generate a particular type of slide. It holds a generating function for this purpose, as well as meta-data. For example, it contains a weight_function to calculate the chance of being used for a certain slide number, a name and tags for the generator, the allowed number of elements that have already been used in the presentation and the number of retries the slide generator is allowed to make in case it fails to generate a slide. The SlideGenerator class can be found in presentation_schema.py.

  • SlideTopicGenerator: This is a type of class that has a generate_seed(slide_nr) function, which generates a seed for the given slide number. The seed tends to be based on the topic of the presentation (as entered by the user). This slide seed will then be given to a SlideGenerator as an inspiration point for the content of the slide. There are several types of topic generators, such as a slide topic generator that just returns the presentation topic, one that gives synonyms and one that makes little side tracks using related concepts on ConceptNet.

  • presentation_context: This is an object that is created by the PresentationSchema, containing information about the topic and presenter of the presentation, the used content and the seed for the current slide.
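
To make the interplay between these elements more concrete, here is a heavily simplified sketch. The class names follow the descriptions above, but the constructor signatures, the used_tags bookkeeping and the plain-string "slides" are assumptions made for this example; the real classes in presentation_schema.py will differ.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SlideGenerator:
    name: str
    generate: Callable[[dict], Optional[str]]   # context -> slide (or None)
    weight_function: Callable[[int], float]     # slide number -> weight
    tags: List[str] = field(default_factory=list)
    retries: int = 3


class PresentationSchema:
    def __init__(self, seed_generator, slide_generators, max_allowed_tags, rng=None):
        self.seed_generator = seed_generator    # (topic, slide_nr) -> seed
        self.slide_generators = slide_generators
        self.max_allowed_tags = max_allowed_tags
        self.rng = rng or random.Random()

    def _allowed(self, generator, used_tags):
        # A generator is allowed while none of its tags reached its maximum.
        return all(
            used_tags.get(tag, 0) < self.max_allowed_tags.get(tag, float("inf"))
            for tag in generator.tags
        )

    def _generate_slide(self, context, slide_nr, used_tags):
        weighted = [
            (gen, gen.weight_function(slide_nr))
            for gen in self.slide_generators
            if self._allowed(gen, used_tags)
        ]
        weighted = [(gen, weight) for gen, weight in weighted if weight > 0]
        if not weighted:
            return None
        generators, weights = zip(*weighted)
        chosen = self.rng.choices(generators, weights=weights, k=1)[0]
        for _ in range(chosen.retries):
            slide = chosen.generate(context)
            if slide is not None:
                for tag in chosen.tags:
                    used_tags[tag] = used_tags.get(tag, 0) + 1
                return slide
        return None

    def generate_presentation(self, topic, num_slides):
        used_tags = {}
        slides = []
        for slide_nr in range(num_slides):
            context = {"topic": topic, "seed": self.seed_generator(topic, slide_nr)}
            slide = self._generate_slide(context, slide_nr, used_tags)
            if slide is not None:
                slides.append(slide)
        return slides


if __name__ == "__main__":
    title = SlideGenerator("title", lambda ctx: f"Title: {ctx['topic']}",
                           lambda nr: 1 if nr == 0 else 0, tags=["title"])
    about = SlideGenerator("about", lambda ctx: f"About {ctx['seed']}",
                           lambda nr: 0 if nr == 0 else 1, tags=["text"])
    schema = PresentationSchema(lambda topic, nr: topic, [title, about],
                                {"title": 1, "text": 2})
    print(schema.generate_presentation("cat", 4))
```

In this sketch a slide number for which no generator is allowed is simply skipped, so the presentation can end up shorter than requested; that is a simplification of this example, not a claim about how the real schema behaves.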

Custom Text Template Language: text_generator

We wrote our own custom templated text generation language to easily generate texts. It is mostly based on Python's str.format and Tracery, but comes with some extra functionality (see also language_util).

The template files themselves are stored in /data/text-templates/*.

TemplatedTextGenerator

On construction, this object is given the name of a file, usually a .txt file, that contains one text template per line. Similar to the built-in str.format function, these text templates can contain named variables between curly brackets, e.g. {variable_name}. Usually, the presentation_context dictionary is used as an argument. This means that in these text generators {seed} will be the variable containing the slide topic seed. This dictionary can also be extended before generating the text, so that additional custom variables are also possible.

A difference is that our custom language also provides some functions that can be easily called within the template. If a function returns None, or the variable is not present in the given dictionary, the text generator will keep retrying to generate until no templates are left. An example of such a function is {seed.plural.title}, which will pluralise the seed, and then apply title casing.

Listed below are some possible functions in our text generation language. The up-to-date list of functions can be found in text_generator.py.

Function                   Description
title                      Converts the string to title casing
lower                      Converts the string to lower casing
upper                      Converts the string to upper casing
dashes                     Replaces the spaces of the string with dashes
a                          Adds the "a" article, or "an" if the word starts with a vowel (except u)
ing                        Converts a verb to the present participle
plural                     Converts a noun to plural
singular                   Converts a noun to singular
synonym                    Converts the noun to a random synonym
2_to_1_pronouns            Changes 2nd person pronouns to 1st person pronouns in a sentence (e.g. you->my)
wikihow_action             Retrieves a random action based on the string using Wikihow
get_last_noun_and_article  Extracts the last noun from a sentence
conceptnet_location        Retrieves a location related to the string using Conceptnet
is_noun                    Checks if the string can be a noun
is_verb                    Checks if the string can be a verb
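
The behaviour described in this section (chained functions inside placeholders, and falling back to another template when a variable is missing) can be approximated with the following sketch. It is an illustration written for this README, not the code from text_generator.py: fill_template and generate are hypothetical names, and only a handful of the simpler functions from the table are implemented.

```python
import random
import re

# A few of the functions from the table above, in simplified form.
FUNCTIONS = {
    "title": str.title,
    "lower": str.lower,
    "upper": str.upper,
    "dashes": lambda text: text.replace(" ", "-"),
    "a": lambda text: ("an " if text[:1].lower() in ("a", "e", "i", "o") else "a ") + text,
}

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def fill_template(template, variables):
    """Fill in one template; return None if any placeholder cannot be resolved."""
    failed = False

    def resolve(match):
        nonlocal failed
        name, *function_names = match.group(1).split(".")
        value = variables.get(name)
        for function_name in function_names:
            if value is None:
                break
            function = FUNCTIONS.get(function_name)
            value = function(value) if function else None
        if value is None:
            failed = True
            return ""
        return str(value)

    result = PLACEHOLDER.sub(resolve, template)
    return None if failed else result


def generate(templates, variables, rng=random):
    """Try templates in random order until one can be filled in completely."""
    remaining = list(templates)
    rng.shuffle(remaining)
    for template in remaining:
        text = fill_template(template, variables)
        if text is not None:
            return text
    return None  # no templates left


if __name__ == "__main__":
    context = {"seed": "ice cream"}
    templates = ["Why {seed.title} Matters", "{presenter} on {seed.dashes.upper}"]
    print(generate(templates, context))
```

Because presenter is not in the example dictionary, the second template always fails and the generator falls back to the first one.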

TraceryTextGenerator

Allows the same things TemplatedTextGenerator does, but using a Tracery grammar. This means that the file is saved as a JSON file, and that local variables can be declared, which makes it easy to create a large space of possible texts.
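
For readers unfamiliar with Tracery, the sketch below shows the core idea of expanding #symbol# references from a JSON grammar. It is a deliberately tiny stand-in written for this README, not the Tracery library and not TraceryTextGenerator itself: it ignores Tracery modifiers and local variables, and the grammar contents are invented for the example.

```python
import json
import random
import re

SYMBOL = re.compile(r"#(\w+)#")

# An invented example grammar, in the JSON shape Tracery grammars use.
GRAMMAR_JSON = """
{
  "origin": ["#opening# #subject#"],
  "opening": ["Let's talk about", "The hidden truth behind"],
  "subject": ["cats", "presentations"]
}
"""


def expand(grammar, text="#origin#", rng=random, max_depth=20):
    """Recursively replace each #symbol# with a randomly chosen expansion."""
    if max_depth == 0:
        return text

    def replace(match):
        options = grammar.get(match.group(1))
        if not options:
            return match.group(0)  # unknown symbols are left untouched
        return expand(grammar, rng.choice(options), rng, max_depth - 1)

    return SYMBOL.sub(replace, text)


if __name__ == "__main__":
    grammar = json.loads(GRAMMAR_JSON)
    print(expand(grammar))
```

Even this small grammar yields four different sentences, which hints at how quickly the number of possible texts grows as rules are added.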

Utilities

  • random_util: This module provides helpers for working with randomness. It has a function for picking from a list of weighted elements, as well as choice_optional(list), which is like random.choice, except it returns None if the list is empty.

  • generator_util: This module provides lots of utilities for creating generators. Since some content generators return (image or textual) lists, there are functions for converting them to normal single output generators. There are also methods for converting methods that only take a string seed as input to one that takes a presentation_context, namely create_seeded_generator(generator). There are also more exotic generators such as weighted generators and walking generators.

  • language_util: Contains many language functionalities, such as converting to singular/plural, changing tense, checking part of speech, getting synonyms etc.

  • scraper_util: Provides some common functionalities for the page scrapers.

  • os_util: Contains some methods dealing with the operating system, such as saving and checking files.

  • cache_util: Contains a hashable dictionary class, which is necessary for caching certain functions.
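
A few of the utilities described above are small enough to sketch. The versions below are approximations for illustration only: choice_optional and create_seeded_generator follow the names given above, while weighted_choice, the (weight, item) pair format and HashableDict are assumptions about how such helpers might look.

```python
import random
from functools import lru_cache


def choice_optional(items, rng=random):
    """Like random.choice, but returns None for an empty list."""
    return rng.choice(items) if items else None


def weighted_choice(weighted_items, rng=random):
    """Pick an item from a list of (weight, item) pairs."""
    if not weighted_items:
        return None
    weights = [weight for weight, _ in weighted_items]
    items = [item for _, item in weighted_items]
    return rng.choices(items, weights=weights, k=1)[0]


def create_seeded_generator(generator):
    """Turn a seed -> content function into a presentation_context -> content one."""
    return lambda presentation_context: generator(presentation_context["seed"])


class HashableDict(dict):
    """A dict usable as a cache key, assuming all of its values are hashable."""

    def __hash__(self):
        return hash(frozenset(self.items()))


@lru_cache(maxsize=None)
def describe(context):
    # lru_cache needs hashable arguments, hence the HashableDict.
    return f"{context['presenter']} presents {context['topic']}"


if __name__ == "__main__":
    print(choice_optional([]))
    print(weighted_choice([(3, "chart"), (1, "quote")]))
    print(create_seeded_generator(str.upper)({"seed": "cat"}))
    print(describe(HashableDict(topic="cats", presenter="Alex")))
```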

Content generators

There are a lot of different services providing content to our generator. Usually, the content scrapers below are used in run.py to craft a real content generator used in the slide generators.

  • chart.py: Generates random powerpoint charts using text templates and random math functions.
  • conceptnet.py: Explores the graph of related concepts to certain seeds.
  • goodreads.py: Used for retrieving quotes related to a seed.
  • google_images.py: Used for retrieving relevant images for certain seeds (e.g. as background image).
  • inspirobot.py: Used for retrieving nonsensical quote images.
  • reddit.py: Used for scraping reddit images, as there are many interesting subreddits to scrape images from.
  • shitpostbot.py: Used for retrieving "interesting"/weird images.
  • wikihow.py: Used for finding related actions to a certain seed.

Prohibited images

Sometimes, certain content providers return a default image when no image is found for that url (usually when an image got deleted). These types of images are stored in our repository in data/images/prohibited/*. This folder gets automatically scanned, and all images in the generated presentation are compared to images from this folder, to ensure that none gets added to the final presentation.
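
One simple way to implement such a comparison is to hash the bytes of every file in the prohibited folder and check candidate images against that set. The sketch below does exactly that; it only catches byte-identical files, and it is an assumption that this resembles the project's own comparison, which may work on pixel data instead. All function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_prohibited_digests(folder):
    """Scan a folder once and collect the digests of all files in it."""
    return {file_digest(path) for path in Path(folder).iterdir() if path.is_file()}


def is_prohibited(image_path, prohibited_digests):
    """Check whether an image is byte-identical to a prohibited one."""
    return file_digest(image_path) in prohibited_digests


if __name__ == "__main__":
    folder = Path("data/images/prohibited")
    digests = load_prohibited_digests(folder) if folder.is_dir() else set()
    print(f"Loaded {len(digests)} prohibited image digests")
```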

Tests

There are a lot of tests present in this repository. These .py files are prefixed with test_ and use the unittest module. In PyCharm, they can be run all together by right-clicking on talk-generator and pressing Run 'Unittests in talk-generator'. From the command line, run:

source setup.sh
pytest

Test coverage is automatically handled by codecov.

Tests are automatically run with CircleCI based on the .yml file in the .circleci directory.

Credits

This Talk Generator is made by Kory Mathewson and Thomas Winters, with help from Shaun Farrugia, Piotr Mirowski and Julian Faid.

License

MIT License. Copyright (c) 2018 Kory Mathewson and Thomas Winters.