nflsim

A tool for simulating the NFL regular season and playoffs


Keywords: NFL, football, sports, simulation, statistics
License: MIT
Install: pip install nflsim==1.1.8

Documentation

nflsim

This package simulates the NFL regular season and playoffs using a simple, customizable Monte Carlo method.

Installation

The package is on PyPI and can be installed with pip:

pip install nflsim

How it works

During each simulation, nflsim uses the methods described below to assign a winner to all remaining NFL games in a given season. It then uses the NFL's complex tiebreaking procedures to determine playoff seeding, and the playoffs are simulated game-by-game.
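As a rough illustration of the Monte Carlo idea, the sketch below plays out a tiny set of remaining games many times and counts how often one team finishes ahead of another. It does not use nflsim: the team names, win totals, and per-game win probabilities are hypothetical, and tiebreakers and playoffs are left out.

```python
import random
from collections import Counter

def simulate_season(current_wins, remaining_games, rng):
    """Play out every remaining game once and return final win totals."""
    wins = Counter(current_wins)
    for home, away, p_home in remaining_games:
        winner = home if rng.random() < p_home else away
        wins[winner] += 1
    return wins

# Hypothetical inputs: wins so far, and (home, away, home win probability)
current_wins = {"A": 8, "B": 9}
remaining_games = [("A", "B", 0.60), ("A", "C", 0.70), ("C", "B", 0.45)]

rng = random.Random(0)  # seeded so the run is repeatable
n_sims = 10000
a_finishes_ahead = 0
for _ in range(n_sims):
    final = simulate_season(current_wins, remaining_games, rng)
    if final["A"] > final["B"]:
        a_finishes_ahead += 1

# Estimated share of simulated seasons in which A ends with more wins than B
share = a_finishes_ahead / n_sims
```

With more simulations the estimate settles toward the exact value, which is why n_sims trades run time against precision.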

Before beginning the simulations, each team is assigned a power rating (PWR). Ratings have a mean of 0 and are expressed in points, so a team with a PWR of 3 would be favored by 5 points against a team with a PWR of -2 on a neutral field. By default, the base power rating for each team is calculated using an equally-weighted combination of normalized versions of the SRS, FPI, DVOA, and Sagarin ratings. The rating systems used and their relative weights are configurable, and custom rating systems are supported. The individual rating systems and the combined ratings can be regressed to the mean (or to custom team-specific values) as desired.
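The combination step can be sketched as a weighted average of normalized ratings. This is only an illustration: the source does not say how nflsim normalizes each system, so z-score normalization is an assumption here, and the function names and sample ratings are hypothetical. The result is in z-score units rather than points.

```python
from statistics import mean, pstdev

def normalize(ratings):
    """Rescale one system's ratings to mean 0 and standard deviation 1."""
    mu = mean(ratings.values())
    sigma = pstdev(ratings.values())
    return {team: (value - mu) / sigma for team, value in ratings.items()}

def combine(systems, weights):
    """Weighted average of the normalized systems, team by team."""
    normalized = [normalize(system) for system in systems]
    total_weight = sum(weights)
    return {
        team: sum(w * n[team] for w, n in zip(weights, normalized)) / total_weight
        for team in normalized[0]
    }

# Two hypothetical rating systems on different scales
system_a = {"A": 6.0, "B": 0.0, "C": -6.0}
system_b = {"A": 20.0, "B": 5.0, "C": -25.0}

equal = combine([system_a, system_b], weights=[1, 1])
tilted = combine([system_a, system_b], weights=[2, 1])  # system_a counts double
```

Because every normalized system has mean 0, any weighted combination of them also has mean 0.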

At the beginning of each season simulation, each team's PWR is adjusted by a random amount drawn from a normal distribution with mean 0 and a user-provided standard deviation (2 points by default):

adjusted_pwr = [PWR] - numpy.random.normal(0, [rank_adj])

This adjustment represents the uncertainty in each team's base PWR projection, which includes both model error and injury risk. Higher values of rank_adj produce more variance in outcomes.
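A minimal sketch of this adjustment using only the standard library is shown below. The function name and sample ratings are hypothetical, and drawing the noise independently for each team is an assumption rather than something the source states.

```python
import random
from statistics import mean, stdev

def adjust_ratings(base_pwr, rank_adj=2.0, rng=random):
    """Return new ratings with independent Normal(0, rank_adj) noise subtracted."""
    return {team: pwr - rng.gauss(0, rank_adj) for team, pwr in base_pwr.items()}

base_pwr = {"A": 3.0, "B": 0.0, "C": -3.0}  # hypothetical ratings
rng = random.Random(7)  # seeded so the run is repeatable

# Draw a fresh set of adjusted ratings for each of 5,000 simulated seasons
draws = [adjust_ratings(base_pwr, rank_adj=2.0, rng=rng) for _ in range(5000)]
team_a = [d["A"] for d in draws]

center = mean(team_a)   # stays close to the base rating of 3.0
spread = stdev(team_a)  # stays close to rank_adj
```

Across many simulations the adjusted rating stays centered on the base PWR, while its spread tracks rank_adj.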

When simulating a game, the home team's PWR is adjusted upwards by a fixed amount and compared to the away team's PWR. The resulting point differential is used as the mean of a normal distribution, and the probability that this distribution exceeds zero is the home team's probability of winning the game. This win probability is compared to a uniform random number to determine the simulated winner of the game:

home_pwr_difference = ([Home PWR] + [Home Adj]) - [Away PWR]
home_win_probability = 1 - scipy.stats.norm(home_pwr_difference, [stdev]).cdf(0)
is_home_winner = numpy.random.random() < home_win_probability

Both the home adjustment (3 points by default) and the standard deviation used to generate the normal distribution (13 points by default) are configurable.
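For a worked example of the formula above, the sketch below reproduces it with the standard library's statistics.NormalDist in place of scipy.stats.norm. The function names are hypothetical; the default values match those quoted in the text.

```python
import random
from statistics import NormalDist

def home_win_probability(home_pwr, away_pwr, home_adj=3.0, stdev=13.0):
    """Probability that the home team wins, given both PWR values."""
    diff = (home_pwr + home_adj) - away_pwr
    # Chance that a Normal(diff, stdev) margin lands above zero
    return 1 - NormalDist(mu=diff, sigma=stdev).cdf(0)

def simulate_game(home_pwr, away_pwr, rng, **kwargs):
    """Return True when the simulated home team wins."""
    return rng.random() < home_win_probability(home_pwr, away_pwr, **kwargs)

# Home PWR 3 vs away PWR -2 with the default 3-point home adjustment:
# an 8-point expected margin, or a win probability of roughly 0.73
p_home = home_win_probability(3, -2)

rng = random.Random(42)  # seeded so the run is repeatable
home_wins = sum(simulate_game(3, -2, rng) for _ in range(10000))
```

A larger stdev pulls every win probability toward 0.5, so it controls how often underdogs win.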

Usage

Basics

Each simulation is controlled by a Simulate object. You create an object by specifying the season to simulate and the number of simulations:

import nflsim as nfl
simulation = nfl.Simulate(season=2018, n_sims=10000)

If desired, you can customize the values for home-field advantage, the PWR rank adjustment used at the beginning of each simulation, and the standard deviation used when simulating individual games:

simulation = nfl.Simulate(season=2018, n_sims=10000, rank_adj=3, home_adj=2.5, st_dev=13.5)

PWRsystems

You can customize how the power rankings are generated by creating a PWRsystems object. You create an object by indicating which systems to include:

systems = nfl.PWRsystems(dvoa=True, fpi=True, sagarin=True)
simulation = nfl.Simulate(season=2018, n_sims=10000, pwr_systems=systems)

The weights for each system (default = 1) can be specified using the built-in objects for each system (SRS, DVOA, FPI, and Sagarin):

systems = nfl.PWRsystems(srs=True, dvoa=nfl.DVOA(weight=2), fpi=nfl.FPI(weight=1.5))

You can also incorporate your own rating system by creating a generic PWR object and passing it a pandas DataFrame containing the custom ratings. The DataFrame must include one column called 'Team' containing the full team names and another column containing the team ratings. The name of the rating column must differ from those of the other systems being used (so don't use "FPI" or "SRS"):

import pandas
# Placeholder rows; real data needs the full team names in 'Team'
my_sys_df = pandas.DataFrame([{'Team':'A','Power':-2},{'Team':'B','Power':5}])
my_sys = nfl.PWR(weight=2, values=my_sys_df)
systems = nfl.PWRsystems(srs=True, others=my_sys)

To use multiple custom systems, pass a list of PWR objects instead of a single PWR object:

df1 = pandas.DataFrame([{'Team':'A','Power':-2},{'Team':'B','Power':5}])
df2 = pandas.DataFrame([{'Team':'A','Power':0},{'Team':'B','Power':2}])
my_sys_1 = nfl.PWR(weight=2, values=df1)
my_sys_2 = nfl.PWR(weight=1.5, values=df2)
systems = nfl.PWRsystems(srs=True, others=[my_sys_1, my_sys_2])

Regression

Optionally, you can choose to regress the ratings generated by each system by creating a Regression object (if regress_to is omitted, no regression will be used). By default, PWR values will be regressed to the sample mean:

my_sys = nfl.SRS(weight=2, regress_to=nfl.Regression())

You can use fixed weighting by specifying a weight between 0 and 1, or variable weighting, in which the weight given to the regression target shrinks as the number of games played approaches a specified number of games (the default option):

# Fixed: (PWR * 0.75) + (sample_mean * 0.25)
regression_fixed = nfl.Regression(weight=0.25)
# Variable: ((PWR * games_played) + (sample_mean * max(0, 10 - games_played))) / max(10, games_played)
regression_variable = nfl.Regression(n_games=10)
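The two formulas in the comments above can be written out as plain functions to check the arithmetic. The function names below are hypothetical and are not part of nflsim; target stands for the sample mean or whatever value is being regressed to.

```python
def regress_fixed(pwr, target, weight):
    """Blend a rating with the target using a constant weight on the target."""
    return pwr * (1 - weight) + target * weight

def regress_variable(pwr, target, games_played, n_games):
    """Blend a rating with the target; the target's share shrinks as games are played."""
    remaining = max(0, n_games - games_played)
    return (pwr * games_played + target * remaining) / max(n_games, games_played)

# A PWR of 4 regressed toward a target of 0
fixed = regress_fixed(4, 0, weight=0.25)                      # 4 * 0.75 = 3.0
halfway = regress_variable(4, 0, games_played=5, n_games=10)  # 20 / 10 = 2.0
done = regress_variable(4, 0, games_played=12, n_games=10)    # 48 / 12 = 4.0
```

Once games_played reaches n_games, the variable form leaves the rating unchanged.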

You can regress PWR to a fixed value rather than using the sample mean:

regression = nfl.Regression(to=0, weight=0.5)

You can also specify a custom regression value for each team using a pandas DataFrame. The DataFrame must contain one column called 'Team' containing the full team names and another called 'Baseline' for the regression values:

df = pandas.DataFrame([{'Team':'A','Baseline':-2},{'Team':'B','Baseline':5}])
regression = nfl.Regression(to=df, n_games=16)

In addition to (or instead of) regressing the values for individual PWR systems, you can choose to regress the final results after combining the various systems:

regression = nfl.Regression(n_games=10)
systems = nfl.PWRsystems(regress_to=regression, srs=True, dvoa=nfl.DVOA(weight=2))

Execution and Analysis

Once you've set up your Simulate object, use run() to execute the simulation.

regression = nfl.Regression(n_games=10)
systems = nfl.PWRsystems(srs=nfl.SRS(regress_to=regression), fpi=True, dvoa=nfl.DVOA(weight=2))
simulation = nfl.Simulate(season=2018, n_sims=10000, pwr_systems=systems)
simulation.run()

The run() method will return a reference to the Simulate object, so this syntax is also acceptable:

simulation = nfl.Simulate(season=2018, n_sims=10000, pwr_systems=systems).run()

By default, run() will use the joblib package to run the simulations in parallel; this can be overridden by setting parallel=False:

simulation = nfl.Simulate(season=2018, n_sims=100).run(parallel=False)

Once the simulation has executed, the results are aggregated and stored in several related DataFrames. In the examples that follow, sim is a Simulate object on which run() has been called. The DataFrames can be accessed directly through the simulations property:

standings = sim.simulations.standings
regularseason = sim.simulations.regularseason
seeding = sim.simulations.seeding
playoffs = sim.simulations.playoffs

Or returned as copies using methods of the Simulate object:

standings = sim.standings()
regularseason = sim.regularseason()
seeding = sim.seeding()
playoffs = sim.playoffs()

By default, all of the aggregated DataFrames use MultiIndexes incorporating the simulation number and the within-simulation row number. These methods include an option to extract the "Simulation" portion of the MultiIndex into its own column:

standings_reindexed = sim.standings(reindex=True)
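To show what moving an index level into a column looks like in plain pandas, the sketch below builds a small stand-in table and flattens it with reset_index. Only the "Simulation" level name comes from the text above; the "Row" level name, the column names, and the data are hypothetical, and this is not a claim about how nflsim implements reindex=True.

```python
import pandas as pd

# Hypothetical stand-in for an aggregated table: two simulations, two rows each
index = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=["Simulation", "Row"])
standings = pd.DataFrame(
    {"Team": ["A", "B", "B", "A"], "Wins": [10, 6, 9, 7]},
    index=index,
)

# Move the "Simulation" level out of the index into an ordinary column
flat = standings.reset_index(level="Simulation")

# With a flat column, per-team summaries across simulations are a plain groupby
mean_wins = flat.groupby("Team")["Wins"].mean()
```

Having the simulation number as a column makes it easier to filter, group, or join results by simulation.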

You can also entirely disable the generation of aggregated statistics, in which case the results are stored as a list of Simulation objects:

sim = nfl.Simulate(season=2018, n_sims=100000).run(combine=False)
for simulation in sim.simulations.values:
    rankings = simulation.rankings
    standings = simulation.standings
    regularseason = simulation.regularseason
    seeding = simulation.seeding
    playoffs = simulation.playoffs