automatic grading of jupyter notebooks

jupyter, teaching, unit test
pip install jupyter-autograde==0.1.4



autograde is a tool that lets you run tests on Jupyter notebooks in an isolated environment and creates both human- and machine-readable reports.


Before installing autograde, ensure docker or podman is installed on your system.
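
One way to check this prerequisite from Python is sketched below. It only looks for an executable named docker or podman on the PATH; the helper name find_container_runtime is hypothetical and not part of autograde, and finding a binary does not guarantee that the daemon or service behind it is running.

```python
import shutil


def find_container_runtime(candidates=("docker", "podman")):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    runtime = find_container_runtime()
    if runtime is None:
        print("Neither docker nor podman was found on PATH.")
    else:
        print(f"Found container runtime: {runtime}")
```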

To install autograde, run pip install jupyter-autograde. Alternatively, you can install autograde from source by cloning this repository and running pip install -e . within it (if you're developing autograde, run pip install -e .[develop] instead).

Finally, build the respective container image: python -m autograde build


docker support is not yet available when autograde is installed from PyPI. If you want to use docker, clone the repository and install the package from source.


apply tests

autograde comes with some example files located in the demo/ subdirectory that we will use for now to illustrate the workflow. Run:

python -m autograde test demo/test.py demo/notebook.ipynb --target /tmp --context demo/context

What happened? Let's first have a look at the arguments of autograde:

  • demo/test.py contains a script with the test cases we want to apply
  • demo/notebook.ipynb is the notebook to be tested (here you may also specify a directory to be recursively searched for notebooks)
  • The optional flag --target tells autograde where to store results: /tmp in our case, and the current working directory by default.
  • The optional flag --context specifies a directory that is mounted into the sandbox and may contain arbitrary files or subdirectories. This is useful when the notebook expects some external files to be present.
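
If you want to drive the same command from a script, for instance to loop over several notebooks, the argument list can be assembled as sketched below. The helper build_test_command is hypothetical and not part of autograde; it uses only the subcommand and the two optional flags described above, and it merely builds the command without running it.

```python
import sys


def build_test_command(test_script, notebook, target=None, context=None):
    """Assemble the argument list for `python -m autograde test ...`.

    Only the flags described in this README (--target, --context) are used.
    Both are optional and are left out when not given.
    """
    cmd = [sys.executable, "-m", "autograde", "test", str(test_script), str(notebook)]
    if target is not None:
        cmd += ["--target", str(target)]
    if context is not None:
        cmd += ["--context", str(context)]
    return cmd


if __name__ == "__main__":
    cmd = build_test_command(
        "demo/test.py", "demo/notebook.ipynb", target="/tmp", context="demo/context"
    )
    # Pass the list to subprocess.run(cmd, check=True) to actually execute it.
    print(" ".join(cmd))
```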

The output is a compressed archive named something like results_[Lastname1,Lastname2,...]_XXXXXXXX.tar.xz with the following contents:

  • artifacts.tar.xz: all files that were created by or visible to the tested notebook
  • code.py: code extracted from the notebook including stdout/stderr as comments
  • notebook.ipynb: an identical copy of the tested notebook
  • test_results.csv: test results
  • test_results.json: test results, enriched with participant credentials and a summary
  • report.rst: human-readable report
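
To inspect such an archive without unpacking it, Python's standard tarfile module is sufficient. The sketch below assumes only that the result is an xz-compressed tar file, as the .tar.xz suffix suggests; the helper name list_archive_members and the path in the usage example are hypothetical.

```python
import tarfile


def list_archive_members(archive_path):
    """Return the sorted member names of an xz-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:xz") as archive:
        return sorted(archive.getnames())


if __name__ == "__main__":
    import sys

    # Usage: python list_members.py /tmp/results_....tar.xz
    for path in sys.argv[1:]:
        for name in list_archive_members(path):
            print(name)
```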

summarize results

In a typical scenario, test cases are applied not just to one notebook but to many at a time. Therefore, autograde comes with a summary feature that aggregates results, shows you a score distribution, and includes some very basic fraud detection. To create a summary, simply run:

python -m autograde summary path/to/results

Three new files will appear in the result directory:

  • summary.csv: aggregated results
  • score_distribution.pdf: a score distribution (without duplicates)
  • similarities.pdf: similarity heatmap of all notebooks
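
The aggregated results can then be processed further with ordinary CSV tooling. The sketch below makes no assumption about which columns summary.csv contains; it assumes only a comma-separated file with a header row, and the helper name load_summary is hypothetical.

```python
import csv


def load_summary(csv_path):
    """Return (column_names, rows) for a comma-separated file with a header row.

    Each row is a dict mapping column name to the cell's string value.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    import sys

    # Usage: python load_summary.py path/to/results/summary.csv
    columns, rows = load_summary(sys.argv[1])
    print("columns:", ", ".join(columns))
    print("rows:", len(rows))
```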