davos

Install and manage Python packages at runtime using the "smuggle" statement.


Keywords
import, install, package, module, automatic, davos, smuggle, pip, conda, environment-management, google-colab, ipython, jupyter, package-management, python, reproducibility
License
MIT
Install
pip install davos==0.2.3

Documentation

Someone once told me that the night is dark and full of terrors. And tonight I am no knight. Tonight I am Davos the smuggler again. Would that you were an onion.

Introduction

The davos library provides Python with an additional keyword: smuggle.

The smuggle statement works just like the built-in import statement, with two major differences:

  1. You can smuggle a package without installing it first
  2. You can smuggle a specific version of a package

Taken together, these two enhancements to import provide a powerful system for developing and sharing reproducible code that works across different users and environments.

Table of contents

Why would I want an alternative to import?

In many cases, smuggle and import do the same thing—if you're running code in the same environment you developed it in. But what if you want to share a Jupyter notebook containing your code with someone else? If the user (i.e., the "someone else" in this example) doesn't have all of the packages your notebook imports, Python will raise an exception and the code won't run. It's not a huge deal, of course, but it's inconvenient (e.g., the user might need to pip-install the missing packages, restart their kernel, re-run the code up to the point it crashed, etc.—possibly going through this cycle multiple times until the thing finally runs).

A second (and more subtle) issue arises when the developer (i.e., the person who wrote the code) used or assumed different versions of the imported packages than what the user has installed in their environment. So maybe the original author was developing and testing their code using pandas 1.3.5, but the user hasn't upgraded their pandas installation since 0.25.0. Python will happily "import pandas" in both cases, but any changes across those versions might change what the developer's code actually does in the user's (different) environment—or cause it to fail altogether.

The problem davos tries to solve is similar to the idea motivating virtual environments, containers, and virtual machines: we want a way of replicating the original developer's environment on the user's machine, to a sufficiently good approximation that we can be "reasonably confident" that the code will continue to behave as expected.

When you smuggle packages instead of importing them, it guarantees (for whatever environment the code is running in) that the packages are importable, even if they hadn't been installed previously. Under the hood, davos figures out whether the package is available, and if not, it uses pip to download and install anything that's missing (including missing dependencies). From that point, after having automatically handled those sorts of dependency issues, smuggle behaves just like import.
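The check-then-install flow described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the names load_or_install and pip_install are hypothetical and not part of davos, and the sketch ignores version constraints, onion comments, and virtual environments.

```python
import importlib
import importlib.util
import subprocess
import sys


def pip_install(requirement):
    """Install a requirement using the current interpreter's pip."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", requirement],
        check=True,
    )


def load_or_install(module_name, requirement=None, installer=pip_install):
    """Import a top-level module, installing it first if it cannot be found.

    `installer` is injectable so the flow can be exercised without
    touching the network (hypothetical design, not the davos API).
    """
    if importlib.util.find_spec(module_name) is None:
        installer(requirement or module_name)
        # make newly installed files visible to the import system
        importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

For example, load_or_install("json") returns the already-available standard library module without calling the installer at all.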

The second powerful feature of davos comes from another construct, called "onion comments." These are like standard Python comments, but they appear on the same line(s) as smuggle statements, and they are formatted in a particular way. Onion comments provide a way of precisely controlling how, when, and where packages are installed, how (or if) the system checks for existing installations, and so on. A key feature is the ability to specify exactly which version(s) of each package are imported into the current workspace. When used in this way, davos enables authors to guarantee that the same versions of the packages they developed their code with will also be imported into the user's workspace at the appropriate times.

Why not use virtual environments, containers, and/or virtual machines instead?

Psst-- we'll let you in on a little secret: importing davos automatically creates a virtual environment for your notebook. However, whereas setting up a virtual environment is usually left to the user, davos handles the pesky details for you, without you needing to think about them. Any packages you smuggle via davos that aren't available in the notebook's original runtime environment are installed into a new virtual environment. This ensures that davos will not change the runtime environment (e.g., by installing new packages, changing existing package versions, etc.).

By default, each notebook's virtual environment is stored in a hidden ".davos" folder inside the current user's home directory. The default environment name is computed to uniquely identify each notebook, according to its filename and path. However, a notebook's virtual environment may be customized by setting davos.project to any string that can be used as a valid folder name in the user's operating system. This is useful for multi-notebook projects that share dependencies (without needing to duplicate each package installation for each notebook).

If you prefer, you can also disable davos's virtual environment infrastructure by setting davos.project to None. Doing so will cause any packages installed by davos to affect the notebook's runtime environment. This is generally not recommended, as it can lead to unintended consequences for other code that shares the runtime environment. That said, davos also works great when used inside of (standard) virtual environments, containers, and virtual machines.

There are a few additional specific advantages to davos that go beyond more typical virtual environments, containers, and/or virtual machines. The main advantage is that davos is very lightweight: importing davos into a notebook-based environment unlocks all of its functionality without needing to install, set up, and learn how to use additional tools. There is none of the typical overhead of setting up a new virtual environment (or container, virtual machine, etc.), installing third-party tools, writing and sharing configuration files, and so on. All of your code and its dependencies may be contained in a single notebook file.

Okay... so how do I use this thing?

To turn a standard Jupyter (IPython) notebook, including a Google Colaboratory notebook, into a davos-enhanced notebook, just add two lines to the first cell:

%pip install davos
import davos

This will enable the smuggle keyword in your notebook environment. Then you can do things like:

# pip-install numpy v1.23.1, if needed
smuggle numpy as np    # pip: numpy==1.23.1

# the smuggled package is fully imported and usable
arr = np.arange(15).reshape(3, 5)

# and the onion comment guarantees the desired version!
assert np.__version__ == '1.23.1'

Interested? Curious? Intrigued? Check out the table of contents for more details! You may also want to check out our paper for more formal descriptions and explanations.

Installation

Latest Stable PyPI Release

pip install davos

Latest GitHub Update

pip install git+https://github.com/ContextLab/davos.git

Installing in Colaboratory

To install davos in Google Colab, add a new cell to the top of your notebook with a percent sign (%) followed by one of the commands above (e.g., %pip install davos). You'll likely also want to import davos, which enables the smuggle syntax. Run the cell to install davos on the runtime virtual machine.

Note: restarting the Colab runtime does not affect installed packages. However, if the runtime is "factory reset" or disconnected due to reaching its idle timeout limit, you'll need to rerun the cell to reinstall davos on the fresh VM instance.

Overview

The primary way to use davos is via the smuggle statement, which is made available simply by running import davos. Like the built-in import statement, the smuggle statement is used to load packages, modules, and other objects into the current namespace. The main difference between the two is in how they handle missing packages and specific package versions.

Smuggling Missing Packages

import requires that packages be installed before the start of the interpreter session. Trying to import a package that can't be found locally will throw a ModuleNotFoundError, and you'll have to install the package from the command line, restart the Python interpreter to make the new package importable, and rerun your code in full in order to use it.

The smuggle statement, however, can handle missing packages on the fly. If you smuggle a package that isn't installed locally, davos will install it for you, make its contents available to Python's import machinery, and load it into the namespace for immediate use. You can control how davos installs missing packages by adding a special type of inline comment called an "onion" comment next to a smuggle statement.

Smuggling Specific Package Versions

One simple but powerful use for onion comments is making smuggle statements version-sensitive.

Python doesn't provide a native, viable way to ensure a third-party package imported at runtime matches a specific version or satisfies a particular version constraint. Many packages expose their version info via a top-level __version__ attribute (see PEP 396), and certain tools (such as the standard library's importlib.metadata and setuptools's pkg_resources) attempt to parse version info from installed distributions. However, using these to constrain imported packages would require writing extra code to compare version strings, and you would still have to manually install the desired version and restart the interpreter any time an invalid version is caught.

Additionally, for packages installed through a version control system (e.g., git), this would be insensitive to differences between revisions (e.g., commits) within the same semantic version.
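To make the "extra code" mentioned above concrete, here is a minimal sketch of a manual version check built on importlib.metadata. The helper names are hypothetical, and the comparison only understands plain dotted release numbers (no pre-releases, epochs, or other PEP 440 features), so treat it as an illustration of the burden rather than a recommended approach.

```python
from importlib import metadata


def parse_release(version):
    """Turn a plain release string such as '1.23.1' into (1, 23, 1)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, minimum, below):
    """Return True if minimum <= version < below (plain releases only)."""
    return parse_release(minimum) <= parse_release(version) < parse_release(below)


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

Even with these helpers, a failed check still leaves you to install the right version and restart the interpreter by hand.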

davos solves these issues by allowing you to specify a specific version or set of acceptable versions for each smuggled package. To do this, simply provide a version specifier in an onion comment next to the smuggle statement:

smuggle numpy as np              # pip: numpy==1.23.1
from pandas smuggle DataFrame    # pip: pandas>=1.0,<2.0

In this example, the first line will load numpy into the local namespace under the alias "np", just as "import numpy as np" would. First, davos will check whether numpy is installed locally, and if so, whether the installed version exactly matches 1.23.1. If numpy is not installed, or the installed version is anything other than 1.23.1, davos will use the specified installer program, pip, to install numpy==1.23.1 before loading the package.

Similarly, the second line will load the "DataFrame" object from the pandas library, analogously to "from pandas import DataFrame". A local pandas version of 1.2.1 would be used, but a local version of 2.1.1 would cause davos to replace it with a valid pandas version, as if you had manually run pip install "pandas>=1.0,<2.0".

In both cases, the imported versions will fit the constraints specified in their onion comments, and the next time numpy or pandas is smuggled with the same constraints, valid local installations will be found.

You can also force the state of a smuggled package to match a specific VCS ref (branch, revision, tag, release, etc.). For example:

smuggle hypertools as hyp    # pip: git+https://github.com/ContextLab/hypertools.git@98a3d80

will load hypertools (aliased as "hyp"), as the package existed on GitHub, at commit 98a3d80. The general format for VCS references in onion comments follows that of the pip-install command. See the notes on smuggling from VCS below for additional info.

And with a few exceptions, smuggling a specific package version will work even if the package has already been imported!

Note: davos v0.2.x supports IPython environments (e.g., Jupyter and Colaboratory notebooks) only. v0.3.x will add support for "regular" (i.e., non-interactive) Python scripts.

Use Cases

Simplify sharing reproducible code & Python environments

Different versions of the same package can often behave quite differently—bugs are introduced and fixed, features are implemented and removed, support for Python versions is added and dropped, etc. Because of this, Python code that is meant to be reproducible (e.g., tutorials, demos, data analyses) is commonly shared alongside a set of fixed versions for each package used. And since there is no Python-native way to specify package versions at runtime (see above), this typically takes the form of a pre-configured development environment the end user must build themselves (e.g., a Docker container or conda environment), which can be cumbersome, slow to set up, resource-intensive, and confusing for newer users, as well as require shipping both additional specification files and setup instructions along with your code. And even then, a well-intentioned user may alter the environment in a way that affects your carefully curated set of pinned packages (such as installing additional packages that trigger dependency updates).

Instead, davos allows you to share code with one simple instruction: just pip install davos! Replace your import statements with smuggle statements, pin package versions in onion comments, and let davos take care of the rest. Beyond its simplicity, this approach ensures your predetermined package versions are in place every time your code is run.

Guarantee your code always uses the latest version, release, or revision

If you want to make sure you're always using the most recent release of a certain package, davos makes doing so easy:

smuggle mypkg    # pip: mypkg --upgrade

Or if you have an automation designed to test your most recent commit on GitHub:

smuggle mypkg    # pip: git+https://username/reponame.git

Compare behavior across package versions

The ability to smuggle a specific package version even after a different version has been imported makes davos a useful tool for comparing behavior across multiple versions of the same package, within the same interpreter session:

def test_my_func_unchanged():
    """Regression test for `mypkg.my_func()`"""
    data = list(range(10))

    smuggle mypkg                    # pip: mypkg==0.1
    result1 = mypkg.my_func(data)

    smuggle mypkg                    # pip: mypkg==0.2
    result2 = mypkg.my_func(data)

    smuggle mypkg                    # pip: git+https://github.com/MyOrg/mypkg.git
    result3 = mypkg.my_func(data)

    assert result1 == result2 == result3

Usage

The smuggle Statement

Syntax

The smuggle statement is meant to be used in place of the built-in import statement and shares its full syntactic definition:

smuggle_stmt    ::=  "smuggle" module ["as" identifier] ("," module ["as" identifier])*
                     | "from" relative_module "smuggle" identifier ["as" identifier]
                     ("," identifier ["as" identifier])*
                     | "from" relative_module "smuggle" "(" identifier ["as" identifier]
                     ("," identifier ["as" identifier])* [","] ")"
                     | "from" module "smuggle" "*"
module          ::=  (identifier ".")* identifier
relative_module ::=  "."* module | "."+
NB: uses the modified BNF grammar notation described in The Python Language Reference; see the reference's lexical analysis chapter for the definition of identifier

In simpler terms, any valid syntax for import is also valid for smuggle.

Rules

  • Like import statements, smuggle statements are whitespace-insensitive, unless a lack of whitespace between two tokens would cause them to be interpreted as a different token:
    from os.path smuggle dirname, join as opj                       # valid
    from   os   . path   smuggle  dirname    ,join      as   opj    # also valid
    from os.path smuggle dirname, join asopj                        # invalid ("asopj" != "as opj")
  • Any context that would cause an import statement not to be executed will have the same effect on a smuggle statement:
    # smuggle matplotlib.pyplot as plt           # not executed
    print('smuggle matplotlib.pyplot as plt')    # not executed
    foo = """
    smuggle matplotlib.pyplot as plt"""          # not executed
  • Because the davos parser is less complex than the full Python parser, there are two fairly non-disruptive edge cases where an import statement would be syntactically valid but a smuggle statement would not:
    1. The exec function
      exec('from pathlib import Path')         # executed
      exec('from pathlib smuggle Path')        # raises SyntaxError
    2. A one-line compound statement clause:
      if True: import random                   # executed
      if True: smuggle random                  # raises SyntaxError
      
      while True: import math; break           # executed
      while True: smuggle math; break          # raises SyntaxError
      
      for _ in range(1): import json           # executed
      for _ in range(1): smuggle json          # raises SyntaxError
      
      # etc...
  • In IPython environments (e.g., Jupyter & Colaboratory notebooks) smuggle statements always load names into the global namespace:
    # example.ipynb
    import davos
    
    
    def import_example():
        import datetime
    
    
    def smuggle_example():
        smuggle datetime
    
    
    import_example()
    type(datetime)                               # raises NameError
    
    smuggle_example()
    type(datetime)                               # returns <class 'module'>

The Onion Comment

An onion comment is a special type of inline comment placed on a line containing a smuggle statement. Onion comments can be used to control how davos:

  1. determines whether the smuggled package should be installed
  2. installs the smuggled package, if necessary

Onion comments are also useful when smuggling a package whose distribution name (i.e., the name used when installing it) is different from its top-level module name (i.e., the name used when importing it). Take for example:

from sklearn.decomposition smuggle PCA    # pip: scikit-learn

The onion comment here (# pip: scikit-learn) tells davos that if "sklearn" does not exist locally, the "scikit-learn" package should be installed.

Syntax

Onion comments follow a simple but specific syntax, inspired in part by the type comment syntax introduced in PEP 484. The following is a loose (pseudo-)syntactic definition for an onion comment:

onion_comment   ::=  "#" installer ":" install_opt* pkg_spec install_opt*
installer       ::=  ("pip" | "conda")
pkg_spec        ::=  identifier [version_spec]
NB: uses the modified BNF grammar notation described in The Python Language Reference; see the reference's lexical analysis chapter for the definition of identifier

where installer is the program used to install the package; install_opt is any option accepted by the installer's "install" command; and version_spec may be a version specifier defined by PEP 440 followed by a version string, or an alternative syntax valid for the given installer program. For example, pip uses specific syntaxes for local, editable, and VCS-based installation.

Less formally, an onion comment simply consists of two parts, separated by a colon:

  1. the name of the installer program (e.g., pip)
  2. arguments passed to the program's "install" command

Thus, you can essentially think of writing an onion comment as taking the full shell command you would run to install the package, and replacing "install" with ":". For instance, the command:

pip install -I --no-cache-dir numpy==1.23.1 -vvv --timeout 30

is easily translated into an onion comment as:

smuggle numpy    # pip: -I --no-cache-dir numpy==1.23.1 -vvv --timeout 30
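The translation between an onion comment and a shell command can be sketched in a few lines. The function name onion_to_command is hypothetical, and this is not how davos itself parses onion comments (it uses its own reimplementation of the installer's CLI parser); the sketch simply reverses the "replace install with a colon" rule of thumb.

```python
import shlex


def onion_to_command(onion_comment):
    """Rebuild the installer command that an onion comment stands for."""
    # drop the leading "#" and surrounding whitespace
    text = onion_comment.lstrip("#").strip()
    # split on the first colon only, so URLs in the arguments stay intact
    installer, _, args = text.partition(":")
    return [installer.strip(), "install", *shlex.split(args)]
```

Applied to the comment in the example above, this yields the argument list for pip install -I --no-cache-dir numpy==1.23.1 -vvv --timeout 30.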

In practice, onion comments are identified as matches for the regular expression:

#+ *(?:pip|conda) *: *[^#\n ].+?(?= +#| *\n| *$)

Note: support for installing smuggled packages via the conda package manager will be added in v0.2. For v0.1, onion comments should always specify "pip" as the installer program.
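The regular expression above can be tried out directly with Python's re module. The helper name find_onion_comment is hypothetical; note that the pattern on its own only locates candidate comments, and the placement rules described in the next section are applied separately.

```python
import re

# the onion comment pattern quoted above, unchanged
ONION_RE = re.compile(r"#+ *(?:pip|conda) *: *[^#\n ].+?(?= +#| *\n| *$)")


def find_onion_comment(line):
    """Return the onion comment found in a line of code, or None."""
    match = ONION_RE.search(line)
    return match.group() if match else None
```

The lookahead at the end of the pattern is what stops the match before a trailing, unrelated inline comment.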

Rules

  • An onion comment must be placed on the same line as a smuggle statement; otherwise, it is not parsed:
    # assuming the dateutil package is not installed...
    
    # pip: python-dateutil                       # <-- has no effect
    smuggle dateutil                             # raises InstallerError (no "dateutil" package exists)
    
    smuggle dateutil                             # raises InstallerError (no "dateutil" package exists)
    # pip: python-dateutil                       # <-- has no effect
    
    smuggle dateutil    # pip: python-dateutil   # installs "python-dateutil" package, if necessary
  • An onion comment may be followed by unrelated inline comments as long as they are separated by at least one space:
    smuggle tqdm    # pip: tqdm>=4.46,<4.60 # this comment is ignored
    smuggle tqdm    # pip: tqdm>=4.46,<4.60            # so is this one
    smuggle tqdm    # pip: tqdm>=4.46,<4.60# but this comment raises OnionArgumentError
  • An onion comment must be the first inline comment immediately following a smuggle statement; otherwise, it is not parsed:
    smuggle numpy    # pip: numpy!=1.19.1        # <-- guarantees smuggled version is *not* v1.19.1
    smuggle numpy    # has no effect -->         # pip: numpy==1.19.1
    This also allows you to easily "comment out" onion comments:
    smuggle numpy    ## pip: numpy!=1.19.1       # <-- has no effect
  • Onion comments are generally whitespace-insensitive, but installer arguments must be separated by at least one space:
    from umap smuggle UMAP    # pip: umap-learn --user -v --no-clean     # valid
    from umap smuggle UMAP#pip:umap-learn --user     -v    --no-clean    # also valid
    from umap smuggle UMAP    # pip: umap-learn --user-v--no-clean       # raises OnionArgumentError
  • Onion comments have no effect on standard library modules:
    smuggle threading    # pip: threading==9999  # <-- has no effect
  • When smuggling multiple packages with a single smuggle statement, an onion comment may be used to refer to the first package listed:
    smuggle nilearn, nibabel, nltools    # pip: nilearn==0.7.1
  • If multiple separate smuggle statements are placed on a single line, an onion comment may be used to refer to the last statement:
    smuggle gensim; smuggle spacy; smuggle nltk    # pip: nltk~=3.5 --pre
  • For multiline smuggle statements, an onion comment may be placed on the first line:
    from scipy.interpolate smuggle (    # pip: scipy==1.6.3
        interp1d,
        interpn as interp_ndgrid,
        LinearNDInterpolator,
        NearestNDInterpolator,
    )
    ... or on the last line:
    from scipy.interpolate smuggle (interp1d,                  # this comment has no effect
                                    interpn as interp_ndgrid,
                                    LinearNDInterpolator,
                                    NearestNDInterpolator)     # pip: scipy==1.6.3
    ... though the first line takes priority:
    from scipy.interpolate smuggle (    # pip: scipy==1.6.3    # <-- this version is installed
        interp1d,
        interpn as interp_ndgrid,
        LinearNDInterpolator,
        NearestNDInterpolator,
    )    # pip: scipy==1.6.2                                   # <-- this comment is ignored
    ... and all comments not on the first or last line are ignored:
    from scipy.interpolate smuggle (
        interp1d,                       # pip: scipy==1.6.3    # <-- ignored
        interpn as interp_ndgrid,
        LinearNDInterpolator,           # unrelated comment    # <-- ignored
        NearestNDInterpolator
    )                                   # pip: scipy==1.6.2    # <-- parsed
  • The onion comment is intended to describe how a specific smuggled package should be installed if it is not found locally, in order to make it available for immediate use. Therefore, installer options that either (A) install packages other than the smuggled package and its dependencies (e.g., from a specification file), or (B) cause the smuggled package not to be installed, are disallowed. The options listed below will raise an OnionArgumentError:
    • -h, --help
    • -r, --requirement
    • -V, --version
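As a rough illustration of the last rule, the sketch below rejects the disallowed options before anything is installed. The OnionArgumentError class defined here is a local stand-in, and check_onion_args is a hypothetical helper; davos performs this validation inside its own argument parser, which also understands forms this sketch does not (such as a value fused to a short option).

```python
import shlex


class OnionArgumentError(ValueError):
    """Local stand-in for the davos exception of the same name."""


# options listed above as disallowed in onion comments
DISALLOWED = {"-h", "--help", "-r", "--requirement", "-V", "--version"}


def check_onion_args(args_str):
    """Split installer arguments and reject disallowed options."""
    args = shlex.split(args_str)
    for arg in args:
        # treat "--requirement=reqs.txt" the same as "--requirement"
        flag = arg.split("=", 1)[0]
        if flag in DISALLOWED:
            raise OnionArgumentError(f"option not allowed in onion comment: {flag}")
    return args
```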

The davos Config

The davos config object stores options and data that affect how davos behaves. After importing davos, the config instance (a singleton) for the current session is available as davos.config, and its various fields are accessible as attributes. The config object exposes a mixture of writable and read-only fields. Most davos.config attributes can be assigned values to control aspects of davos behavior, while others are available for inspection but are set and used internally. Additionally, certain config fields may be writable in some situations but not others (e.g. only if the importing environment supports a particular feature). Once set, davos config options last for the lifetime of the interpreter (unless updated); however, they do not persist across interpreter sessions. A full list of davos config fields is available below:

Reference

| Field | Description | Type | Default | Writable? |
| --- | --- | --- | --- | --- |
| active | Whether or not the davos parser should be run on subsequent input (cells, in Jupyter/Colab notebooks). Setting to True activates the davos parser, enables the smuggle keyword, and injects the smuggle() function into the user namespace. Setting to False deactivates the davos parser, disables the smuggle keyword, and removes "smuggle" from the user namespace (if it holds a reference to the smuggle() function). See How it Works for more info. | bool | True | |
| auto_rerun | If True, when smuggling a previously-imported package that cannot be reloaded (see Smuggling packages with C-extensions), davos will automatically restart the interpreter and rerun all code up to (and including) the current smuggle statement. Otherwise, issues a warning and prompts the user with buttons to either restart/rerun or continue running. | bool | False | ✅ (Jupyter notebooks only) |
| confirm_install | Whether or not davos should require user confirmation ([y/n] input) before installing a smuggled package. | bool | False | |
| environment | A label describing the environment in which davos is running. Checked internally to determine which interchangeable implementation functions are used, whether certain config fields are writable, and various other behaviors. | Literal['Python', 'IPython<7.0', 'IPython>=7.0', 'Colaboratory'] | N/A | |
| ipython_shell | The global IPython interactive shell instance. | IPython.core.interactiveshell.InteractiveShell | N/A | |
| noninteractive | Set to True to run davos in non-interactive mode (all user input and confirmation will be disabled). NB: (1) setting to True disables confirm_install if previously enabled; (2) if auto_rerun is False in non-interactive mode, davos will throw an error if a smuggled package cannot be reloaded. | bool | False | ✅ (Jupyter notebooks only) |
| pip_executable | The path to the pip executable used to install smuggled packages. Must be a path (str or pathlib.Path) to a real file. Default is programmatically determined from the Python environment; falls back to sys.executable -m pip if the executable can't be found. | str | pip exe path or sys.executable -m pip | |
| smuggled | A cache of packages smuggled during the current interpreter session. Formatted as a dict whose keys are package names and values are the (.split() and ';'.join()ed) onion comments. Implemented this way so that any non-whitespace change to installer arguments triggers re-installation. | dict[str, str] | {} | |
| suppress_stdout | If True, suppress all unnecessary output issued by both davos and the installer program. Useful when smuggling packages that need to install many dependencies and therefore generate extensive output. If the installer program throws an error while output is suppressed, both stdout & stderr will be shown with the traceback. | bool | False | |
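The whitespace normalization described for the smuggled field can be illustrated in a couple of lines. The function name cache_value is hypothetical, and exactly which part of the onion comment davos stores is an assumption here; the point is only that splitting on whitespace and re-joining makes spacing differences irrelevant while any other change produces a different value.

```python
def cache_value(onion_comment):
    """Normalize an onion comment so only non-whitespace changes matter."""
    return ";".join(onion_comment.split())


# the same arguments with different spacing normalize to the same value
same = cache_value("pip: numpy==1.23.1 -vv") == cache_value("pip:   numpy==1.23.1    -vv")

# a changed version pin produces a different value
changed = cache_value("pip: numpy==1.23.1") != cache_value("pip: numpy==1.23.2")
```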

Top-level Functions

davos also provides a few convenience functions for reading/setting config values:

  • davos.activate() Activate the davos parser, enable the smuggle keyword, and inject the smuggle() function into the namespace. Equivalent to setting davos.config.active = True. See How it Works for more info.

  • davos.deactivate() Deactivate the davos parser, disable the smuggle keyword, and remove the name smuggle from the namespace if (and only if) it refers to the smuggle() function. If smuggle has been overwritten with a different value, the variable will not be deleted. Equivalent to setting davos.config.active = False. See How it Works for more info.

  • davos.is_active() Return the current value of davos.config.active.

  • davos.configure(**kwargs) Set multiple davos.config fields at once by passing values as keyword arguments, e.g.:

    import davos
    davos.configure(active=False, noninteractive=True, pip_executable='/usr/bin/pip3')

    is equivalent to:

    import davos
    davos.active = False
    davos.noninteractive = True
    davos.pip_executable = '/usr/bin/pip3'
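The idea of setting several fields in one call, with some fields writable and others read-only, can be sketched as follows. DemoConfig and this configure function are hypothetical stand-ins written for illustration; they are not the davos implementation, and the field values shown are assumptions.

```python
class DemoConfig:
    """Toy config object with two writable fields and one read-only field."""

    def __init__(self):
        self.active = True
        self.noninteractive = False
        self._environment = "IPython>=7.0"

    @property
    def environment(self):
        # no setter is defined, so assignment raises AttributeError
        return self._environment


def configure(config, **kwargs):
    """Set several config fields at once, rejecting unknown names."""
    for field, value in kwargs.items():
        if not hasattr(config, field):
            raise AttributeError(f"unknown config field: {field!r}")
        setattr(config, field, value)
```

Calling configure(cfg, active=False, noninteractive=True) on a DemoConfig instance updates both fields, while trying to set environment fails because the property has no setter.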

How It Works: The davos Parser

Functionally, importing davos appears to enable a new Python keyword, "smuggle". However, davos doesn't actually modify the rules or reserved keywords used by Python's parser and lexical analyzer in order to do so. In fact, modifying the Python grammar is not possible at runtime and would require rebuilding the interpreter. Instead, in IPython environments like Jupyter and Colaboratory notebooks, davos implements the smuggle keyword via a combination of namespace injections and its own (far simpler) custom parser.

The smuggle keyword can be enabled and disabled at will by "activating" and "deactivating" davos (see the davos Config Reference and Top-level Functions, above). When davos is imported, it is automatically activated by default. Activating davos triggers two things:

  1. The smuggle() function is injected into the IPython user namespace
  2. The davos parser is registered as a custom input transformer

IPython preprocesses all executed code as plain text before it is sent to the Python parser in order to handle special constructs like %magic and !shell commands. davos hooks into this process to transform smuggle statements into syntactically valid Python code. The davos parser uses a regular expression to match each line of code containing a smuggle statement (and, optionally, an onion comment), extracts information from its text, and replaces it with an analogous call to the smuggle() function. Thus, even though the code visible to the user may contain smuggle statements, e.g.:

smuggle numpy as np    # pip: numpy>1.16,<=1.24 -vv

the code that is actually executed by the Python interpreter will not:

smuggle(name="numpy", as_="np", installer="pip", args_str="""numpy>1.16,<=1.24 -vv""", installer_kwargs={'editable': False, 'spec': 'numpy>1.16,<=1.24', 'verbosity': 2})
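A heavily simplified sketch of this kind of text transformation is shown below. It is an assumption-laden toy: the pattern handles only the "smuggle module [as alias]" form with an optional onion comment, the function name transform_line is hypothetical, and the generated call omits the installer_kwargs argument that the real parser produces.

```python
import re

# Toy pattern: "smuggle <module> [as <alias>]" plus an optional onion comment.
SMUGGLE_RE = re.compile(
    r"^(?P<indent>\s*)smuggle\s+(?P<name>[\w.]+)"
    r"(?:\s+as\s+(?P<alias>\w+))?"
    r"(?:\s*#\s*(?P<installer>pip|conda)\s*:\s*(?P<args>.+?))?\s*$"
)


def transform_line(line):
    """Rewrite a simple smuggle statement as a smuggle() function call."""
    match = SMUGGLE_RE.match(line)
    if match is None:
        return line  # not a (simple) smuggle statement: leave untouched
    parts = [f'name="{match["name"]}"']
    if match["alias"]:
        parts.append(f'as_="{match["alias"]}"')
    if match["installer"]:
        parts.append(f'installer="{match["installer"]}"')
        parts.append(f'args_str="""{match["args"]}"""')
    return f'{match["indent"]}smuggle({", ".join(parts)})'
```

Because the rewrite happens on plain text before Python ever parses the cell, the interpreter only sees an ordinary function call.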

The davos parser can be deactivated at any time, and doing so triggers the opposite actions of activating it:

  1. The name "smuggle" is deleted from the IPython user namespace, unless it has been overwritten and no longer refers to the smuggle() function
  2. The davos parser input transformer is deregistered.
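The namespace half of activation and deactivation can be sketched with a plain dictionary standing in for the IPython user namespace. The function bodies here are hypothetical; the sketch only demonstrates the identity check that keeps deactivation from deleting a user variable that happens to be named smuggle.

```python
def smuggle(name, as_=None, **installer_options):
    """Placeholder for the real smuggle() function (body is hypothetical)."""
    raise NotImplementedError


def activate(namespace):
    """Inject smuggle() into the given namespace."""
    namespace["smuggle"] = smuggle


def deactivate(namespace):
    """Remove "smuggle" only if it still refers to the smuggle() function."""
    if namespace.get("smuggle") is smuggle:
        del namespace["smuggle"]
```

If the user has rebound the name, for example with smuggle = 5, the identity check fails and the variable is left alone.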

Note: in Jupyter and Colaboratory notebooks, IPython parses and transforms all text in a cell before sending it to the kernel for execution. This means that importing or activating davos will not make the smuggle statement available until the next cell, because all lines in the current cell were transformed before the davos parser was registered. However, deactivating davos disables the smuggle statement immediately—although the davos parser will have already replaced all smuggle statements with smuggle() function calls, removing the function from the namespace causes them to throw NameError.

Additional Notes

  • Reimplementing installer programs' CLI parsers

    The davos parser extracts info from onion comments by passing them to a (slightly modified) reimplementation of their specified installer program's CLI parser. This is somewhat redundant, since the arguments will eventually be re-parsed by the actual installer program if the package needs to be installed. However, it affords a number of advantages, such as:

    • detecting errors early during the parser phase, before spending any time running code above the line containing the smuggle statement
    • preventing shell injections in onion comments—e.g., #pip: --upgrade numpy && rm -rf / fails due to the OnionParser, but would otherwise execute successfully.
    • allowing certain installer arguments to temporarily influence davos behavior while smuggling the current package (see Installer options that affect davos behavior below for specific info)
  • Installer options that affect davos behavior

    Passing certain options to the installer program via an onion comment will also affect the corresponding smuggle statement in a predictable way:

  • Smuggling packages with C-extensions

    Some Python packages that rely heavily on custom data types implemented via C-extensions (e.g., numpy, pandas) dynamically generate modules defining various C functions and data structures, and link them to the Python interpreter when they are first imported. Depending on how these objects are initialized, they may not be subject to normal garbage collection, and persist despite their reference count dropping to zero. This can lead to unexpected errors when reloading the Python module that creates them, particularly if their dynamically generated source code has been changed (e.g., because the reloaded package is a newer version).

    This can occasionally affect davos's ability to smuggle a new version of a package (or dependency) that was previously imported. To handle this, davos first checks each package it installs against sys.modules. If a different version has already been loaded by the interpreter, davos will attempt to replace it with the requested version. If this fails, davos will restore the old package version in memory, while replacing it with the new package version on disk. This allows subsequent code that uses the non-reloadable module to still execute in most cases, while dependency checks for other packages run against the updated version. Then, depending on the value of davos.config.auto_rerun, davos will either automatically restart the interpreter to load the updated package, prompt you to do so, or raise an exception.
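The first step above, checking sys.modules for an already-loaded version, might look roughly like this. The helper names are hypothetical, and relying on a __version__ attribute is an assumption made for brevity; how davos actually determines loaded versions is not shown in this document.

```python
import sys


def loaded_version(module_name):
    """Return the __version__ of an already-imported module, if any."""
    module = sys.modules.get(module_name)
    return getattr(module, "__version__", None)


def needs_replacing(module_name, requested_version):
    """True if a different version of the module is already loaded."""
    current = loaded_version(module_name)
    return current is not None and current != requested_version
```

A module that has never been imported is absent from sys.modules, so needs_replacing returns False and a normal install-then-import suffices.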

  • from ... import ... statements and reloading modules

    The Python docs for importlib.reload() include the following caveat:

    If a module imports objects from another module using from … import …, calling reload() for the other module does not redefine the objects imported from it; one way around this is to re-execute the from statement, another is to use import and qualified names (module.name) instead.

    The same applies to smuggling packages or modules from which objects have already been loaded. If object name from module module was loaded using either from module import name or from module smuggle name, subsequently running smuggle module # pip: module --upgrade will in fact install and load an upgraded version of module, but the name object will still be that of the old version! To fix this, you can simply run from module smuggle name either in lieu of or after smuggle module.
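The reload caveat is easy to reproduce with plain import and importlib.reload(), without davos. The sketch below writes a throwaway module (the name reload_demo_mod is made up for this example) to a temporary directory, rewrites it to simulate an upgrade, and compares the three ways of reaching the value.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demonstrate_stale_name():
    """Show that reload() leaves names bound by `from ... import` untouched."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "reload_demo_mod.py"
        source.write_text("VALUE = 'old'\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import reload_demo_mod
            from reload_demo_mod import VALUE

            # simulate an upgrade; the new source has a different length,
            # so previously cached bytecode is treated as out of date
            source.write_text("VALUE = 'updated'\n")
            importlib.reload(reload_demo_mod)

            stale = VALUE                      # bound before the reload
            fresh = reload_demo_mod.VALUE      # qualified access sees the new value
            from reload_demo_mod import VALUE  # re-executing the from statement
            rebound = VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)
    return stale, fresh, rebound


stale, fresh, rebound = demonstrate_stale_name()
```

Here stale keeps the pre-reload value, while fresh and rebound both reflect the rewritten module, which mirrors the fix suggested above.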

  • Smuggling packages from version control systems

    The first time during an interpreter session that a given package is installed from a VCS URL, it is assumed not to be present locally, and is therefore freshly installed. pip clones non-editable VCS repositories into a temporary directory, runs setup.py install, and then immediately deletes them. Since no information is retained about the state of the repository at installation, it is impossible to determine whether an existing package satisfies the state (i.e., branch, tag, commit hash, etc.) requested for the smuggled package.