python-docker

Python Docker


Keywords
conda, docker
License
BSD-3-Clause
Install
pip install python-docker==0.2.0

Documentation


A pure Python implementation for building Docker images without Docker, along with a Python API for interacting with Docker registries.

Examples using Library

Downloading docker images without docker!

from python_docker.registry import Registry

registry = Registry()
image = registry.pull_image('frolvlad/alpine-glibc', 'latest')

Modify docker image from filesystem

from python_docker.base import Image
from python_docker.registry import Registry

registry = Registry()
image = registry.pull_image('continuumio/miniconda3', 'latest')
image.remove_layer()
image.name = 'this-is-a-test'
image.add_layer_path('./')
image.add_layer_contents({
    '/this/is/a/test1': b'this is test 1',
    '/this/is/a/test2': b'this is test 2'
})
image.layers[0].config['Env'].append('FOO=BAR')

# write docker image to filesystem
image.write_filename('example-docker-image.tar')

# run docker image (does require docker)
image.run(['cat /this/is/a/test1'])

The above example shows how you can update a Docker image after pulling it from a registry. Additionally, the pull_image method takes a lazy option, which lets you modify Docker images without having to download all of their layers. This is an important feature when you need to add a small layer to a large GPU image that is several gigabytes in size.

from python_docker.base import Image
from python_docker.registry import Registry

registry = Registry()
image = registry.pull_image('continuumio/miniconda3', 'latest', lazy=True)

# perform the same actions as in the example above;
# the difference is that the layers are downloaded lazily,
# only when needed by `image.write_filename` and `image.run`

registry.push_image(image)
# push_image does not require downloading the layers

Development

Dependencies

Install the development environment

conda env create -f environment-dev.yaml

Testing

docker-compose up -d
pytest

How does this work?

It turns out that Docker images are just a tar collection of files. There are several versions of the spec; for v1.0 the specification is defined here. Instead of walking through the spec, let's look into a single Docker image.

docker pull ubuntu:latest
docker save ubuntu:latest -o /tmp/ubuntu.tar

List the directory structure of the Docker image. Notice that it is a collection of layer.tar files, each a tar archive of a layer's filesystem, alongside several JSON files. The VERSION file currently always contains 1.0.

tar -tvf /tmp/ubuntu.tar
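The same listing can be done from Python with the standard-library tarfile module. A minimal sketch, assuming /tmp/ubuntu.tar was produced by the docker save command above:

```python
import tarfile


def list_image_members(path):
    """Return the member names inside a saved Docker image tar."""
    with tarfile.open(path) as tar:
        return [member.name for member in tar.getmembers()]


# list_image_members('/tmp/ubuntu.tar') would show a top-level
# repositories file plus one directory per layer holding
# VERSION, json, and layer.tar
```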

Dockerhub happens to export Docker images in a format compatible with v1 through v1.2. Let's only look at the files important for v1. The repositories file tells which layer to use as the head layer for a given name/tag.

tar -xf /tmp/ubuntu.tar $filename
cat $filename | python -m json.tool
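The repositories file is a small JSON document mapping image name and tag to the id of the head layer. A hypothetical helper to resolve it (a sketch, not part of the library):

```python
import json


def head_layer(repositories_bytes, name, tag):
    """Resolve name/tag to its head layer id from a repositories file."""
    repositories = json.loads(repositories_bytes)
    return repositories[name][tag]


# e.g. head_layer(b'{"ubuntu": {"latest": "93935bf14502..."}}',
#                 'ubuntu', 'latest') returns the head layer id
```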

For each layer there are three files: VERSION, layer.tar, and json.

tar -xf /tmp/ubuntu.tar $filename
cat $filename
tar -xf /tmp/ubuntu.tar $filename
cat $filename | python -m json.tool

Looking at layer metadata.

{
    "id": "93935bf1450219e4351893e546b97b4584083b01d19daeba56cab906fc75fc1c",
    "created": "1969-12-31T19:00:00-05:00",
    "container_config": {
        "Hostname": "",
        "Domainname": "",
        "User": "",
        "AttachStdin": false,
        "AttachStdout": false,
        "AttachStderr": false,
        "Tty": false,
        "OpenStdin": false,
        "StdinOnce": false,
        "Env": null,
        "Cmd": null,
        "Image": "",
        "Volumes": null,
        "WorkingDir": "",
        "Entrypoint": null,
        "OnBuild": null,
        "Labels": null
    },
    "os": "linux"
}
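A layer's json file can be reproduced in a few lines. This sketch fills in only the fields shown in the dump above; the optional parent field (present on non-head layers in the v1 format to reference the layer below) is an addition not shown in the dump:

```python
import json


def layer_metadata(layer_id, created, parent=None):
    """Build minimal v1 layer metadata matching the dump above."""
    metadata = {
        'id': layer_id,
        'created': created,
        'container_config': {
            'Hostname': '', 'Domainname': '', 'User': '',
            'AttachStdin': False, 'AttachStdout': False,
            'AttachStderr': False, 'Tty': False,
            'OpenStdin': False, 'StdinOnce': False,
            'Env': None, 'Cmd': None, 'Image': '',
            'Volumes': None, 'WorkingDir': '', 'Entrypoint': None,
            'OnBuild': None, 'Labels': None,
        },
        'os': 'linux',
    }
    if parent is not None:
        # non-head layers reference the id of the layer beneath them
        metadata['parent'] = parent
    return json.dumps(metadata)
```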

Looking at the layer filesystem.

tar -xf /tmp/ubuntu.tar $filename
tar -tvf $filename | head
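A layer.tar can also be read straight out of the image tar without extracting it to disk, since tarfile accepts file objects. A sketch, where the layer member path is illustrative:

```python
import tarfile


def list_layer_files(image_path, layer_member):
    """List the filesystem entries of one layer inside a saved image tar."""
    with tarfile.open(image_path) as image:
        # file object positioned at the nested layer.tar
        layer = image.extractfile(layer_member)
        with tarfile.open(fileobj=layer) as layer_tar:
            return [member.name for member in layer_tar.getmembers()]
```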

References