P2: Tools for Scalable Software Deployment
This is a collection of tools intended to allow huge fleets of machines to participate in safe, flexible and scalable deployment models. It was designed for Square but is a general-purpose framework that should look suspiciously like Kubernetes to anyone paying close attention.
Using Docker isn't an overnight choice, especially for a company with a long
history of deploying things that aren't Docker. P2 supports our internal
artifact specification ("Hoist artifacts"), which are .tar.gz files. A
.tar.gz can be a Hoist artifact, as long as it has a
script or directory of scripts to exec under process management (we use Runit).
Hoist artifacts are totally self-contained and are expected to have all dependencies statically linked internally with very few exceptions.
P2 executes artifacts in resource constrained cgroups as different users with different home directories to create extremely lightweight isolation.
Pods, Labels and Replication Controllers
Kubernetes provides some excellent tools for grouping and managing sets of applications. We copied them! We didn't want to wait to have our entire Docker ecosystem established (new build system, new kernel, etc.) to start using these great higher-order orchestration primitives.
We currently have production-quality support for pod manifests, replication controllers and rolling updates, analogous to Kubernetes pods, replication controllers and deployments, respectively. We are also actively working on pod clusters, our variation on Kubernetes services.
We had to solve a number of problems that Square has today. That led us to build in the following concepts from the beginning:
- Arbitrary configuration files written into the pod manifest and exported
- Application lifecycle management and health. During the shutdown of an
  instance, we first run bin/disable. When starting up an instance, we run
  bin/enable, and then monitor the application via a call to
  GET /_status. A 200 response code means ready and healthy.
- Rich plugin architecture for secret company stuff. For example, our
  integration with Keywhiz is implemented as a plugin. The hooks package in this repo provides a handy Go library for writing hooks that can be scheduled.
- Self-hosting! We wanted to deploy P2 with P2, so we did that. The binary p2-bootstrap allows you to set up a Consul agent and a P2 preparer on the same host. If done right, that host should allow any future deploys to Just Work, including to both the Consul agent and the preparer themselves!
- Deployment Authorization. From the beginning we needed a way to restrict who can start which applications. The preparer can be given an ACL that can be enforced by GPG signatures on pod manifests, signed by the deployer. Or if you hate GPG, you can use delegated signing with a trusted orchestration service.
To build the tools, run rake build. The bin subdirectory contains
agents and executables; the pkg directory contains useful libraries for Go.
We strongly believe in small things that do one thing well.
- bin/ contains executables that, together, manage deployment. The bootstrap executable can be used to set up new nodes.
- pkg/ contains standalone libraries that provide supporting functionality for the executables. These libraries are all useful in isolation.
Running rake integration will attempt to launch a Vagrant CentOS 7 machine on
your computer, launch Consul and our preparer and then launch an application.
If you see a success message, you can
vagrant up the halted box to check out
the setup without needing to do any work yourself.
P2 is based on existing deployment tools at Square. The following list reflects all the system dependencies used across the P2 libraries, although many libraries require only one of these or are dependency-free.
Adding Docker support is a big next step, but it will ultimately help us migrate to using Docker (or an equally excellent runC implementation) at Square.
P2 also lacks a native job admission / scheduling system, so all pod scheduling is currently done manually by the client, using either a label selector or simply a hostname. Solutions to this are to be determined.