github.com/plumber-cd/terraform-backend-git

Terraform HTTP Backend implementation that uses Git repository as storage


Keywords
terraform, terraform-backends
License
Apache-2.0
Install
go get github.com/plumber-cd/terraform-backend-git

Documentation

Terraform State management using Git

Git as Terraform backend? Seriously? I know, might sound like a stupid idea at first, but let me try to convince you why it's not always the case.

Table of Contents

Getting Started

Installation

Brew

Installation with Brew is coming later.

From Release

Download a binary from Releases. All binaries built with GitHub Actions and you can inspect how.

Don't forget to add it to your PATH.

As Docker Image

See https://github.com/plumber-cd/terraform-backend-git/pkgs/container/terraform-backend-git.

docker pull ghcr.io/plumber-cd/terraform-backend-git:latest

As Github Action

See https://github.com/marketplace/actions/setup-terraform-backend-git.

steps:
  - name: Setup terraform-backend-git
    uses: plumber-cd/setup-terraform-backend-git@v1
    with:
      version:
        0.1.2
  - name: Use terraform-backend-git
    run: terraform-backend-git version

From Sources

You can build it yourself, of course (and Go made it really easy):

go install github.com/plumber-cd/terraform-backend-git@${version}

Don't forget to add it to your PATH.

Usage

The most easy to understand option is the wrapper mode.

Wrapper Mode

Assuming you've installed Terraform as well as this backend (and added it to your PATH), you can do this:

terraform-backend-git git \
  --repository https://github.com/my-org/tf-state \
  --ref master \
  --state my/state.json \
    terraform [any tf args] init|plan|apply [more tf args]

terraform-backend-git will act as a wrapper. It will start HTTP backend, generate Terraform configuration for it and save it to a *.auto.tf file. And then - it will just execute as-is everything you gave it to the right from terraform subcommand. After terraform exits - it will cleanup any *.auto.tf it created and shut down HTTP listener. You shouldn't be having any other backend configurations in your TF code, otherwise Terraform will fail with a conflict.

This mode is explained in more depth in the wrapper CLI section.

Hashicorp Configuration Language (HCL) Mode

You could also create a terraform-backend-git.hcl config file and put it next to your *.tf code:

git.repository = "https://github.com/my-org/tf-state"
git.ref = "main"
git.state = "my/state.json"

You can also specify custom path to the hcl config file using --config arg.

You can also have a mixed setup, where some parts of configuration comes from terraform-backend-git.hcl and some - from CLI arguments or even environment variables (see details below).

Standalone Terraform HTTP Backend Mode

Basically, you can run this backend as a standalone server (locally or remotely) as a daemon. You can either run it permanently, or have it started in your pipeline right before it is about to perform some Terraform actions.

terraform-backend-git &

Then, you just configure your Terraform code to use an HTTP backend.

Your Terraform backend configuration should be looking something like this:

terraform {
  backend "http" {
    address = "http://localhost:6061/?type=git&repository=https://github.com/my-org/tf-state&ref=master&state=my/state.json"
    lock_address = "http://localhost:6061/?type=git&repository=https://github.com/my-org/tf-state&ref=master&state=my/state.json"
    unlock_address = "http://localhost:6061/?type=git&repository=https://github.com/my-org/tf-state&ref=master&state=my/state.json"
  }
}

Note that lock_address and unlock_address should both be explicitly defined. If they are not defined - Terraform assumes that the backend implementation does not support locking, so it will never attempt to lock the state, which might be dangerous and might lead to state file corruptions.

Now, just run Terraform and it will use the backend:

terraform init|plan|apply

When you're done, and if you want to stop the backend - it uses pid files to make it easier to stop:

terraform-backend-git stop

Wrappers CLI

Command line syntax goes like this:

terraform-backend-git [backend options] <storage type> [storage options] <program> [any sub-process arguments]

For instance:

terraform-backend-git --access-logs git --state my/state.json terraform -detailed-exitcode -out=plan.out
#                                    |                            |
#                                    |                            \--- This is the program to run when HTTP backend is ready.
#                                    |                                 Everything to the right are as-is arguments to that program.
#                                    |
#                                    \-------------------- This is the name of the storage type to use.
#                                                          To the right are the arguments to control that storage settings.
#                                                          To the left are the arguments to control global backend settings.

Initially it is meant to only support git as a storage, hence the name of it included git. But later on it was realized that a pluggable architecture would allow to create alternative storage implementations re-using same protocol, encryption and so on. So tat's why it feels like a duplication of git, maybe in the future we will just rename the project to a terraform-http-backend.

terraform is also there because in the future we may extend support to other tools such as (but not limited to) terragrunt and terratest.

Configuration

CLI terraform-backend-git.hcl Environment Variable TF HTTP backend config Description
--repository git.repository TF_BACKEND_GIT_GIT_REPOSITORY repository Required; Which repository to use for storing TF state?
--ref git.ref TF_BACKEND_GIT_GIT_REF ref Optional; Which branch to use in that repository? Default: master.
--state git.state TF_BACKEND_GIT_GIT_STATE state Required; Path to the state file in that repository.
--config - - - Optional; Path to the hcl config file.
--address address TF_BACKEND_GIT_ADDRESS - Optional; Local binding address and port to listen for HTTP requests. Only change the port, do not change the address to 0.0.0.0 before you read Running backend remotely. Default: 127.0.0.1:6061.
--access-logs accessLogs TF_BACKEND_GIT_ACCESSLOGS - Optional; Set to true to enable HTTP access logs on backend. Default: false.

Git Credentials

Both HTTP and SSH protocols are supported. As of now, any sensitive configuration is only supported via environment variables.

Variable Description
GIT_USERNAME Specify username for Git, only required for HTTP protocol.
GIT_PASSWORD/GITHUB_TOKEN Git password or token for HTTP protocol. In case of token you still have to specify GIT_USERNAME.
SSH_AUTH_SOCK ssh-agent socket.
SSH_PRIVATE_KEY Path to SSH key for Git access.
StrictHostKeyChecking Optional; If set to no, will not require strict host key checking. Somewhat more secure way of using Git in automation is to use ssh -T -oStrictHostKeyChecking=accept-new git@github.com before starting any automation.

Backend will determine which protocol you are using based on the repository URL.

For SSH, it will see if ssh-agent is running by looking into SSH_AUTH_SOCK variable, and if not - it will need a private key. It will try to use ~/.ssh/id_rsa unless you explicitly specify a different path via SSH_PRIVATE_KEY.

Unfortunately go-git will not mimic real Git client and will not automatically pickup credentials from the environment, so this custom credentials resolver chain has been implemented since I'm lazy to research the "right" original Git client approach. It is recommended to use Git Credentials Helpers (aka ASKPASS).

State Encryption

To enable encryption set the env var TF_BACKEND_HTTP_ENCRYPTION_PROVIDER to one of the following values:

  • sops
  • aes

We are using sops as encryption abstraction. sops supports many different encryption backends, but unfortunately it does not provide one stop API for all of them, so on our side we should define configuration and create binding for each. At the moment, we have following bindings for sops backends:

  • PGP
  • AWS KMS
  • GCP KMS
  • Hashicorp Vault

Before we integrated with sops - we had a basic AES256 encryption via static passphrase. It is no longer recommended, although might be useful in some limited scenarios. Basic AES256 encryption is using one shared key, and it encrypts entire JSON state file that it can no longer be read as JSON. sops supports various encryption-as-service providers such as AWS KMS and Hashicorp Vault Transit - meaning encryption can be safely performed without revealing private key to the encryption clients. That means keys can be easily rotated, access can be easily revoked and generally it dramatically reduces chances of the key leaks.

sops

sops supports Shamir's Secret Sharing. You can configure multiple backends at once - each will be used to encrypt a part of the key. You can set TF_BACKEND_HTTP_SOPS_SHAMIR_THRESHOLD if you want to use a specific threshold - by default, all keys used for encryption will be required for decryption.

PGP

Use TF_BACKEND_HTTP_SOPS_PGP_FP to provide a comma separated PGP key fingerprints. Keys must be added to a local gpg in order to encrypt. Private part of the key must be present in order for decrypt.

AWS KMS

Use TF_BACKEND_HTTP_SOPS_AWS_KMS_ARNS to provide a comma separated list of KMS ARNs. AWS SDK will use standard credentials provider chain in order to automatically discover local credentials in standard AWS_* environment variables or ~/.aws. You can optionally use TF_BACKEND_HTTP_SOPS_AWS_PROFILE to point it to a specific shared profile. You can also provide additional KMS encryption context using TF_BACKEND_HTTP_SOPS_AWS_KMS_CONTEXT - it is a comma separated list of key=value pairs.

GCP KMS

Use TF_BACKEND_HTTP_SOPS_GCP_KMS_KEYS to provide a comma separated list of GCP KMS IDs. Read Encrypting using GCP KMS for further details.

Hashicorp Vault

Use TF_BACKEND_HTTP_SOPS_HC_VAULT_URIS to point it to the Vault Transit keys. It is a comma separated list of URLs in a form of ${VAULT_ADDR}/v1/transit/keys/key, where transit is a name of Vault Transit mount and key is the name of the key in that mount. Under the hood Vault SDK is using standard credentials resolver to automatically discover Vault credentials in the environment, meaning you can either use vault login or set VAULT_TOKEN environment variable.

AES256

To enable state encryption, you can use TF_BACKEND_HTTP_ENCRYPTION_PASSPHRASE environment variable to set a passphrase. Backend will encrypt and decrypt (using AES256, server-side) all state files transparently before storing them in Git. If it fails to decrypt the file obtained from Git, it will assume encryption was not previously enabled and return it as-is. Note this doesn't encrypt the traffic at REST, as Terraform doesn't support any sort of encryption for HTTP backend. Traffic between Terraform and this backend stays unencrypted at all times.

Running backend remotely

This can be done, as previously mentioned, but it is not recommended. Although latest versions of this backend do support TLS in-transit encryption as well as at-rest encryption via sops - it still doesn't support authentication beyond very basic HTTP auth with a single shared password. Exposed backend will not give much flexibility in terms of the user access control, so it isn't really secure.

It is hard to tell at the moment where feature requests from users and my own use cases will take this project next, bur originally it was designed to be a local-only thing. Once backends in Terraform can be pluggable gRPC components, this backend was planned to be converted to a normal gRPC plugin and HTTP support was planned to be removed. Basically, the idea was to use HTTP until gRCP for backend implementations were not available.

You may probably get creative and use something like Istio or maybe Keycloack to add external layer of encryption, authentication and authorization.

If you are absolutely sure you want to run this backend in remote standalone mode - you need to run it with --address=:6061 argument so the backend will bind to 0.0.0.0 and become remotely accessible, otherwise - it will only listen on 127.0.0.1.

TLS

You can set TF_BACKEND_GIT_HTTPS_CERT and TF_BACKEND_GIT_HTTPS_KEY pointing to your cert and a key files. This will make HTTP backend to start in TLS mode. If you are using self-signed certificate - you can also set TF_BACKEND_GIT_HTTPS_SKIP_VERIFICATION=true in a wrapper mode and that will enable skip_cert_verification in the terraform config (or configure it yourself for standalone mode).

Basic HTTP Authentication

You can use TF_BACKEND_GIT_HTTP_USERNAME and TF_BACKEND_GIT_HTTP_PASSWORD environment variables to add an extra layer of protection. In wrapper mode, same environment variables will be used to render *.auto.tf config for Terraform, but if you are using backend in standalone mode - you will have to tell these credentials to the Terraform explicitly:

terraform {
  backend "http" {
    ...
    username = "user"
    password = "pswd"
  }
}

Note that if either username or password changes - Terraform will consider this as a backend configuration change and will want to ask you to migrate the state. Since backend will not be accepting old credentials anymore - it will fail to init (can't read the "old" state). Consider running init -reconfigure or deleting your local .terraform/terraform.tfstate file to fix this issue.

Why not native Terraform Backend

Unfortunately, Terraform Backends is not pluggable like Providers are, see hashicorp/terraform#5877.

Due to this, I couldn't make a proper native Terraform backend implementation for Git, it should have been implemented and added to https://github.com/hashicorp/terraform code base. There is an open ticket to do it hashicorp/terraform#24603, but it is unclear when this would happen (if it will at all). That said I figured this HTTP backend implementation might be useful for the time being.

Why storing state in Git

So you must be wondering why is that I think storing Terraform state in Git might be such a wonderful idea.

There is one particular chicken-egg problem that I ran into again, and again, and again. As I tend to manage ALL my infrastructure with code (and usually it's Terraform) - among the supported backend types none would exist before I create it. With code. Starting to feel the problem?

Backend types that use managed object storages (like s3) having the least amount of dependencies (i.e. they require no VPC), so before creating this backend - that's what I was usually using. But even then the chicken-egg issue is still there - you'd need a bucket itself, probably some replication config, encryption, IAM... And then there's also DynamoDB for locking. Usually I'd express that in TF code and just apply it locally for the first time (bootstrap). And then I will manually push that state to newly created bucket. What if I want to automate AWS account creation with Terraform too? To make it fully automated, which is totally doable, it would require some amount of custom glue... And that glue cannot be packaged as a Terraform module.

And then what if I want to go multi-cloud? Well, then I either store my GCP and Azure state in AWS, or I use 3 different state storages. Which would complicate my pipelines and make things less portable overall.

To throw even more shit on the fan - I also use Terraform to manage my Git repositories (with GitHub provider). It's an infrastructure too, after all. With proper structure and layers of abstractions - my Terraform code alone may easily go over 10 repositories for even smallest projects, and managing repositories should not be a burden. I want every single repository to be unified and configured in the same way, i.e. access, protected branches, merging policies etc.

And then - think about other people who doesn't even have infrastructure (or access). They might want to use Terraform for something completely irrelevant to the infrastructure, as there are hundreds of providers out there. What if they need to store TF state and just not ready to get into infra/pipelines management business?

Often when I start a new project, I myself - don't have any infrastructure for it yet. I don't even have an AWS account yet. I just want to create a few initial repositories to start working on it. And then my choice as to the state management is usually limited to a local state, and then I'd have to commit that state manually to git. It's fine when I'm alone, but as soon as multiple people involved - it gets complicated (things like manually "locking" the state via chat, fancy PR merging rules etc). And remember - we don't even have any infra yet, so forget about CD and pipelines for now.

Of course - there's Terraform Cloud, which is basically exists to address that exact problem (among many other). It provides state management as a service. A great product which I absolutely love, but honestly for a small projects, that doesn't need (yet?) any of that complex logic and fancy pipelines - sounds like an expensive overkill. I just remote state management with locking, that's all. Besides, what if that project is a PoC that is not even guaranteed to stay alive for a long time? What if the nature of the project is actually a Terraform proof of concept with a simple goal to sell developers on using it? If no one knows for sure yet if they even need Terraform - no one will buy commercial version of it for sure. I had to wear a hat of a Terraform proponent and a pioneer multiple times during my career, and all of this usually was a huge barrier and an obstacle for me to even establish initial conversations about Terraform. Terraform state migrations are a piece of cake so we can take care of that much later, when we actually need it.

One day I realized something really simple. If I'm pushing my Terraform state to git anyway (initially during bootstrap) - why not just fully embrace that concept and just do it right? Why not split the state from the code, create a separated isolated Git repository for it, and use it transparently to the Terraform user? Why not, basically, make Git a backend storage for a real Terraform backend?

Even if I don't have any infra yet - I surely do have some git server. I do have some repositories somewhere to share the code, right? It might be some public cloud service like GitHub/GitLab/Bitbucket/etc, or maybe it's a service within my Org that already existed on-prem.

Proposed solution

Below is a proposal as to how a native Git backend implementation would look like in Terraform. HTTP backend implementation in this repository, basically, implements this proposal.

Consider a separate Git repository designated just for the Terraform state files. It is used as a backend, i.e. the fact it's a git repository is hidden from the user and considered an implementation detail. That means user scenarios doesn't really involve interacting with Git repository using Git clients.

Git server access configuration would define who have access to manage the state, i.e. users will still need their Git credentials. State files can also be encrypted in Git at rest.

The backend configuration might be looking something like this:

terraform {
  backend "git" {
    repository = "https://github.com/my-org/tf-state?ref=main"
    file = "path/to/state.json"
  }
}

State locking would be based on branches, as creating a new branch is atomic operation.

To acquire a lock - it would mean to push a branch named locks/${file}. The branch would need to have a file ${file}.lock added and committed to it with a standard Terraform locking metadata in it. If pushing the branch fails with error saying that fast forward push is not possible - that would mean something else already acquired the lock. To check if the state currently locked - would mean to check if the branch currently exists remotely. To read the information about the current lock - would mean to pull that branch and read the ${file}.lock. To unlock - would mean to simply delete that remote branch.

This implementation proposal for the state locking might sound little weird, but keep in mind that the aim was to avoid complex Git scenarios that would involve merging and conflict solving. This proposal is trying to keep local Git working tree fast-forwardable at all times. As Git repository for state files is not really meant to be used by people directly at all, so it should be fine if we diverge a little from Git common best practices here.

To visualize and make it easier to understand, below is how the TF scenarios would translate into the command line:

Lock

# Checkout current ref requested by user and cleanup any leftovers
git reset --hard
git checkout ${ref}
git branch -D locks/${file}
# Pull latest remote state
git pull origin ${ref}
# Start a new locking branch
git checkout -b locks/${file}
# Save lock metadata
echo ${lock} > ${file}.lock
git add ${file}.lock
git commit -m "Lock ${file}"
git push origin locks/${file}
# If push failed saying that fast forward is not possible - something else had it already locked

Check existing Lock

# Checkout current ref requested by user and cleanup any leftovers
git reset --hard
git checkout ${ref}
git branch -D locks/${file}
# Fetch locks
git fetch origin refs/heads/locks/*:refs/remotes/origin/locks/*
# Checkout the lock branch, if it fails - it wasn't locked
git checkout locks/${file}
# Check if it was locked by me
cat ${file}.lock

Unlock

# First - use routine from above to check that it is currently locked and the lock author is me.
# Then - it's a matter of deleting the lock branch remotely
git push origin --delete locks/${file}

Get state

# Checkout current ref requested by user and cleanup any leftovers
git reset --hard
git checkout ${ref}
# Pull latest
git pull origin ${ref}
# Read state
cat ${file}

Update state

# First - use routine from above to check that it is currently locked and the lock author is me.
# Then - checkout current ref requested by user and cleanup any leftovers
git reset --hard
git checkout ${ref}
# Pull latest
git pull origin ${ref}
# Save state
echo ${state} > ${file}
git add ${file}
git commit -m "Update ${file}"
git push origin ${ref}

Delete state

# First - use routine from above to check that it is currently locked and the lock author is me.
# Then - checkout current ref requested by user and cleanup any leftovers
git reset --hard
git checkout ${ref}
# Pull latest
git pull origin ${ref}
# Delete state
git rm -f ${file}
git commit -m "Delete ${file}"
git push origin ${ref}