nrk.no

The road to NRK’s private Terraform registry

The NRK logo and the Terraform logo side by side.

NRK is a large technology organisation with its own platform team that runs all the shared IT infrastructure and helps other teams operate their own. To keep the infrastructure as code (IaC) repositories across the organisation simple and up to standard, we develop and share reusable Terraform modules. This is the story of how we started out with Terraform and how we got to where we are today, from the perspective of the platform team itself.

Note: Please excuse the [at] in place of @ throughout this post, which is a way to circumvent the email address sanitizer.

The beginning

We started out with a Terraform mono-repo to configure and provision our shared infrastructure. This repository, naturally, became bigger and bigger. We started to see the need to branch out into multiple configuration repositories. From there, the need to share modules between repositories arose.

Configuration repositories are where we actually run terraform apply, while module repositories are stand-alone repositories that each hold a single Terraform module we can depend on.

module "mymodule" {
  source = "git[at]github.com/nrkno/mymodule"
}

The ground is moving under our feet

There are some problems with referencing module repositories in Terraform with Git addresses:

  • We don’t know what version we have downloaded locally, meaning that multiple clients can have different module versions even after running terraform init.
  • We don’t know if the module has been updated remotely/upstream because there is no locking mechanism. This requires running terraform get -update separately.
  • Suffixing the address with ?ref=<git-commit-ish> works, but is a hassle to update manually, and versions within different configuration repositories will soon diverge.
    module "mymodule" {
      source = "git[at]github.com/nrkno/mymodule?ref=1.2.3"
    }

For quite some time, the reality was that we tried to require admins of the configuration repositories to always run git pull, terraform get -update and then terraform init. More often than not, it felt like a module repository had been updated under our feet, and we then had to handle a huge diff of changes that no single person had insight into. Without a proper changelog or an established internal Git workflow, it was very hard to determine what had changed, even for teams that updated and ran terraform apply in their configuration repositories on a regular basis.

Then came Dependabot

As of Terraform v0.14 (Dec 2020), terraform init produces a lock file for provider versions. Dependabot started noticing updates to the providers we were using, but it could also be configured to check and update module versions, both in the official public registry and in private Terraform registries. All our modules reside in private GitHub repositories, and we currently have no interest in publishing them to the public Terraform registry because they are tightly coupled to our organisation.

We looked at a few different implementations of a private Terraform module registry, but we didn’t quite find anything that was simple enough for our use case. The potential for using GitHub as a backend was also looming. Our vision was that if we enforced conventional commits in our module repositories and versioned them automatically with semantic versioning, Dependabot should, in theory, be able to open pull requests for us automatically. The maintainers of the configuration repositories would then:

  • know when a module they are using has been updated.
  • be able to choose when and if they want or need to update.
  • be provided a changelog to help them decide.

Shared workflows and required pull request checks

No one wants to enforce standards manually. It doesn’t scale and it isn’t fun. We created shared workflows for GitHub Actions that our module repositories always run. This also meant adding restrictions to prevent pushing to the default branch, and adding checks and review requirements to reduce the chance of nonsense commit messages getting merged and ending up in the changelog. When the need for a new check arises, we can update our workflow standards centrally in the shared workflow repository instead of updating each and every module repository manually.

Checks that have to pass before being able to merge a pull request today are:

  • Conventional commit check that requires all commit messages to have a set prefix indicating what kind of change it is, and whether or not it is a breaking change.
  • Lint the configuration files (terraform fmt)
  • Verify that the configuration can be initialised (terraform init)
  • Validate the configuration (terraform validate)
  • Trivy IaC scanning to identify known vulnerable configuration settings (e.g. prevent us from inadvertently allowing public access to an Azure Storage Account)
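The conventional commit check in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not the workflow we run: the list of allowed prefixes and the function name check_commit are assumptions made for the example, and the header format follows the public Conventional Commits convention.

```python
import re
from typing import Optional

# Assumed set of allowed prefixes; the post does not list the ones NRK enforces.
ALLOWED_TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "ci")

# Header format: <type>[(scope)][!]: <description>
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<bang>!)?"
    r": (?P<description>\S.*)$"
)


def check_commit(message: str) -> Optional[dict]:
    """Return the parsed header fields, or None if the message is rejected."""
    lines = message.splitlines()
    if not lines:
        return None
    match = HEADER_RE.match(lines[0])
    if match is None:
        return None
    # A breaking change is marked with "!" in the header or with a footer line.
    breaking = bool(match["bang"]) or any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:"))
        for line in lines[1:]
    )
    return {
        "type": match["type"],
        "scope": match["scope"],
        "description": match["description"],
        "breaking": breaking,
    }
```

A message such as "feat(storage): add lifecycle rule" passes, while "updated stuff" is rejected because it has no recognised prefix.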

When a pull request is merged, another workflow will run:

  • Semantic release that parses all commit messages since the last release and creates a new release with the semantic version number bumped at just the right level. A changelog is added to the release notes of the new version. This changelog is included in the message of the pull requests that Dependabot opens in repositories depending on this module.
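To show how commit messages can decide the size of a version bump, here is a minimal Python sketch. It assumes the common mapping where a breaking change gives a new major, feat gives a new minor and fix gives a new patch version. The function names are hypothetical and the real semantic release tooling has many more rules than this.

```python
import re

# Matches the "<type>[(scope)][!]: " start of a conventional commit header.
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^()]*\))?(?P<bang>!)?: ")


def bump_level(messages):
    """Return "major", "minor", "patch" or None for commits since the last release."""
    level = None
    for message in messages:
        lines = message.splitlines()
        match = HEADER_RE.match(lines[0]) if lines else None
        if match is None:
            continue  # in this sketch, non-conventional commits never release
        breaking = bool(match["bang"]) or any(
            line.startswith("BREAKING CHANGE:") for line in lines[1:]
        )
        if breaking:
            return "major"  # nothing can outrank a breaking change
        if match["type"] == "feat":
            level = "minor"
        elif match["type"] == "fix" and level is None:
            level = "patch"
    return level


def next_version(current, messages):
    """Return the next "vX.Y.Z" tag, or None when no release is needed."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    level = bump_level(messages)
    if level == "major":
        return f"v{major + 1}.0.0"
    if level == "minor":
        return f"v{major}.{minor + 1}.0"
    if level == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    return None
```

With these assumptions, a release containing only fixes moves v1.2.3 to v1.2.4, and a single commit marked as breaking moves it to v2.0.0.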

This increases our trust in the process: in theory, we should not be afraid to bump a module version as long as it is not a new major version (i.e. v1.2.3 → v2.0.0).

Terraform Registry

Now that our Terraform modules were versioned, we had enough data to serve them through a registry. Because the registry protocol is so simple, we developed our own implementation that uses GitHub directly as a backing store, meaning there is no storage in the registry itself. One of the beautiful features of the protocol is that when the Terraform client attempts to download a module, the registry can return a Terraform-compatible module URL in a response header that redirects the client to another download location. We don’t have to serve the module source directly.

To illustrate how this works, given a module configuration

module "my_module_instance" {
  source = "terraform-registry.nrk.cloud/nrkno/mymodule/generic"
  version = "1.2.3"

  some_param = "foo"
  team = "my-team"
}

Terraform will query the server at https://terraform-registry.nrk.cloud/.well-known/terraform.json to see at what path the module API resides:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "modules.v1" : "/v1/modules/"
}
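The discovery step can be sketched from the client’s point of view in Python. The function names modules_base_url and versions_url are made up for this example, and the sketch only parses a response body that has already been fetched. It is an illustration of the lookup, not Terraform’s own code.

```python
import json
from urllib.parse import urljoin


def modules_base_url(registry_host, discovery_body):
    """Resolve the "modules.v1" entry of the discovery document to a full URL."""
    document = json.loads(discovery_body)
    # urljoin handles both a relative path and an absolute URL in the document.
    return urljoin(f"https://{registry_host}/", document["modules.v1"])


def versions_url(base_url, namespace, name, system):
    """Build the URL that lists all versions of one module."""
    return f"{base_url}{namespace}/{name}/{system}/versions"


body = '{"modules.v1": "/v1/modules/"}'
base = modules_base_url("terraform-registry.nrk.cloud", body)
print(versions_url(base, "nrkno", "mymodule", "generic"))
```

The three path segments correspond to the last three parts of the module source address in the configuration.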

Then it will query the module API for all available versions of the specified module at https://terraform-registry.nrk.cloud/v1/modules/nrkno/mymodule/generic/versions:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "modules" : [
    {
      "versions" : [
        {
          "version" : "1.2.3"
        },
        {
          "version" : "1.2.2"
        }
      ]
    }
  ]
}
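Reading that response takes very little code. The Python sketch below flattens the version list and sorts it with the newest version first. The helper names are hypothetical, and the sketch assumes plain MAJOR.MINOR.PATCH versions without pre-release suffixes.

```python
import json


def parse_version(text):
    """Turn "1.2.3" into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def available_versions(body):
    """Return every version in a versions response, newest first."""
    document = json.loads(body)
    versions = [
        entry["version"]
        for module in document["modules"]
        for entry in module["versions"]
    ]
    return sorted(versions, key=parse_version, reverse=True)
```

Sorting on the parsed tuple rather than on the string matters: as strings, "1.10.0" would sort before "1.2.3".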

Terraform now knows that the version exists and can query the registry for the module source from https://terraform-registry.nrk.cloud/v1/modules/nrkno/mymodule/generic/1.2.3/download:

HTTP/1.1 204 No Content
X-Terraform-Get: git::ssh://git[at]github.com/nrkno/mymodule.git?ref=v1.2.3

Since the HTTP response code is 204 and an X-Terraform-Get header exists, Terraform will use the header value as the source URL for the module. The module is then downloaded using the client’s local SSH client. Returning Git over SSH URLs means we don’t have to implement strict authorization rules in the registry itself: clients (still) need an authorized SSH private key to download the module source repository from GitHub. We do require a valid token to query the registry API, but there are no role-based authorization rules. All tokens have access to query the whole registry.
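To make the redirect idea concrete, here is a minimal sketch of a download endpoint written as a Python WSGI application. It is not our registry implementation: the REPOSITORIES mapping is a hypothetical stand-in for looking up repositories in GitHub, and the sketch leaves out both the token check and the check that the requested version actually exists.

```python
import re

# Hypothetical mapping from (namespace, name, system) to a GitHub repository.
REPOSITORIES = {("nrkno", "mymodule", "generic"): "nrkno/mymodule"}

DOWNLOAD_RE = re.compile(
    r"^/v1/modules/(?P<namespace>[^/]+)/(?P<name>[^/]+)/(?P<system>[^/]+)"
    r"/(?P<version>\d+\.\d+\.\d+)/download$"
)


def app(environ, start_response):
    """Answer the module download endpoint with an X-Terraform-Get header."""
    match = DOWNLOAD_RE.match(environ.get("PATH_INFO", ""))
    repo = match and REPOSITORIES.get(
        (match["namespace"], match["name"], match["system"])
    )
    if not repo:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"module not found\n"]
    # Point the client at the Git tag instead of serving the source ourselves.
    source = f"git::ssh://git@github.com/{repo}.git?ref=v{match['version']}"
    start_response("204 No Content", [("X-Terraform-Get", source)])
    return []
```

The application can be tried locally with make_server from the standard library module wsgiref.simple_server. The registry never touches the module source in this design, which is why it needs no storage of its own.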

How Dependabot finds new module versions

When Dependabot comes looking for new module versions, it scans your Terraform code and looks for module definitions to extract the Terraform registry URL (if present) and the specified module version. It then queries the registry for a list of available module versions, compares it to the version specified in your repository and opens a new pull request if a newer version is available. The pull request message will include the changelog between the old and the new module version numbers, taken from the release notes of the GitHub release.
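The comparison described above can be approximated in Python as follows. This is a rough sketch of the logic and not Dependabot’s implementation: the function names are made up, the regular expression assumes that version directly follows source in the module block, and only exactly pinned MAJOR.MINOR.PATCH versions are handled.

```python
import re

# Assumes "version" is the argument right after "source" in the module block.
MODULE_RE = re.compile(
    r'source\s*=\s*"(?P<source>[^"]+)"\s*version\s*=\s*"(?P<version>[^"]+)"'
)


def parse_version(text):
    """Turn "1.2.3" or "v1.2.3" into (1, 2, 3) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def pinned_modules(terraform_code, registry_host):
    """Return (source, version) pairs for modules served by the given registry."""
    return [
        (match["source"], match["version"])
        for match in MODULE_RE.finditer(terraform_code)
        if match["source"].startswith(registry_host + "/")
    ]


def newer_version(current, available):
    """Return the newest available version if it is newer than current, else None."""
    latest = max(available, key=parse_version, default=current)
    return latest if parse_version(latest) > parse_version(current) else None
```

If newer_version returns a version, that is the point where a pull request would be opened. If it returns None, the repository is already up to date.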

Here’s an example configuration for Dependabot in a Terraform repository. Note that Dependabot uses its own set of secrets in a repository.

# .github/dependabot.yml
version: 2
registries:
  terraform-registry.nrk.cloud:
    type: terraform-registry
    url: https://terraform-registry.nrk.cloud
    token: ${{ secrets.TERRAFORM_REGISTRY_TOKEN }}
updates:
- package-ecosystem: terraform
  directory: "/"
  registries:
    - terraform-registry.nrk.cloud
  schedule:
    interval: daily
    time: "08:00"
    timezone: "Europe/Oslo"
  open-pull-requests-limit: 5
  reviewers:
    - nrkno/plattform

What more do we want for the future?

Our biggest pain points have been fixed, but there is of course still room for improvement.

  • Automated tests for our modules that actually provision the resources in question. This has raised some concerns around cost, but it should be doable. Actually testing the module instead of just statically validating the configuration might help us verify whether or not we have introduced a breaking change.
  • Automatic apply on merge, or similar workflows, can help a lot in certain repositories. We see that both Terraform Cloud and Atlantis offer this, and we hope to implement either of these or a similar solution for at least some of our repositories.
  • A frontend for the registry to provide a one-stop-shop to discover, learn and consume our Terraform modules.
