Multi-part cloud-init provisioning with Terraform

April 28, 2022 

Cloud-Init is "a standard for customizing" cloud instances, typically on their first boot. It allows mixing state-based configuration management with imperative provisioning commands (details in our IaC article). By using cloud-init, most of the annoyances of SSH-based provisioning can be avoided:

  • Having to use (possibly shared) SSH keys for provisioning
  • Having to have direct network access to the VM being provisioned (may require a VPN connection)

That said, neither cloud-init itself, nor its use within Terraform is particularly well documented. Therefore it can be an effort to create cloud-init-based provisioning that works and adapts easily to different use-cases. This article attempts to fill that gap to some extent.

In this particular case I had to convert an existing, imperative SSH-based Puppet agent provisioning process to cloud-init, so there's very little state-based configuration management in all of this. What I ended up with is a three-phase approach:

  1. Put the provisioning scripts on the host
  2. Run all the provisioning scripts that are required for any particular use-case
  3. Remove the provisioning scripts from the host

The first step is to create a cloud-init YAML config, write-scripts.cfg, that has all the provisioning scripts embedded in it:

#cloud-config
write_files:
  - path: /var/cache/set-hostname.sh
    owner: root:root
    permissions: '0755'
    content: |
      #!/bin/sh
      #
      # Script body start
      --- snip ---
      # Script body end
  - path: /var/cache/add-puppetmaster-to-etc-hosts.sh
    owner: root:root
    permissions: '0755'
    content: |
      #!/bin/sh
      #
      # Script body start
      --- snip ---
      # Script body end
  - path: /var/cache/add-deployment-fact.sh
    owner: root:root
    permissions: '0755'
    content: |
      #!/bin/sh
      #
      # Script body start
      --- snip ---
      # Script body end
  - path: /var/cache/install-puppet.sh
    owner: root:root
    permissions: '0755'
    content: |
      #!/bin/sh
      #
      # Script body start
      --- snip ---
      # Script body end

The key with these scripts is that they are not Terraform templates. Instead, they're static files that take parameters to adapt their behavior, including doing nothing if the user so desires. The main reason for making this file static instead of a template is that it prevents Terraform variable interpolation from getting confused about POSIX shell variables written in the ${} syntax.
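To illustrate what such a static, parameterized script might look like, here is a minimal sketch. It is not one of the scripts elided above: the function name write_value, the default path /tmp/example-step.out and the "empty parameter means skip" convention are all assumptions made for this example. Note the ${...} expansions: inside a Terraform template each of them would have to be escaped as $${...}, which is exactly the hassle a static file avoids.

```shell
#!/bin/sh
# Hypothetical provisioning step, NOT one of the article's elided scripts.
# Usage: example-step.sh <value> [target-file]

write_value() {
    # Plain POSIX parameter expansion; no Terraform escaping is needed
    # because this file is never rendered as a template.
    value="${1:-}"
    target="${2:-/tmp/example-step.out}"   # assumed default, for illustration only

    # An empty parameter means "skip this step": do nothing and succeed.
    if [ -z "${value}" ]; then
        echo "no value given, skipping"
        return 0
    fi

    printf '%s\n' "${value}" > "${target}"
    echo "wrote ${value} to ${target}"
}

write_value "$@"
```

Because the script decides for itself whether there is anything to do, the calling side can stay simple: pass an empty string and the step becomes a no-op.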

The cloud-init part is just thin wrapping to allow "uploading" the scripts to the host. In Terraform we load the above file using a local_file datasource:

# cloud-init config that installs the provisioning scripts
data "local_file" "write_scripts" {
  filename = "${path.module}/write-scripts.cfg"
}

On its own this does nothing except make the file contents available for use in Terraform.

The next step is to create the cloud-init config, run-scripts.cfg.tftpl, that actually runs the scripts and does cleanup after the scripts have run. As the name implies, it is a Terraform template:

#cloud-config
runcmd:
  - [ "/var/cache/set-hostname.sh", "${hostname}" ]
%{ if install_puppet_agent ~}
  - [ "/var/cache/add-puppetmaster-to-etc-hosts.sh", "${puppetmaster_ip}" ]
  - [ "/var/cache/add-deployment-fact.sh", "${deployment}" ]
  - [ "/var/cache/install-puppet.sh", "-n", "${hostname}", "-e", "${puppet_env}", "-p", "${puppet_version}", "-s"]
%{endif ~}
  - [ "rm", "-f", "/var/cache/set-hostname.sh", "/var/cache/add-puppetmaster-to-etc-hosts.sh", "/var/cache/add-deployment-fact.sh", "/var/cache/install-puppet.sh" ]

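Note the ~ before the closing brace of each directive: it strips the whitespace that follows the directive, including the linefeed, so the directive lines do not leave empty lines in the resulting cloud-init configuration file.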

By making this file a template we can drive the provisioning logic using "advanced" constructs like real for-loops and if statements which Terraform (or rather, HCL2) itself lacks. Templating also allows making all provisioning steps conditional - something that's very difficult to accomplish with SSH-based provisioning (see my earlier blog post).

The matching Terraform datasource looks like this:

data "template_file" "run_scripts" {
  template = file("${path.module}/run-scripts.cfg.tftpl")
  vars = {
    hostname             = var.hostname
    deployment           = var.deployment
    install_puppet_agent = var.install_puppet_agent
    puppet_env           = local.puppet_env
    puppet_version       = var.puppet_version
    puppetmaster_ip      = var.puppetmaster_ip
  }
}

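As can be seen, the template does not magically know the values that are already available in Terraform code; instead, they need to be passed to the template explicitly as a map. Note that the template_file data source comes from the now-deprecated template provider; on Terraform 0.12 and later the built-in templatefile() function takes the same template path and variable map.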

The next step is to bind the two cloud-init configs into a single, multi-part cloud-init configuration using the cloudinit_config datasource:

data "cloudinit_config" "provision" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content      = data.local_file.write_scripts.content
  }

  part {
    content_type = "text/cloud-config"
    content      = data.template_file.run_scripts.rendered
  }
}

The above file shows one of the strengths of cloud-init: you can do provisioning using a combination of shell commands, scripts and cloud-init configurations by setting the content_type appropriately for each part. See cloud-init documentation for more details.

Finally we can pass the rendered cloud-init configuration to the VM resource that will consume it:

resource "aws_instance" "ec2_instance" {
  --- snip ---
  user_data = data.cloudinit_config.provision.rendered
}

You may also want to ensure that changes to provisioning scripts do not trigger an instance rebuild by adding a lifecycle block to the resource:

  lifecycle {
    ignore_changes = [
      user_data,
    ]
  }

When developing cloud-init templates it can be useful to validate their contents:

$ cloud-init devel schema --config-file <config-file>

This will catch all the easy errors quickly. According to some sources this command is (or was) nothing but a glorified yaml linter, but still, it is easily available on Linux so worth using.
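In more recent cloud-init releases the schema subcommand is, as far as I know, also available at the top level (cloud-init schema), with the devel variant kept for older versions. The wrapper below is a sketch under that assumption: it tries the newer form first and falls back to the one shown above. The function name validate_cloud_config is made up for this example, and probing with --help is an assumption about how the CLI reacts to an unknown subcommand, so check it against your cloud-init version.

```shell
#!/bin/sh
# Validate a cloud-config file with whichever schema subcommand is available.
# validate_cloud_config is a hypothetical helper name, not part of cloud-init.
validate_cloud_config() {
    cfg="$1"

    if ! command -v cloud-init >/dev/null 2>&1; then
        echo "cloud-init not found in PATH" >&2
        return 127
    fi

    # Assumption: asking an existing subcommand for --help succeeds,
    # while an unknown subcommand makes the CLI exit non-zero.
    if cloud-init schema --help >/dev/null 2>&1; then
        cloud-init schema --config-file "${cfg}"
    else
        cloud-init devel schema --config-file "${cfg}"
    fi
}
```

Keep in mind that a rendered template, not the .tftpl source, is what should be validated, since the %{ } directives are not valid YAML.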

If provisioning scripts are not working as expected, cloud-init logs may reveal why:

  • /var/log/cloud-init-output.log: output from the scripts (useful for debugging issues with your scripts)
  • /var/log/cloud-init.log: cloud-init's own logs (useful for debugging issues with cloud-init itself)
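When going through these logs on a freshly provisioned host, a small filter can save some scrolling. The sketch below is only an illustration: the function name cloud_init_problems is made up, and the search pattern assumes that problems show up as WARNING or ERROR log levels or as Python tracebacks, which may not cover every failure.

```shell
#!/bin/sh
# List WARNING/ERROR lines and Python tracebacks from a cloud-init log.
# cloud_init_problems is a hypothetical helper; the default path is the
# cloud-init.log location listed above.
cloud_init_problems() {
    log="${1:-/var/log/cloud-init.log}"

    if [ ! -r "${log}" ]; then
        echo "cannot read ${log}" >&2
        return 1
    fi

    # -n prefixes each match with its line number.
    # grep exits non-zero when nothing matches, i.e. no problems were found.
    grep -n -E 'WARNING|ERROR|Traceback' "${log}"
}
```

Running cloud_init_problems without arguments checks cloud-init's own log; for script output, something like tail -n 50 /var/log/cloud-init-output.log is usually enough.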

Some notes:

  • Cloud-init supports configuration merging, which allows defining the same module (e.g. write_files) several times in different parts of cloud-init multi-part configuration and merging the results instead of letting the last module call determine the end result. However, based on my testing this did not seem to work in Terraform.
  • Cloud-init's configuration syntax has been aptly described by James Nugent as "baroque". I have to agree with that statement.

Samuli Seppänen