Options for building self-service infrastructure with Ansible and Terraform

November 22, 2023 
It is possible to build self-service infrastructure with Ansible and Terraform

Introduction

Being able to build IT infrastructure without always having to involve an IT, operations or platform team seems to be a big priority for many organizations. This is understandable because processes typically slow to a crawl when they have to cross organizational boundaries. Also, teams responsible for infrastructure would rather not spend time on routine requests if they can avoid it. Provisioning infrastructure is no exception here. It is possible to create a custom solution to enable self-service infrastructure with Ansible and Terraform, but that takes a fair amount of effort. In this article we focus on open source and commercial open source offerings that make enabling self-service infrastructure provisioning easier.

At this point I should note that HashiCorp's recent license change to a non-open source license (BSL 1.1) may have an impact on your decision. The result of that license change was that Terraform was forked as OpenTofu, which is now under the stewardship of the Linux Foundation. This may affect the long-term viability of HashiCorp's commercial products, both because of the breach of trust towards the community and because of practical technical issues. For example, in the future OpenTofu will probably become the "new tool of choice" for building Cloud infrastructure. This means that Terraform's popularity will start to decline. Moreover, Terraform and OpenTofu will probably start to diverge in functionality at some point, because sharing code between the two projects is not possible.

The contenders

If we stay in the "open source" or "almost open source" (=Terraform) sphere there seem to be five viable contenders for building the plumbing for self-service infrastructure:

  1. Ansible Core with Ansible Rulebook
  2. Event Driven Ansible Controller
  3. Ansible AWX
  4. Ansible Automation Platform
  5. Terraform Cloud

All of these build on top of Infrastructure as Code tools, but have their unique challenges which I'll try to cover on a high level in this blog post.

Ansible Core with Ansible Rulebook

What is Ansible Core?

Ansible Core is a very popular tool used for IT automation tasks. The ansible-playbook command, which is included in Ansible Core, enables you to do configuration management, orchestration and data gathering tasks. However, ansible-playbook, or Ansible Core in general, does not include any self-service features.

What is Ansible Rulebook?

For self-service infrastructure with Ansible we need to use another tool called Ansible Rulebook, which allows you to run automation code in response to a message. This is what Red Hat calls "event-driven Ansible" because the message is sent as a response to an event in an external system. The external system can send the message directly to Ansible Rulebook, or use some sort of messaging system as the middleman.

Sources for Ansible Rulebook

A message can come to Ansible Rulebook from various sources, such as:

  • Kafka
  • Alertmanager
  • Webhooks
  • Web scrapes
  • File watchdogs
  • Azure service bus
  • AWS SQS queue

Red Hat and the community are developing new plugins, so this list is going to grow over time.

The ansible-rulebook executable listens to messages from various sources. When a message arrives, ansible-rulebook checks if it should trigger an Ansible Playbook run as a response. You can even send back a message to the source notifying of what happened, if you want.
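To make that flow a bit more concrete, here is a toy Python model of the idea: an event comes in, a list of rules is checked, and the first matching rule decides which playbook to run. This is only a sketch of the concept, not how ansible-rulebook is implemented. Real rulebooks are written in YAML, and the names `Rule`, `dispatch`, the event fields and the playbook file name are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical model of a rule: a condition on the incoming event plus
# the playbook to run when the condition holds. Not the ansible-rulebook API.
@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    playbook: str


def dispatch(event: dict, rules: list) -> Optional[str]:
    """Return the playbook of the first rule that matches the event, else None."""
    for rule in rules:
        if rule.condition(event):
            return rule.playbook
    return None


rules = [
    Rule(
        name="provision a VM on request",
        condition=lambda e: e.get("type") == "vm_request",
        playbook="provision_vm.yml",  # made-up playbook name
    ),
]

# A matching event selects a playbook, a non-matching one is ignored.
print(dispatch({"type": "vm_request", "size": "small"}, rules))
print(dispatch({"type": "heartbeat"}, rules))
```

In the real tool the "run the playbook" step and the listening on sources are handled for you. The sketch only shows the decision in the middle.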

Self-service infrastructure with Ansible Rulebook

How is this related to self-service infrastructure, then? Imagine a ticketing system where members of an organization request the IT team to provision some infrastructure for them. Instead of letting the IT team handle the request manually, you can make the ticketing system send a message to ansible-rulebook, which will then automatically provision the infrastructure. You could also have an approval process in the ticketing system by sending the message only after an approval. For example, only a manager could move the ticket from "Open" to "Approved" state, after which the ticketing system would send the message to ansible-rulebook.
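The approval gate described above could look roughly like this on the ticketing system side. This is a minimal sketch under a few assumptions: ansible-rulebook is listening through a webhook source, and the URL, the ticket fields and the payload layout are all invented for this example. Your ticketing system will have its own way of hooking into state changes.

```python
import json
import urllib.request
from typing import Optional

# Assumption: ansible-rulebook listens with a webhook source at this
# address. The URL and all payload fields are made up for this example.
RULEBOOK_WEBHOOK_URL = "http://localhost:5000/endpoint"


def build_event(ticket: dict) -> Optional[dict]:
    """Return the message to send, or None if the ticket is not approved yet."""
    if ticket.get("state") != "Approved":
        return None
    return {
        "type": "infra_request",
        "ticket_id": ticket["id"],
        "requested": ticket["requested"],
    }


def send_event(event: dict, url: str = RULEBOOK_WEBHOOK_URL) -> int:
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


ticket = {"id": 42, "state": "Open", "requested": "small-vm"}
print(build_event(ticket))  # None: the ticket is still open

ticket["state"] = "Approved"  # a manager approves the ticket
print(build_event(ticket))  # now there is an event to hand to send_event()
```

The example never calls `send_event()` on its own, so it runs without a listener. In a real integration you would call it whenever `build_event()` returns something other than `None`.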

Ansible Rulebook enables more than just self-service infrastructure

The event-driven Ansible approach is very generic and extends beyond self-service infrastructure provisioning. You can use this event-driven approach for any sort of event-driven IT automation, such as automatic remediation of security vulnerabilities, automatic scaling based on metrics, and building self-healing infrastructure.

Running Ansible Rulebook

Ansible Rulebook is easiest to run using a Decision Environment which is just a fancy name for a purpose-built Podman container image. You can run it as a systemd service to guarantee that Ansible Rulebook is always running. Overall the setup time is very low, even when compared to the commercial offerings. The caveat with ansible-rulebook is the same as with any non-commercial open source: it does not come with any commercial support. That may or may not be an issue for you.

PS. For those unfamiliar with Podman: it is nearly 1:1 compatible with Docker but, to keep it short, sucks much less.

Event Driven Ansible Controller

Event Driven Ansible Controller (or eda-server) is the open source upstream for the EDA component that comes with the commercial Ansible Automation Platform product. Event Driven Ansible Controller is essentially a web interface built around Ansible Rulebook.

You can deploy the Event Driven Ansible Controller on Kubernetes or OpenShift. This may be a hindrance if you don't have a cluster running, but for starters you could use a single-node Kubernetes cluster such as k3s or one of the single-node OpenShift versions.

Ansible AWX

Ansible AWX is the open source upstream of Ansible Automation Platform. It does not have all the bells and whistles of AAP, but provides a convenient WebUI and an API around Ansible Core. What it lacks, by default, is the event-driven feature that AAP has. That feature is, in AAP, provided by the Event-Driven Ansible Controller (see above). If you want to implement self-service infrastructure provisioning then you don't benefit much from Ansible AWX. However, it is very useful if you want to scale your Ansible-managed infrastructure and delegate responsibilities.

You can deploy Ansible AWX on Kubernetes or OpenShift.

Ansible Automation Platform

Ansible Automation Platform ("AAP") is a commercial product from Red Hat. It integrates a lot of open source projects into a coherent package, among them Ansible Rulebook. In the self-service context its most important feature is Event-Driven Ansible, which Ansible Rulebook enables (see above).

You can deploy Ansible Automation Platform in two main ways:

  • RHEL 9 server: in this case AAP runs as a set of Podman containers. This is the easiest choice.
  • Kubernetes or OpenShift: if you're already using a container orchestration platform then deploying AAP there is a good option.

Terraform Cloud and Terraform Enterprise

Terraform Cloud is a SaaS offering that gives you a nice, feature-rich environment for running Terraform code. The on-premises version of Terraform Cloud is called Terraform Enterprise.

What is no-code?

HashiCorp uses the term "no-code" to describe Terraform Cloud's self-service infrastructure provisioning capabilities. No-code allows people completely unfamiliar with infrastructure provisioning to, well, provision infrastructure. Your Terraform developers have to abstract away all the technical nitty-gritty details so that all people need to do is define a few variables. Terraform Cloud then passes those variables to the actual Terraform code that does the heavy lifting.
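The idea of exposing only a handful of variables can be sketched in a few lines of Python. This is not how Terraform Cloud implements no-code. It is a rough illustration of the principle, and the variable names, allowed values and defaults below are all hypothetical. The one real detail is that Terraform can read variable values from JSON files such as terraform.tfvars.json, which is why the result is printed as JSON.

```python
import json

# Hypothetical: the few variables a module author chose to expose, with
# the values a requester may pick. Everything else stays hidden inside
# the Terraform code.
EXPOSED_VARIABLES = {
    "environment": {"allowed": ["dev", "staging", "prod"], "default": "dev"},
    "instance_size": {"allowed": ["small", "medium", "large"], "default": "small"},
}


def resolve_variables(user_input: dict) -> dict:
    """Validate the requester's choices and fill in defaults for the rest."""
    unknown = set(user_input) - set(EXPOSED_VARIABLES)
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    resolved = {}
    for name, spec in EXPOSED_VARIABLES.items():
        value = user_input.get(name, spec["default"])
        if value not in spec["allowed"]:
            raise ValueError(f"{name} must be one of {spec['allowed']}")
        resolved[name] = value
    return resolved


# The requester only picks a size; the environment falls back to its default.
print(json.dumps(resolve_variables({"instance_size": "medium"}), indent=2))
```

The point of the validation step is that the requester cannot break anything: whatever they choose, the Terraform code only ever sees values its author anticipated.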

Designed for technical teams

HashiCorp clearly designed Terraform Cloud for fairly large organizations which have multiple technical teams. Before no-code appeared, infrastructure provisioning was triggered by code changes, not by the push of a button. This meant that non-technical people could not conveniently provision infrastructure. This original design choice still shows in no-code: anyone who wants to provision infrastructure with Terraform Cloud needs to be a part of a team. Moreover, the user interface that enables no-code provisioning is not particularly intuitive for the average user. As such, Terraform Cloud falls a bit short on the user experience front. It is also not possible (as far as I can see) to have a manual approval process for no-code infrastructure provisioning requests.

Deploying Terraform Cloud or Terraform Enterprise

Now on to the deployment topic. Getting Terraform Cloud up and running is trivial as it is a SaaS offering. However, if you need or want an on-premises solution then you need to set up Terraform Enterprise. The only reasonable production deployment option is Kubernetes. If you don't already have a Kubernetes - or OpenShift - cluster then this may be a bitter pill to swallow. And you probably should not run critical production workloads on top of a single-node Kubernetes cluster like k3s. Apparently HashiCorp is able to provide an engineer to help with Terraform Enterprise setup.

Self-service infrastructure with Ansible and Terraform: which is better?

To answer the question above we need to understand the limitations of the underlying Infrastructure as Code tools first. So let's cover those now.

Pros and cons of Ansible-based approaches

Ansible is very good for generic IT automation. It can handle configuration management as well as orchestration. Physical and virtual machines, Linux and Windows, network devices, applications: not a problem for Ansible. What Ansible is not particularly good at is managing native Cloud resources. This is not an inherent limitation of Ansible, though. Rather, it is an effect of Red Hat's strategy of not putting its own development effort into Ansible modules for managing Clouds such as AWS or Azure. If you need very basic stuff like virtual machines you will be ok.

If you are heavily invested in Cloud native resources then you're looking at either developing new Ansible modules or integrating Terraform or OpenTofu with your Ansible Playbooks. We can of course help you with both of those approaches, for a price.

Pros and cons of Terraform Cloud

Terraform is very good at managing native Cloud resources. Its support for AWS and Azure (and almost any other Cloud) is excellent. However, Terraform is just an infrastructure provisioning tool. It is not a generic automation platform. You cannot create automation workflows with it. You can just create and update infrastructure with it. Also, it does not natively support configuring the internals of physical or virtual machines. So, you will need to use something like Ansible to fill that gap.

Summary

In a nutshell, self-service infrastructure is possible with both Ansible and Terraform, or their commercial versions. Ansible Automation Platform, Ansible AWX or Ansible Rulebook is a better choice if you want a generic solution that grows with your needs. If you've heavily invested in native Cloud resources and don't care about automating workflows then Terraform Cloud may work better for you.

Samuli Seppänen