Using Hiera with Ansible

June 7, 2024 
Using Hiera with Ansible is not easy, but using Hiera data is

Introduction

Hiera is a very powerful "key-value configuration data lookup system" used mainly, but not exclusively, with the Puppet configuration management and orchestration system. Hiera stores its data in YAML files arranged, as the name implies, in a hierarchy, or set of prioritized levels. A typical Hiera lookup goes through the hierarchy starting from the highest-priority level and stops when it finds a match. That match could come from a node YAML file, a deployment (e.g. production or staging) YAML file or from site-wide YAML. What this essentially means is that you can define common, site-wide parameters at the lowest level of the hierarchy and override those when needed at higher levels. How do Hiera and Ansible work together, then? The short answer is "they don't, really", but the long answer is more complex than that.

Using the community.general.hiera lookup function?

On paper Ansible can be integrated with Hiera using the community.general.hiera lookup function. However, the lookup function (code here) has not been updated in years. It also depends on the Hiera v3 project, which has similarly not been updated in years. Both codebases have been kept up to date so that they compile and run, but functionality-wise nothing has happened in ages. To make matters worse, Hiera v3 does not support hiera-eyaml encryption and uses its own v3 config file format, which has not been seen in the wild in years.

Using Hiera data directly?

It seems that the least bad way to use Hiera data, for example from Puppet, is to just load the yaml data files directly as Ansible variable files and forget about the fancy hierarchies.

Hierarchy and data file contents

Suppose you have this simple hierarchy in a Hiera config (the lowest-priority levels are at the bottom):

  1. nodes/%{trusted.certname}.yaml
  2. deployments/%{deployment}.yaml
  3. common.yaml
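
For reference, a hierarchy like this could be written in a Hiera 5 hiera.yaml roughly as follows. This is a minimal sketch: the level names and the datadir value are assumptions made for illustration, not taken from a real configuration.

```yaml
---
# hiera.yaml in Hiera 5 format; level names and datadir are illustrative assumptions
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  # Levels are searched top to bottom; the first match wins
  - name: "Per-node data"
    path: "nodes/%{trusted.certname}.yaml"
  - name: "Per-deployment data"
    path: "deployments/%{deployment}.yaml"
  - name: "Site-wide defaults"
    path: "common.yaml"
```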

For this example we have four data files:

# common.yaml
common_var: 'common_var from common.yaml'

# deployments/staging.yaml
common_var: 'common_var from staging.yaml'

# deployments/production.yaml
common_var: 'common_var from production.yaml'

# nodes/mynode.yaml
common_var: 'common_var from mynode.yaml'
host_var: 'host_var from mynode.yaml'

Test Ansible playbook with "Hiera lookups"

To tie things up we need an Ansible playbook:

---
- name: Look up values from Hiera data files
  hosts: localhost
  become: false
  gather_facts: false
  vars_files:
    - data/common.yaml
    - data/deployments/{{ deployment }}.yaml
    - data/nodes/{{ trusted_certname }}.yaml
  tasks:
    - name: Look up common_var 
      ansible.builtin.debug:
        msg: "{{ common_var }}"
    - name: Look up host_var 
      ansible.builtin.debug:
        msg: "{{ host_var }}"

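The order of vars_files provides Hiera-like behavior: files listed later override files listed earlier, so the list runs from lowest to highest priority, the reverse of how the hierarchy is written in the Hiera config. The two variables, deployment and trusted_certname, need to be defined or the playbook will not run.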
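
If you want a clearer error when one of the two variables is missing, a guard task along these lines could be added to the play. This is only a sketch: the pre_tasks block and its messages are not part of the original playbook, and depending on the Ansible version and play settings a missing variable may still surface as a templating error before this task gets to report it.

```yaml
  pre_tasks:
    # Hypothetical guard task, added for illustration only
    - name: Check that the lookup variables are set
      ansible.builtin.assert:
        that:
          - deployment is defined
          - trusted_certname is defined
        fail_msg: "Set both variables, e.g. -e deployment=staging -e trusted_certname=mynode"
```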

Running the Ansible playbook, take 1

Let's run the Ansible playbook with variables set to trusted_certname=mynode and deployment=staging:

$ ansible-playbook -e trusted_certname=mynode -e deployment=staging playbooks/hiera.yml 

PLAY [Look up values from Hiera data files]
*******************************************

TASK [Look up common_var]
*************************
ok: [localhost] => {
    "msg": "common_var from mynode.yaml"
}

TASK [Look up host_var]
***********************
ok: [localhost] => {
    "msg": "host_var from mynode.yaml"
}

PLAY RECAP 
**********
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

The results were as expected, as the highest-priority file included both common_var and host_var.

Running the Ansible playbook, take 2

To make things more interesting let's remove common_var from mynode.yaml. The file should now look like this:

# nodes/mynode.yaml
host_var: 'host_var from mynode.yaml'

Now let's look up data with deployment set to staging:

$ ansible-playbook -e trusted_certname=mynode -e deployment=staging playbooks/hiera.yml 

PLAY [Look up values from Hiera data files]
*******************************************

TASK [Look up common_var]
*************************
ok: [localhost] => {
    "msg": "common_var from staging.yaml"
}

TASK [Look up host_var]
***********************
ok: [localhost] => {
    "msg": "host_var from mynode.yaml"
}

PLAY RECAP 
**********
localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Using facts to load correct variable files

Node-specific variables are fairly useless if you can't dynamically load the correct variable file. So, we have to adjust our playbook to load the appropriate Hiera data file like this:

  hosts:
    - webserver
    - db
  become: true
  gather_facts: true
  vars_files:
    - ../../data/common.yaml
    - ../../data/deployments/{{ deployment }}.yaml
    - ../../data/nodes/{{ ansible_facts['fqdn'] }}.yaml
  tasks:
    - name: Show facts
      ansible.builtin.debug:
        var: my_var

The data files in this case would look like this:

# data/nodes/webserver.yaml
my_var: 'my_var from webserver'

# data/nodes/db.yaml
my_var: 'my_var from db'
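
One difference from real Hiera is worth keeping in mind: Hiera simply skips hierarchy levels whose data file does not exist, whereas a plain vars_files entry expects its file to be there. Ansible's vars_files accepts a nested list of alternatives and loads the first file that exists, which can stand in for that behavior. The sketch below assumes a hypothetical fallback file, data/nodes/default.yaml, that is not part of the original example.

```yaml
  vars_files:
    - ../../data/common.yaml
    - ../../data/deployments/{{ deployment }}.yaml
    # Inner list: the first file that exists is loaded.
    # nodes/default.yaml is a hypothetical fallback for hosts without a node file.
    - - ../../data/nodes/{{ ansible_facts['fqdn'] }}.yaml
      - ../../data/nodes/default.yaml
```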

When ansible-playbook runs on a host, it gathers facts first, then loads the appropriate variable files. This is fortunate for us, because it allows us to use reported facts for loading the correct data files. Here is an example run:

$ ansible-playbook -e deployment=staging playbooks/hiera.yml 

PLAY [Look up values from Hiera data files] 
*******************************************

TASK [Gathering Facts]
**********************
ok: [webserver]
ok: [db]

TASK [Show facts] 
*****************
ok: [webserver] => {
    "my_var": "my_var from webserver"
}
ok: [db] => {
    "my_var": "my_var from db"
}

PLAY RECAP
**********
webserver : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
db        : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0    

The deployment variable is still defined on the command line, but it could easily be replaced with a custom fact.
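
One way to do that is an Ansible local fact. The sketch below assumes that each managed host has a hypothetical file /etc/ansible/facts.d/hiera.fact containing the JSON {"deployment": "staging"}; the file name and the key are made up for illustration and have not been tested against the setup above.

```yaml
  gather_facts: true
  vars_files:
    - ../../data/common.yaml
    # Local facts from facts.d show up under ansible_local, keyed by fact file name
    - ../../data/deployments/{{ ansible_facts['ansible_local']['hiera']['deployment'] }}.yaml
    - ../../data/nodes/{{ ansible_facts['fqdn'] }}.yaml
```

With a fact like this in place, the -e deployment=staging argument would no longer be needed on the command line.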

Hiera with Ansible: the summary

The summary of this exercise is that you can use plain Hiera YAML data with Ansible. You can even get Ansible to behave in a Hiera-like fashion if you load the data files in the correct order. Moreover, you can load Hiera data files based on Ansible facts. This gives you most of what is available in real Hiera, except fancy merge behaviors and support for hiera-eyaml.

Samuli Seppänen