Red Hat Open Tour 2022: Ansible automation project at Elering

September 17, 2022 
Red Hat Open Tour 2022 entry ID card

We participated in Red Hat Open Tour 2022 Tallinn a few weeks ago. Jaan Tanel Veikesaar from Elering, a gas/energy company in Estonia, gave a really nice presentation about their Ansible automation project. Ansible is a very common infrastructure as code and automation tool. Below I'll go over Jaan's presentation, adding some comments and key takeaways.

The starting point for the Ansible automation project

The starting point at Elering was fairly typical: to create a new VM, one had to go to vSphere, launch a VM and continue from there manually. In other words, automation did not really exist. This started to change when the business began to demand more from IT without adding any more human resources. This essentially made automation a necessity instead of a luxury. So, they did some research and testing, after which Elering ended up with Ansible. They made this choice in part because whenever they ran into an Ansible problem, they could easily find an answer on the Internet.

The lifecycle of the Ansible automation project

In the early stages of the Ansible automation project, Elering focused on provisioning. This probably meant one-time creation of virtual machines with known-good configurations. Later they expanded automation to adding DNS entries for VMs, adding disks and other less critical tasks. The infrastructure currently managed by Ansible at Elering is fairly big, about 600 servers. With an infrastructure of that size, scaling is an issue with Ansible, as we discussed with Elering. At the moment they use Ansible automation primarily for provisioning. When they made big changes, they targeted a subset of servers using VMware tags.

Currently Elering's team enforces that the SSH keys of the VMs are correct at all times. In other words, the scope of enforced configuration management is not yet very wide. For quality assurance Elering uses Visual Studio for Ansible syntax checking and linting, among other things.
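
To make the idea of enforcing SSH keys more concrete, here is a minimal sketch of what such a playbook could look like. This is not Elering's actual code: the group name canary_servers, the admin account and the key file path are all hypothetical, and the sketch assumes the ansible.posix collection is installed.

```yaml
---
# Illustrative sketch only: the group, user and file path below are assumptions.
- name: Enforce authorized SSH keys on managed VMs
  hosts: canary_servers   # hypothetical inventory group, e.g. one built from VMware tags
  become: true
  tasks:
    - name: Ensure only the expected public keys are present for the admin user
      ansible.posix.authorized_key:
        user: admin   # hypothetical account name
        # Hypothetical file with one public key per line
        key: "{{ lookup('ansible.builtin.file', 'files/admin_keys.pub') }}"
        exclusive: true   # remove any keys that are not listed in the file
        state: present
```

Running a playbook like this with ansible-playbook and the --check and --diff options shows what would change without touching the servers, which is a low-risk way to start enforcing state on a small subset of hosts before widening the scope.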

Manual configuration results in lots of wasted time

As Elering moved forward in their automation project, they started realizing how much wasted effort manual configuration can create. Jaan gave one example of a manually created cluster that broke and took a week to debug and fix. The reason for the breakage was simple: a configuration mistake in one of the cluster members. With automation the team can track changes, including who broke what and when: "See, you made this change which broke things". I can personally add that with automation the time between making a mistake and realizing it is typically short: you notice your mistake almost immediately after having made it. In my experience this is not the case if you manage infrastructure manually.

People were the key to success

The Ansible automation project's success was not due to automation alone. The people were the key. Somebody had to take the lead, of course, but the real success grew from the Ansible/system administration team helping other people solve their problems with Ansible automation. That motivated more people to commit themselves to the project. This people factor is very important because a project is likely to fail if it does not scratch an itch for each participant.

Learning curve and how to overcome it

Another people factor they encountered was the "learning curve". People accustomed to Windows were not particularly keen on learning the Unix-y command line required to use plain Ansible. They solved this problem with Ansible Tower, which allowed the Windows people to just use a GUI to do what they needed. Especially in the early stages of a project, when the value of automation is not yet clear, demanding that people climb a steep learning curve is probably too much.

Key takeaways

The key takeaways are:

  • Automation does not have to be complete in order to be highly useful. We at Puppeteers always aim for 100% configuration management, life cycle management and quality assurance processes. That said, automation is a process that takes time. Moreover, 100% automation can in many cases be very time-consuming with very low return on investment.
  • You should take the people factor into account at all times. People are likely to take part in a project if they feel that they will get something out of it. This is particularly true if you can support people on their automation journey.
Samuli Seppänen