There are myriad ways to log in to Linux systems using Azure AD credentials. Photo credit: https://pixabay.com/vectors/bash-command-line-linux-shell-148836.

Introduction

There are several ways to do Linux Azure AD authentication. In other words, you can log in to your Linux hosts with Azure Active Directory ("Azure AD") credentials in various ways. Azure, Microsoft's public Cloud, builds on top of Azure AD. In fact, your Azure users, groups, roles and role assignments are stored in Azure AD.

The challenge with Linux Azure AD authentication is that Azure AD does not support "legacy protocols" such as LDAP and Kerberos, which are the protocols traditionally used to allow Linux logins with centralized identities. It is also important to distinguish between Azure AD and classic Active Directory ("AD DS"). While their names are similar, they are completely different beasts. That said, Azure AD and Active Directory can be integrated with Azure AD Connect. For more on this confusing terminology have a look at our earlier Windows domain in Azure blog post.

In the open source world the closest analogy to Azure AD is probably Keycloak, on which Red Hat's commercially supported Red Hat Single Sign-On is based. Keycloak is an open source identity and access management application. While we know and love Keycloak, it is impossible to avoid Azure AD due to its huge market share.

This article tries to outline the options you have for logging in to your Linux hosts with Azure AD credentials. The Linux hosts can be located in Azure or elsewhere, depending on the authentication method.

LDAP authentication via Active Directory connected to Azure AD

As I mentioned above, Azure AD can be connected to classic Active Directory with Azure AD Connect. This allows you to join your Linux VMs to Active Directory using LDAP and Kerberos, so you essentially circumvent native Azure AD authentication. This approach only makes sense if you already have an Active Directory instance; if you don't, the maintenance overhead is probably too high. If you do have Active Directory integrated with Azure AD, you can also throw Red Hat IdM/FreeIPA into the mix. This gets you the best of all worlds at the cost of a fairly high level of complexity.

LDAP authentication via Azure AD Domain Services

Azure AD can support LDAP and Kerberos with help from Azure AD Domain Services ("AAD DS"). AAD DS is a managed service that has nothing to do with classic Active Directory. With AAD DS you get LDAP and Kerberos endpoints, which allow you to join Linux VMs indirectly to the Azure AD domain. Microsoft officially supports this configuration: see the instructions for Red Hat and other Linux distributions here.

Azure AD authentication via OpenSSH

Azure AD has built-in support for logging in to Linux VMs using Azure AD authentication via OpenSSH. This approach has a number of downsides and inconveniences:

That said, if all your Linux instances are in Azure this approach might be fairly reasonable and non-intrusive.
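For reference, here is a rough sketch of how this is typically enabled with the Azure CLI; the resource group, VM name and user principal below are placeholders, so adapt them to your environment:

# Install the Azure AD SSH login extension on an existing Linux VM
$ az vm extension set --publisher Microsoft.Azure.ActiveDirectory \
    --name AADSSHLoginForLinux --resource-group my-rg --vm-name my-vm

# Allow an Azure AD user to log in to the VM
$ az role assignment create --role "Virtual Machine Administrator Login" \
    --assignee "user@example.com" \
    --scope $(az vm show --resource-group my-rg --name my-vm --query id --output tsv)

# Log in with Azure AD credentials (requires the Azure CLI ssh extension)
$ az ssh vm -g my-rg -n my-vm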

Azure Active Directory Authentication for Ubuntu

Canonical, the author of the popular Ubuntu distribution, has developed native Azure AD authentication support for Ubuntu Desktop. It has three components:

In order to use this authentication method you also need an application and service principal in Azure. The modules also support offline logins, which is a really nice feature.

The caveat with this approach is that it seems desktop-centric. The use-case is essentially the same as when domain-joining Windows 10+ systems to Azure AD: your local Ubuntu desktop logins are authenticated against Azure AD instead of local files. This means that using this approach for headless server logins may be challenging or impossible.

The source code for these components is available on GitHub. While the developers focus on Ubuntu, there is nothing particularly Ubuntu-specific about the code. The software they build on top of, namely PAM and NSS, is available in every Linux distribution. Therefore it is likely that the code will work just fine on distributions such as Red Hat Enterprise Linux. You might encounter issues with old versions of PAM and NSS, though; we have not tested that yet.

AAD for Linux

AAD for Linux is a GitHub organization that seems focused on Linux Azure AD support. There are two relevant components that are very similar to what Azure Active Directory for Ubuntu includes:

You also need an OAuth2 client (i.e. an application) in Azure to make use of this login method.

When logging in you get a one-time token by email. You then pass the token to a special Azure authentication URL, after which you can log in by just pressing enter. This process is not particularly user-friendly, but it might be enough for your needs.

What about Red Hat IdM/FreeIPA?

Red Hat Identity Management and its upstream, FreeIPA, allow you to create Linux domains. Essentially they are to Linux what classic Active Directory is to Windows. You can also create a so-called "trust relationship" between classic Active Directory and FreeIPA. That works because both of those applications use "legacy protocols" such as LDAP and Kerberos. Azure AD, however, is based on the OAuth2, OIDC and SAML 2.0 protocols.

The protocol incompatibility makes it impossible to link Red Hat IdM/FreeIPA directly with Azure AD. It is highly unlikely that such linking will ever be possible. However, there is an indirect way:

  1. Connect Azure AD to classic Active Directory with Azure AD Connect
  2. Connect FreeIPA to the classic Active Directory

This is the optimal solution, but also a fairly complex one, suitable mostly for larger organizations that have the required expertise and automation capabilities available.

That said, if you have a large fleet of Linux instances you should seriously consider using Red Hat IdM/FreeIPA. It is a proven solution at global cloud scale, and it gives you a wealth of essential features for managing identities, authentication and authorization on Linux systems, including, but not limited to:

Other possible ways of integrating Linux systems with Azure AD via OAuth2 that might be of interest are:

I hope you found this article informative. If you know other ways to log in to Linux systems with Azure AD credentials please let us know.

With recent versions of Keycloak you have to redeploy your Javascript authorization policies as JAR archives. Photo credit: https://www.pexels.com/it-it/foto/tipografia-trasparente-lettera-nero-5149241/

Introduction

This article tries to include everything you need to know about Keycloak Javascript policy deployment. By Javascript policies I mean authorization policies of Javascript type attached to Keycloak clients that have authorization enabled. If you don't know what Keycloak authorization services are, you are probably in the wrong place.

While the process of creating and deploying the JAR files is pretty well documented, it is not altogether clear for the uninitiated how the policies are shown in Keycloak.

Some history first

In Keycloak versions prior to 7.0.1 you could create Javascript-based authorization policies using the Admin Console for your Keycloak clients. That of course required that the client had authorization enabled. In subsequent Keycloak versions you had to enable this feature separately. There were two ways to do this:

The script upload feature, which you could use to create Javascript policies in the Admin Console, was later removed from Keycloak for security reasons. The new and better way to deploy Javascript policies is to distribute them to Keycloak as JAR files.

Keycloak Javascript policy deployment with a JAR file

You can create JAR files with Javascript policies easily. You need two things:

  1. The Javascript file(s) you want to use as policies
  2. A metadata file

Start by creating a directory with the following layout:

.
├── META-INF
│   └── keycloak-scripts.json
└── policy.js

The policy.js file is the actual policy you want to deploy. The filename has no special meaning, so you could just as well use foobar.js if you wanted. Here is a trivial example:

$evaluation.grant();

The keycloak-scripts.json file contains metadata related to the policy:

{
    "policies": [
        {
            "name": "MyPolicy",
            "fileName": "policy.js",
            "description": "My Policy"
        }
    ]
}

Multiple policies can be present in a single JAR file. You can also distribute mappers and authenticators alongside policies, but that's outside of the scope of this article.

You can create a JAR file with either "jar" or "zip". On Linux you would do the following:

$ zip -r policy.jar META-INF/ policy.js 
  adding: META-INF/ (stored 0%)
  adding: META-INF/keycloak-scripts.json (deflated 39%)
  adding: policy.js (stored 0%)

The archive contents should look like this:

$ zipinfo policy.jar 
Archive:  policy.jar
Zip file size: 665 bytes, number of entries: 3
drwxr-xr-x  3.0 unx        0 bx stor 23-Apr-18 10:52 META-INF/
-rw-r--r--  3.0 unx      208 tx defN 23-Apr-18 10:52 META-INF/keycloak-scripts.json
-rwxr-xr-x  3.0 unx       21 tx stor 23-Apr-18 10:52 policy.js
3 files, 229 bytes uncompressed, 147 bytes compressed:  35.8%

So, the Javascript file, policy.js, goes at the root level and keycloak-scripts.json goes under META-INF.
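If you prefer the jar tool mentioned above over zip, roughly the same archive can be created like this (jar also adds a harmless META-INF/MANIFEST.MF entry):

$ jar cvf policy.jar META-INF/keycloak-scripts.json policy.js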

Deploying the JAR file

If you are using non-Quarkus versions of Keycloak you can just drop the JAR file into the auto-deployment directory (e.g. /opt/keycloak/standalone/deployments) and it should get deployed automatically. On Quarkus versions of Keycloak you'd put the JAR file into $KEYCLOAK_HOME/providers and run kc.sh build.
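As a concrete sketch, on a Quarkus-based Keycloak installed under /opt/keycloak (adjust the path to your installation) the deployment boils down to this, followed by a restart of Keycloak:

$ cp policy.jar /opt/keycloak/providers/
$ /opt/keycloak/bin/kc.sh build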

Checking if the deployment was successful

Typically you can check systemd unit status or Keycloak logs to see if the deployment was successful. The best way, though, is to use the "Server Info" page:

  1. Login as Keycloak admin user
  2. Click on the user menu
  3. Select "Server Info"
  4. Click on the "Provider" tab
  5. Navigate to the "Policy" section
  6. Check if your policy (e.g. "script-policy.js") is present

The final and conclusive test is to attach the policy to a Keycloak client which has authorization enabled.

How to attach the policies to a Keycloak client

This is the part that nobody really bothered documenting, so it caught me off guard. The authorization policies you could upload from the Admin Console were already of the "Javascript" type. Policies that are deployed using JAR files actually become their own policy type. For example, given the JAR file above you should find a new policy of type "MyPolicy" under the Keycloak client's authorization policies dropdown menu. You can attach the same policy to the same client multiple times if you want, as long as you use a different policy name.

Understanding Openshift versions will take practice - and divine intervention won't hurt, either. Photo credit: https://www.pexels.com/it-it/foto/persone-chiesa-cantando-cantanti-7569424/

Introduction

Red Hat Openshift is essentially an opinionated Kubernetes distribution that comes with a large number of features such as CI/CD and a container registry built in; for a full list of differences have a look at Red Hat OpenShift vs. Kubernetes. Openshift comes in a number of versions, some commercial and some open source. The naming history of Openshift is particularly confusing. That's why I wrote this blog post, where I attempt to list most Openshift versions and variants and outline their main differences.

Openshift Container Platform

Openshift Container Platform is the core commercial offering of Red Hat Openshift. Like other Red Hat products, it is built on top of Red Hat Enterprise Linux, or RHEL. Openshift Container Platform Plus is an enhanced version that includes additional tools. Those tools are particularly useful in big Openshift deployments. A highly available Openshift Container Platform setup typically includes nine nodes: three are workers, three are control nodes and three are infrastructure nodes. The workers run the actual containerized workloads. The control nodes do the orchestration. The infrastructure nodes provide additional services like CI/CD and the container registry. In general you would set up Openshift Container Platform in the same places as a traditional Kubernetes cluster.

The subscription price of self-hosted Openshift Container Platform does not seem to be available publicly. You can ask Red Hat for a quote, or alternatively you can make an educated guess by checking subscription prices for ROSA, for example. Overall Openshift Container Platform is more expensive to deploy than Kubernetes. However, given all the things it is bundled with, Openshift Container Platform is probably compelling to many customers.

Red Hat markets single-node Openshift as Openshift Edge or, to be more precise, Openshift at the Edge. Openshift Edge is just Openshift trimmed down to run on a single node without huge overhead. As the name implies, Openshift Edge is meant for running Openshift at the edge. An example is a site where the application running in Openshift requires low latency and reaching out to a centralized Openshift cluster is therefore not possible. You can install Openshift (at the) Edge with the normal Openshift installer as described here.

In our opinion edge computing is a pretty narrow use-case for Openshift Edge. Another reasonable use-case is running workloads that you'd traditionally run in standalone virtual machines, such as simple web applications. It is unlikely to perform worse than, say, docker-compose or plain docker force-fitted into the production use-case. You can install Openshift Edge with the normal Openshift assisted installer.

Red Hat Openshift Service on AWS

Red Hat Openshift Service on AWS, or "ROSA" for short, is the official, managed Openshift Container Platform offering on AWS. ROSA is very pricey, so forget about it unless you have a real need for it. If you don't need it yet run it anyway, you must enjoy burning money as a pastime. That said, any form of Kubernetes gets quite expensive to run. Much of the cost comes from the very high infrastructure prices in the big public Clouds. To make matters worse, Kubernetes and in particular Openshift Container Platform require tons of servers to even run.

All this said, ROSA is definitely not that much more expensive given its feature set. ROSA includes support from both Red Hat and AWS.

OKD

OKD is the community version of Openshift. It runs on top of the CentOS Linux distribution. It has most of the features of Openshift Container Platform: what is lacking is primarily support and certification. CentOS is only community-supported and changes more rapidly than RHEL. That said, it seems possible to run OKD on more stable community distributions like Rocky Linux 8. There is no official edge version of OKD, but you can create one yourself with some effort.

Openshift Local

Openshift Local is the currently kosher way to run Openshift 4.x on a developer workstation. Think of it as Minikube, but for Openshift instead of vanilla Kubernetes. It is a successor to Codeready Containers (see below).

Codeready Containers

Codeready Containers is an older way to run Openshift 4.x on a developer workstation. Somewhat surprisingly, Red Hat used the same name for the upstream open source project. Red Hat discontinued Codeready Containers and created Openshift Local as its successor.

Minishift

Minishift is a community project that allows running Openshift 3.x on a developer workstation. You should not be using Minishift unless you need it to support existing Openshift 3.x installations.

Openshift Origin

In the distant past Red Hat called the community version of Openshift "Openshift Origin". At some point Red Hat renamed the community project to OKD, and that name has stuck, so far. You can still see the name "origin" in the repositories of the Openshift organization on GitHub, for example in origin-server. It looks like Red Hat stopped using the name "Origin" towards the end of the Openshift 3.x era, when the project was renamed to OKD.

Openshift versions: our advice

When you look at Openshift versions, pay particular attention to the supported Openshift version: is it Openshift 2.x, 3.x or 4.x? You should keep any versions prior to 4.x at arm's length. Then you can only pray that Red Hat does not make any more breaking changes or, worse, invent new names for Openshift to confuse you.

If you don't want to or can't pay subscription fees to Red Hat you should go with the open source OKD. If you need support, certification or additional features then prepare to shell out money to Red Hat. It is probably worth it in the end.

If you don't want to manage the Openshift cluster yourself, go with a hosted solution like AWS ROSA. If you are not afraid of getting your hands dirty you can just as well spin up your Openshift cluster in an el cheapo Cloud like Hetzner Cloud and save a lot of money (and lose a lot of time).

And folks: remember to use infrastructure as code tools to manage your Openshift clusters.

The best documentation for Keycloak Authorization Services REST API was tcpdump. This is no longer the case, fortunately.

Introduction

The Keycloak Authorization Services allow you to offload your application's authorization decisions to Keycloak instead of implementing them in your code. This way you can leverage Keycloak's advanced features like 2FA without any additional development on your part. If you're unfamiliar with the Authorization Services I suggest having a look at the Keycloak authorization services terminology first. You may also be interested in our sample code for managing Keycloak authorization services programmatically. To be perfectly honest, there is no separate Keycloak authorization services REST API. While the authorization services do support the UMA Protection API, that API is only needed if you want your resource servers (e.g. webapps) to be able to manage their own authorization services configuration inside Keycloak. For example, you might want to provide your webapp users with the capability to grant other users access (e.g. read, write) to their own resources. In this article we will not cover the Protection API at all.

This article shows how to manipulate authorization data for a Keycloak client using the Keycloak Admin REST API. This use-case is particularly "lightly documented", meaning that there used to be zero documentation about this topic. Unless, of course, you count the use of tcpdump and browser developer tools as documentation.

Generic advice on the JSON payload format

In the JSON payloads mandatory parameters are marked in bold. These are typically parameters that are required to identify the object. Some other parameters may also be mandatory, for example those that define the type of the object. Values for mandatory parameters are surrounded by < and > signs to make them stand out, for example <policy-name>. The < and > signs should be removed from real JSON payloads.

Some parameters allow a fixed set of valid values. The valid values are separated with pipe signs ("|"). For example:

"param": "foo|bar|faz"

Where foo, bar and faz are all valid values.

Some parameters seem redundant, such as the logic parameter that in many cases has only one valid value ("positive"). Similarly the decisionStrategy parameter often has only one valid value ("UNANIMOUS"). If one of these parameters is missing the JSON payload might get rejected (or not). The same may happen if the parameter has an invalid value. It is also possible that values of these essentially useless parameters are just ignored.

Certain parameters in the JSON payload are optional. This means that they are not required to be present in the payload for it to be considered valid. The icon_uri field is a good example. You can add it to the JSON payload whenever the object in question supports that field:

"icon_uri":"<some-uri>"

Another optional field is description, which can be, but does not have to be, in the JSON payload:

"description":"<some-description>"

Managing Resources

Check if a Resource exists

Search for resources with matching name:

GET /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/resource/search?name=<resource-name>
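All of the endpoints in this article require an admin access token. As a rough sketch, the search above could be done with curl as follows; the hostname, realm, credentials and resource name are placeholders, jq is only used to extract the token, and the legacy /auth context path is assumed:

$ TOKEN=$(curl -s -d "client_id=admin-cli" -d "grant_type=password" \
    -d "username=admin" -d "password=<admin-password>" \
    "https://keycloak.example.com/auth/realms/master/protocol/openid-connect/token" | jq -r .access_token)

$ curl -s -H "Authorization: Bearer $TOKEN" \
    "https://keycloak.example.com/auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/resource/search?name=<resource-name>"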

Create a Resource

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/resource

JSON payload:

{
  "name":"<resource-name>",
  "displayName":"<resource-display-name>",
  "scopes":[],
  "attributes":{},
  "uris" :[],
  "ownerManagedAccess":""
}
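With the same admin token, creating the resource could look roughly like this with curl (the payload mirrors the one above and the names are placeholders):

$ curl -s -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
    -d '{"name":"my-resource","displayName":"My Resource","scopes":[],"attributes":{},"uris":[],"ownerManagedAccess":""}' \
    "https://keycloak.example.com/auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/resource"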

Modify a resource

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/resource/<resource-id>

JSON payload:

{
  "_id":"<resource-id>",
  "name":"<resource-name>",
  "owner":
    {
      "id":"<client-id>",
      "name":"<client-name>"
    },
  "ownerManagedAccess":false,
  "displayName":"",
  "attributes":{},
  "uris":[],
  "scopes":[]
}

Linking resources with authorization scopes

To link a resource with one or more authorization scopes, add data to the scopes parameter:

"scopes":[
  {
    "id":"<authorization-scope-id>",
    "name":"<authorization-scope-name>"
  }
]

Linking resources with types

A resource can belong to a type (see terminology). To make a resource of a certain type just define the type as a string:

"type":"<some-string>"

Managing attributes

You can add attributes to the JSON payload like this:

"attributes":
  {
    "param1":["foo"],
    "param2":["bar"]
  }

Delete a resource

HTTP method and path:

DELETE /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/resource/<resource-id>

Managing Authorization Scopes

Check if an Authorization Scope exists

Search for an authorization scope by name:

GET /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/scope/search?name=<scope-name>

Create an Authorization Scope

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/scope

JSON payload:

{
  "name":"<authorization-scope-name>",
  "displayName":""
}

Modify an Authorization Scope

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/scope/<scope-id>

JSON payload:

{
  "id":"<authorization-scope-id>",
  "name":"<authorization-scope-name>",
  "displayName":""
}

Delete an Authorization Scope

HTTP method and path:

DELETE /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/scope/<authorization-scope-id>

Managing Permissions

Check if a Permission exists

Search for Scope-based Permission by name:

GET /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/search?name=<permission-name>

NOTE: the policy part of the path shown above is not a typo.

Create a Scope Permission

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/permission/scope

JSON payload:

{
  "name":"<permission-name>",
  "scopes":["<scope-id>"],
  "decisionStrategy":"UNANIMOUS|AFFIRMATIVE|CONSENSUS",
  "type":"scope",
  "logic":"POSITIVE",
  "description":"",
  "resources":["<resource-id>"],
  "policies":["<policy-id>"]
}
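The permission payload references resources, scopes and policies by their IDs, so in practice you usually look those up first. Below is a hedged sketch with curl and jq, assuming the search responses carry the same id fields as the representations shown elsewhere in this article; the base URL and object names are placeholders:

$ BASE="https://keycloak.example.com/auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server"
$ RESOURCE_ID=$(curl -s -H "Authorization: Bearer $TOKEN" "$BASE/resource/search?name=my-resource" | jq -r ._id)
$ SCOPE_ID=$(curl -s -H "Authorization: Bearer $TOKEN" "$BASE/scope/search?name=my-scope" | jq -r .id)
$ POLICY_ID=$(curl -s -H "Authorization: Bearer $TOKEN" "$BASE/policy/search?name=my-policy" | jq -r .id)

$ curl -s -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
    -d "{\"name\":\"my-permission\",\"type\":\"scope\",\"logic\":\"POSITIVE\",\"decisionStrategy\":\"UNANIMOUS\",\"scopes\":[\"$SCOPE_ID\"],\"resources\":[\"$RESOURCE_ID\"],\"policies\":[\"$POLICY_ID\"]}" \
    "$BASE/permission/scope"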

Modify a Scope Permission

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/permission/scope/<permission-id>

JSON payload:

{
  "id":"<permission-id>",
  "name":"<permission-name>",
  "scopes":["<scope-id>"],
  "decisionStrategy":"UNANIMOUS|AFFIRMATIVE|CONSENSUS",
  "type":"scope",
  "logic":"POSITIVE",
  "description":"",
  "resources":["<resource-id>"],
  "policies":["<policy-id>"],
  "description":""
}

Create a Resource Permission

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/permission/resource

JSON payload:

{
  "type":"resource",
  "logic":"POSITIVE",
  "decisionStrategy":"UNANIMOUS|AFFIRMATIVE|CONSENSUS",
  "name":"<resource-based-permission-name>",
  "resources":["<resource-id>"],
  "policies":["<policy-id>"],
  "description":""
}

Modify a Resource Permission

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/permission/resource/<permission-id>

JSON payload:

{
  "id":"<resource-permission-id>",
  "name":"<resource-permission-name>",
  "type":"resource",
  "logic":"POSITIVE",
  "decisionStrategy":"UNANIMOUS|AFFIRMATIVE|CONSENSUS",
  "resources":["<resource-id>"],
  "policies":["<policy-id>"],
  "description":""
}

Delete a permission

HTTP method and path:

DELETE /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/<permission-id>

NOTE: the policy/<permission-id> part of the path shown above is not a typo.

Managing policies

Check if a policy exists

HTTP method and path:

GET /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/search?name=<policy-name>

Delete a policy

HTTP method and path:

DELETE /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/<policy-id>

Create a regex policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/regex

JSON payload:

{
  "type":"regex",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "name":"<policy-name>",
  "pattern":"<regular-expression>",
  "targetClaim":"<claim-name>",
  "description":""
}

Modify a regex policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/regex/<policy-id>

JSON payload:

{
  "id":"<policy-id>",
  "name":"<policy-name>",
  "type":"regex",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "targetClaim":"<claim-name>",
  "pattern":"<regular-expression>",
  "description":""
}

Create a role policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/role

JSON payload:

{
  "type":"role",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "name":"<policy-name>",
  "roles":[
    {"id":"<role-id>"}
  ],
  "description":""
}

The roles in the JSON payload can be either realm roles or client roles.

Modify a role policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/role/<policy-id> 

JSON payload:

{
  "id":"<policy-id>",
  "name":"<policy-name>",
  "type":"role",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "roles":[
    {"id":"<role-id>"}
  ],
  "description":""
}

Create a Client Policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/client

JSON payload:

{
  "type":"client",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "name":"<client-policy-name>",
  "clients":["<client-id>"],
  "description":""
}

Modify a Client Policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/client/<client-policy-id>

JSON payload:

{
  "id":"<client-policy-id>",
  "name":"<client-policy-name>",
  "type":"client",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "clients":["<client-id>"],
  "description":""
}

Create a Time Policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/time

Minimal JSON payload:

{
  "type":"time",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "name":"<time-policy-name>",
  "description":"",
   <time-settings>
}

Replace <time-settings> with a date range (e.g. month and monthEnd) or a time threshold (notAfter or notBefore):

"notAfter":"1970-01-01 00:00:00"
"notBefore":"1970-01-01 00:00:00"
"dayMonth":<day-of-month>
"dayMonthEnd":<day-of-month>
"month":<month>
"monthEnd":<month>
"year":<year>
"yearEnd":<year>
"hour":<hour>
"hourEnd":<hour>
"minute":<minute>
"minuteEnd":<minute>

Modify a Time Policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/time/<time-policy-id>

JSON payload:

{
  "id":"<time-policy-id>",
  "name":"<time-policy-name>",
  "type":"time",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "description":"",
  <time-settings>
}

Replace <time-settings> in the same way as when creating a Time Policy.

Create a User Policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/user

JSON payload:

{
  "type":"user",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "name":"<user-policy-name>",
  "users":["<user-id>"],
  "description":""
}

Modify a User Policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/user/<user-policy-id>

JSON payload:

{
  "id":"<user-policy-id>",
  "name":"<user-policy-name>",
  "type":"user",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "users":["<user-id>"],
  "description":""
}

Create a Client Scope Policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/client-scope

JSON payload:

{
  "type":"client-scope",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "name":"<client-scope-policy-name>",
  "clientScopes":[
    {"id":"<client-scope-id>"}
  ],
  "description": ""
}

Modify a Client Scope Policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/client-scope/<client-scope-policy-id>

JSON payload:

{
  "id":"<client-scope-policy-id>",
  "name":"<client-scope-policy-name>",
  "type":"client-scope",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "clientScopes":[
    {"id":"<client-scope-id>"}
  ],
  "description":""
}

Create an Aggregated Policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/aggregate

JSON payload:

{
  "type":"aggregate",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS|AFFIRMATIVE|CONSENSUS",
  "name":"<aggregated-policy-name>",
  "policies":["<policy-id>"],
  "description":""
}

Modify an Aggregated Policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/aggregate/<aggregate-policy-id>

JSON payload:

{
  "id":"<aggregated-policy-id>",
  "name":"<aggregated-policy-name>",
  "type":"aggregate",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS|AFFIRMATIVE|CONSENSUS",
  "policies":["<policy-id>"]
  "description":"",
}

Create a Group Policy

HTTP method and path:

POST /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/group

JSON payload:

{
  "type":"group",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "name":"<group-policy-name>",
  "groups":[
    {"id":"<group-id>","path":"<group-path>"}
  ],
  "groupsClaim":"<groups-claim>",
  "description":""
}

Modify a Group Policy

HTTP method and path:

PUT /auth/admin/realms/<realm>/clients/<client-id>/authz/resource-server/policy/group/<group-policy-id>

JSON payload:

{
  "id":"<group-policy-id>",
  "name":"<group-policy-name>",
  "type":"group",
  "logic":"POSITIVE|NEGATIVE",
  "decisionStrategy":"UNANIMOUS",
  "groups":[
    {"id":"<group-id>",
     "extendChildren":false,
     "path":"<group-path>"
    }
  ],
  "groupsClaim":"<groups-claim>",
  "description":""
}

What about Javascript policies?

Managing Javascript policies using the Admin UI has been deprecated since Keycloak 7.0.1. It was possible to re-enable the feature, but I believe it went away for good in the first Quarkus-based versions of Keycloak. So, there is little point in documenting how Javascript policies could be added in older Keycloak versions. It is better to deploy those policies as JAR files anyway.

There are many ways to debug Keycloak OIDC token exchange. In this article we go through some of them. Photo credit: https://www.pexels.com/it-it/foto/tastiera-del-macbook-532173/.

Introduction

There's no substitute for understanding what you're doing, and that in turn is difficult without seeing what is happening. Debugging Keycloak OIDC problems without understanding what is happening under the hood is no exception to this rule. The purpose of this article is twofold:

  1. Help you learn how Keycloak OIDC token exchange works
  2. Provide you with the tools to debug Keycloak OIDC token exchanges in production environments

For the purposes of this blog post I've been using the OpenID Connect Playground application from the book Keycloak - Identity and Access Management for Modern Applications from Packt Publishing. I can recommend that book to anyone who needs to understand Keycloak and in particular the protocols it supports (OAuth2, OIDC and SAML 2.0).

Debugging Keycloak OIDC token exchange with tcpdump

Doing the packet capture

If traffic between Keycloak and the client (e.g. a web application) is not encrypted, debugging Keycloak OIDC token exchanges is easy to do with tcpdump. Here's a sample tcpdump command line, run on the computer running the web browser:

$ tcpdump -i vboxnet1 port 8080 -w /tmp/keycloak.pcap

The "-i" option defines the network interface to capture traffic on. Once you have tcpdump listening do whatever is failing and then read the packet capture in ASCII mode:

$ tcpdump -A -r /tmp/keycloak.pcap

To get all possible data you need to run tcpdump on both sides: the browser and Keycloak.

Example Authorization Code flow exchange between browser and Keycloak

Below you'll see the exchange between the application and Keycloak using the Authorization Code flow, which is essentially the same as the Authorization Code grant type in OAuth2. Here the Keycloak server lives at http://192.168.56.80:8080. I recorded the flow below from the computer running the browser. To get the full token exchange you also have to record from the Keycloak side.

Here the browser asks the Keycloak realm's OpenID Connect authorization endpoint (/auth/realms/master/protocol/openid-connect/auth) for an authorization code:

13:03:16.666614 IP laptop.56044 > keycloak.webcache: Flags [P.], seq 1:1338, ack 1, win 502, options [nop,nop,TS val 3674982084 ecr 1686713802], length 1337: HTTP: GET /auth/realms/master/protocol/openid-connect/auth?client_id=oidc-playground&response_type=code&redirect_uri=http://localhost:8000/&scope=openid&prompt=login&max_age=3600&login_hint=oidc-playground HTTP/1.1

The interesting parameters are:

After successful authentication Keycloak sends the browser an authorization code (see code below):

13:03:20.136822 IP keycloak.webcache > laptop.56046: Flags [P.], seq 82239:85248, ack 1700, win 501, options [nop,nop,TS val 1686717272 ecr 3674985450], length 3009: HTTP: HTTP/1.1 302 Found
--- snip ---
Location: http://localhost:8000/?session_state=f837974a-f26b-464a-8a2c-d70f0886c7a1&code=84334b87-fa5d-46e2-b43c-53f483b33db7.f837974a-f26b-464a-8a2c-d70f0886c7a1.6f46e3e3-5421-4439-995a-dfa6e7710141
--- snip ---

The laptop then sends the authorization code to the realm's token endpoint. Note how the code is the same as above (84334b87...):

13:03:24.644215 IP laptop.56042 > keycloak.webcache: Flags [P.], seq 379:1032, ack 6170, win 501, options [nop,nop,TS val 3674990062 ecr 1686717952], length 653: HTTP: POST /auth/realms/master/protocol/openid-connect/token HTTP/1.1
--- snip ---
grant_type=authorization_code&code=84334b87-fa5d-46e2-b43c-53f483b33db7.f837974a-f26b-464a-8a2c-d70f0886c7a1.6f46e3e3-5421-4439-995a-dfa6e7710141&client_id=oidc-playground&redirect_uri=http://localhost:8000/

Keycloak sends an access token, ID token and refresh token back to the browser:

13:03:24.655139 IP keycloak.webcache > laptop.56042: Flags [P.], seq 6170:10045, ack 1032, win 502, options [nop,nop,TS val 1686721790 ecr 3674990062], length 3875: HTTP: HTTP/1.1 200 OK
--- snip ---
{
  "access_token":"<long string>",
  "expires_in":60,
  "refresh_expires_in":1800,
  "refresh_token":"<long string>",
  "token_type":"Bearer",
  "id_token":"<long string>",
  "not-before-policy":0,
  "session_state":"f837974a-f26b-464a-8a2c-d70f0886c7a1",
  "scope":"openid email profile"
}

All the tokens are JSON Web Tokens (JWT) and consist of three dot-separated parts. The first part is a base64-encoded header and the second part is the base64-encoded payload. The third part is the signature. You can simply copy the strings from tcpdump, split at "." and base64 decode the segments to get the JSON-formatted data, as shown below. In JWT tokens all times are in the Unix epoch time. RFC 7519 documents the fields present in these tokens so I won't go through them here.

The access token

The access token contains, among other things, the level of access Keycloak granted the user. The header is very boring:

$ echo "<access-token>"|cut -d "." -f 1|base64 -d
{
  "alg":"RS256",
  "typ" : "JWT",
  "kid" : "pFa6nY7UzYpsxBnNNve8yzwu1LpxdcAmMiqIusSThKg"
}

The payload is the more interesting part:

$ echo "<access-token>"|cut -d "." -f 2|base64 -d
{
  "exp":1679483071,
  "iat":1679483011,
  "auth_time":1679483000,
  "jti":"34b23679-d670-4002-b35c-71071b0ab219",
  "iss":"http://192.168.56.80:8080/auth/realms/master",
  "aud":"account",
  "sub":"1356f09b-dd9c-4f72-aeb5-c86351478c15",
  "typ":"Bearer",
  "azp":"oidc-playground",
  "session_state":"f837974a-f26b-464a-8a2c-d70f0886c7a1",
  "acr":"1",
  "allowed-origins":["http://localhost:8000"],
  "realm_access":{"roles":["default-roles-master","offline_access","uma_authorization"]},
  "resource_access":{"account":{"roles":["manage-account","manage-account-links","view-profile"]}},
  "scope":"openid email profile",
  "sid":"f837974a-f26b-464a-8a2c-d70f0886c7a1",
  "email_verified":false,
  "preferred_username":"oidc-playground"
}

The ID token

As with access tokens the header is very boring:

$ echo <id-token>|cut -d "." -f 1|base64 -d
{
  "alg":"RS256",
  "typ" : "JWT",
  "kid" : "pFa6nY7UzYpsxBnNNve8yzwu1LpxdcAmMiqIusSThKg"
}

The payload:

$ echo <id-token>|cut -d "." -f 2|base64 -d
{
  "exp":1679483064,
  "iat":1679483004,
  "auth_time":1679483000,
  "jti":"80ff10a3-9f3b-4f44-bb57-d5612699e40a",
  "iss":"http://192.168.56.80:8080/auth/realms/master",
  "aud":"oidc-playground",
  "sub":"1356f09b-dd9c-4f72-aeb5-c86351478c15",
  "typ":"ID",
  "azp":"oidc-playground",
  "session_state":"f837974a-f26b-464a-8a2c-d70f0886c7a1",
  "at_hash":"KSQeRU72hKCui4V5b1Pnew",
  "acr":"1",
  "sid":"f837974a-f26b-464a-8a2c-d70f0886c7a1",
  "email_verified":false,
  "preferred_username":"oidc-playground"
}

Much of the data in the ID token is derived from the Keycloak user you authenticated as. For example, if your user has an email address, first name and last name, you will see additional fields like these:

  "name": "Foo Bar",
  "given_name": "Foo",
  "family_name": "Bar",
  "email": "[email protected]",

The refresh token

The header:

$ echo <refresh-token>|cut -d "." -f 1|base64 -d
{
  "alg":"HS256",
  "typ" : "JWT",
  "kid" : "cd7e1f08-7d77-44d8-bb8e-33353afe666f"
}

The payload:

{
  "exp":1679484811,
  "iat":1679483011,
  "jti":"e506bb8f-379a-4c9a-affd-fa5407a7263f",
  "iss":"http://192.168.56.80:8080/auth/realms/master",
  "aud":"http://192.168.56.80:8080/auth/realms/master",
  "sub":"1356f09b-dd9c-4f72-aeb5-c86351478c15",
  "typ":"Refresh",
  "azp":"oidc-playground",
  "session_state":"f837974a-f26b-464a-8a2c-d70f0886c7a1",
  "scope":"openid email profile",
  "sid":"f837974a-f26b-464a-8a2c-d70f0886c7a1"
}

Debugging Keycloak OIDC token exchange with OAuth 2.0 tracer in Google Chrome

If you're only interested in what the browser sees you can use the SAML, WS-Federation and OAuth 2.0 tracer extension for Google Chrome. I suggest turning verbosity up to the maximum in the options. Once you start recording, debugging the Keycloak OIDC token exchange is just a matter of checking the messages in the traces. That is way easier than parsing the raw tcpdump data. Also, if traffic between Keycloak and the browser is end-to-end encrypted with HTTPS then tcpdump might not be an option at all.

Debugging Keycloak OIDC token exchange with Firefox DevTools

Firefox DevTools, which are built into Firefox, allow you to view HTTP requests and payloads. Press Ctrl-Shift-E to go straight to the Network section, where you can analyze what the browser sees during token exchanges.

Evaluating client scopes

Keycloak provides tools for evaluating the tokens granted for clients (e.g. web applications). Go to the Keycloak client of your application, then select "Client Scopes", select a "User" to impersonate and click on "Evaluate". Several interesting tabs now appear:

As you can see, you can debug token contents directly from Keycloak, without you having to trigger a real OpenID Connect token exchange and debug the tokens that way. Similarly, you can check if there are any inconsistencies with the generated ID token and User Info, should your application prefer getting its user information from one over the other.

Learning with OpenID Connect Playground

OpenID Connect Playground is a very simple Node.js application with the sole purpose of making the OIDC token exchange visible. In OpenID Connect terms it is a Relying Party (RP) that uses the Authorization Code flow. It is a quite useful tool for learning what happens under the hood in OpenID Connect, but a debug tool it is not. However, it is a good tool for understanding how token exchange should work, so that you can more easily spot anomalies in real life.

Installing the playground

To use OpenID Connect Playground the first step is to Git clone the code:

git clone https://github.com/PacktPublishing/Keycloak-Identity-and-Access-Management-for-Modern-Applications.git

Next, ensure that you have npm installed. Then run:

npm install
npm start

The application should now be running at http://localhost:8000.

Setting up Keycloak and Keycloak client

The OpenID Connect Playground is useless by itself, because it relies on Keycloak. For testing purposes we tend to use the Vagrant + Virtualbox environment in puppet-module-keycloak. However, running Keycloak locally or inside a container will work equally well.

Once you have Keycloak up and running, you need to create a Keycloak client in a realm (e.g. master) with the following settings:

Using the playground

With the client in place you should be able to use the application. In the Discovery section set the Issuer to your Keycloak realm's URL, for example http://192.168.56.80:8080/auth/realms/master. Then click "Load OpenID Provider Configuration" and you should see a bunch of JSON data that Keycloak published about itself to the application.

From this point on you should be able to play with the Authorization Code flow:

Further reading

If you thought this article was useful, you may be interested in our other Keycloak blog posts as well.

Introduction

Keycloak Authorization Services are built on top of well-known standards such as the OAuth2 and User-Managed Access specifications (UMA). You can manage Keycloak Authorization Services programmatically, if needed, as described in our other blog post.

OAuth2 clients (such as front end applications) can obtain access tokens from the server using the token endpoint and use these tokens to access resources protected by a resource server (such as back end services).

In the same way, Keycloak Authorization Services provide extensions to OAuth2 to allow access tokens to be issued based on the processing of all policies associated with the resource(s) or scope(s) being requested. This means that resource servers can enforce access to their protected resources based on the permissions granted by the server and held by an access token. In Keycloak Authorization Services the access token with permissions is called a Requesting Party Token or RPT for short.

Before reading this article you may want to familiarize yourself with Keycloak concepts and terms as well as the OAuth 2.0 terminology reference. Note that Keycloak uses the term "scope" a lot with different meanings, and only one of them matches OAuth 2.0's definition of a scope.

Use cases

Keycloak Authorization Services have a wide range of use-cases:

Terms and definitions

Resource server

The server hosting the protected resources and capable of accepting and responding to protected resource requests:

Resource

Resource is the object being protected. It can be an asset or a group of assets such as:

In Keycloak a resource is a small set of information that is common to different types of resources:

Scope

Scope (also known as authorization scope) defines what can be done with a resource:

Examples of scopes: view, edit, delete

Policy

Policies define the conditions that must be satisfied to grant access to a resource. They allow you to implement a strategy for access using *BAC (RBAC, ABAC and so on). Keycloak has a rich set of policies to mix and match for granting access.

Permission

Permission associates the resource being protected with the policies that are evaluated for access. Permission is an association, not an authorization control.

Type (Keycloak-only)

Types in Keycloak group together resource instances of the same kind. An example of a type could be urn:myclient:resources:organizations. The type that Keycloak automatically creates for the default resource is urn:resource-server-name:resources:default.

Keycloak authorization services terminology visualized

Admin Console: resource

A resource in the Keycloak Admin Console

Admin Console: scope

A scope in the Keycloak Admin Console

Admin Console: permission

A permission in the Keycloak Admin Console

Admin Console: policy

A policy in the Keycloak Admin Console

Further reading

Keycloak authorization services

UMA

OAuth2

Let's Encrypt on AWS can be a can of worms if you don't know what you're doing. Photo credit: https://commons.wikimedia.org/wiki/File:Canofworms1.png

Introduction

Let's Encrypt and the ACME protocol enable getting free (as in beer) TLS/SSL certificates. The main use for Let's Encrypt is to enable HTTPS on web servers. Let's Encrypt TLS certificates expire pretty quickly, in 3 months, but the idea is that renewal is automated so that you don't ever really have to worry about expiration, unlike with commercial TLS certificates, which you typically need to manually renew every 12 months. Using Let's Encrypt on AWS is in some cases a very viable option, but not so much in others. In many cases trying to use Let's Encrypt certificates leads you down the path of Lovecraftian horrors and madness. Because "it all depends" I'll try to explain when using Let's Encrypt is viable in the AWS context and when you really should just give up and use certificates generated by the Amazon Certificate Manager ("ACM") instead.

Cloud resources versus virtual machines

Cloud resources such as Cloudfront distributions, Elastic Load Balancers and API Gateways are by their nature very inflexible, because they're essentially standardized commodities. Because of this it is often difficult or impossible to bend them to your will. For this reason it is a good general practice to decouple certificate renewals from the certificate consumers. In fact, this is far more important than the choice between Let's Encrypt certificates and ACM certificates.

Things that you have full control over, most notably virtual machines, are different beasts: you're not artificially limited in what you can do. For this reason the easiest option is typically to let the virtual machine handle its own certificate renewal with certbot.

Basics of Amazon Certificate Manager

Before I start talking about Let's Encrypt on AWS I have to talk a bit about Amazon Certificate Manager, or ACM for short. ACM is a service for provisioning and managing certificates. All AWS Cloud resources that use TLS/SSL must use certificates that are present in ACM - who or what created the certificates in the first place does not matter.

There are two ways to get a certificate into ACM:

  1. Import an existing certificate. The certificate can be a commercial certificate, a Let's Encrypt certificate or something else.
  2. Request a certificate from within ACM. Just like certbot, ACM creates a challenge for you and if it passes, you get a certificate in return.

ACM supports two validation methods: email and DNS. However, in practice you should use DNS validation only. DNS validation makes you set up a special CNAME record for each domain name you want to have in the certificate.
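For example, requesting a DNS-validated certificate with the AWS CLI looks roughly like this (the domain names are placeholders); ACM then tells you which CNAME records to create:

$ aws acm request-certificate --domain-name example.org \
    --subject-alternative-names "*.example.org" --validation-method DNS

$ aws acm describe-certificate --certificate-arn <certificate-arn> \
    --query "Certificate.DomainValidationOptions"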

Note that ACM certificates can only be used with a subset of AWS services. This limitation seems to affect any certificates in ACM, whether they are "Amazon-issued" (=ACM-generated) or imported.

Let's Encrypt challenge types: individual domain names vs. wildcards

Let's Encrypt supports multiple challenge types. The most common challenge is probably the HTTP-01 challenge, where certbot (or another ACME client) sets up a special challenge file on a web server in a well-known place, which the ACME servers can then verify. If the challenge verification succeeds, certbot gets a signed certificate in return. The certificate fields include a domain name and one or more subject alternative names (SANs). All of the SANs are verified separately. For example, you can request a certificate that is valid for foo.example.com and bar.acme.inc at the same time, but the challenge file has to be reachable from both locations or challenge verification will fail.

Another common challenge type is the DNS-01 challenge. In this case certbot gets a certificate by adding a challenge entry in DNS. ACME servers then verify the DNS entry and give certbot a signed certificate in return. Unlike with the HTTP-01 challenge, you can get a wildcard certificate, e.g. *.example.org, using the DNS-01 challenge. The main benefit of the DNS-01 challenge in the Cloud context is that certificate renewal is decoupled from the Cloud resources that use the certificate.

Getting certificates for AWS EC2 instances

Let's Encrypt is likely to be the best option for AWS EC2 instances or other virtual machines. Setting up certbot with the HTTP-01 challenge type is probably the most convenient approach for web servers. For other server types such as mail servers you should consider getting a wildcard certificate with the DNS-01 challenge instead. The reason is simple: ACME servers can't talk to servers that are only supposed to be accessible from a VPN or an internal network.
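For a typical public-facing web server the HTTP-01 setup can be as simple as the following, assuming Nginx and the certbot Nginx plugin are installed (the domain is a placeholder):

$ certbot --nginx -d www.example.org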

The DNS-01 challenge is also convenient if you want certificates for a domain that is internally facing, but is still registered in the public DNS. For example if you have a VPC with an associated private DNS zone (e.g. *.example.internal) you can get a wildcard certificate easily with certbot and one of the certbot DNS plugins.
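For example, with the certbot Route 53 DNS plugin a wildcard certificate can be requested roughly like this; the plugin needs AWS credentials that are allowed to modify the relevant hosted zone, and the domain names are placeholders:

$ certbot certonly --dns-route53 -d example.org -d "*.example.org"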

The only caveat with certificates generated with the DNS-01 challenge is that you need to somehow handle distributing renewed certificates to the servers that need them. Tools such as Puppet or Ansible can help a lot here.

Let's Encrypt on AWS: challenges with Cloud resources

Cloudfront distributions: certbot and the HTTP-01 challenge

Amazon Cloudfront speeds up distribution of content by serving it from edge locations that are geographically close to the client that is requesting them. Without Cloudfront content would be served directly from the origin which might be on the other side of the globe, depending on the client.

Here I assume that you use Amazon S3 as the origin for the Cloudfront distribution. For that particular use-case there is a certbot plugin called certbot-s3front that is designed for managing HTTP-01 challenges with Cloudfront and S3. There are two limitations, though:

  1. You need to generate the initial Let's Encrypt certificate on a real webserver first
  2. You cannot add any Subject Alternative Names once your certificate is live on Cloudfront

Both of the above limitations are a direct consequence of Cloudfront's features / limitations:

In other words you have a huge chicken-and-egg problem on your hands if you ever need to add SANs to your certificate. If you can live with a static list of SANs you're probably going to be ok with certbot-s3front.

Elastic Load Balancers: certbot and the HTTP-01 challenge

Elastic Load Balancers can be made to work with certbot and the HTTP-01 challenge type. However, there's no certbot plugin you could use for this job. Without going into excessive detail what you need to do is:

Overall, this is definitely not a job for the faint of heart. That said, if you really really really want to go this route, please contact us and we may be able to publish the Lambda function we were using for this purpose.

API Gateways: certbot and the HTTP-01 challenge

The short answer here is: no, this is not possible. The long answer is that it is theoretically possible, but is an exercise in madness, and here's why:

  1. AWS API Gateway only supports HTTPS
    • There is nothing listening on port 80 (HTTP)
  2. The Let's Encrypt HTTP-01 challenge always uses HTTP first
    • This will fail when the domain name is pointing to an API Gateway, because there is nothing listening on port 80
  3. The Let's Encrypt HTTP-01 challenge will only switch to HTTPS when it encounters a redirect from HTTP to HTTPS
    • AWS API Gateway does not provide such a redirect
  4. There is no way to configure the challenge request to go through HTTPS at the client (certbot) side, because the challenge request originates from the ACME servers

There seems to be only one realistic option for solving this problem: have a Cloudfront distribution do the HTTP->HTTPS redirect. This is described in this blog post, but I've never tried it myself.

What about certificate rotation, then?

When you configure a Cloud resource (e.g. a load balancer) to use a certificate, you point it at the certificate's ARN (its unique identifier). What this means is that when you create or import a new certificate it will have a different ARN from the old one. So, you need to make your Cloud resource use the new certificate instead of the old one: this is called certificate rotation. If you use certbot - whether with the HTTP-01 or the DNS-01 challenge type - you need to handle certificate rotation on renewal yourself using a custom-made script.
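As a hedged sketch, rotating a renewed certbot certificate on an Application Load Balancer could look like this with the AWS CLI; the paths and ARNs are placeholders, and in practice you would run something like this from a certbot deploy hook or a scheduled job:

$ NEW_ARN=$(aws acm import-certificate \
    --certificate fileb:///etc/letsencrypt/live/example.org/cert.pem \
    --private-key fileb:///etc/letsencrypt/live/example.org/privkey.pem \
    --certificate-chain fileb:///etc/letsencrypt/live/example.org/chain.pem \
    --query CertificateArn --output text)

$ aws elbv2 modify-listener --listener-arn <listener-arn> \
    --certificates CertificateArn=$NEW_ARN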

The case for Amazon Certificate Manager certificates

There are several major benefits of letting ACM generate certificates for you:

  1. Certificates are valid for 12 months
  2. Certificate renewal is handled automatically by AWS
  3. Adding Subject Alternative names is supported and only requires passing the challenge for new SANs, not for any of the old ones
  4. Certificate renewal does not change the certificate arn. In other words, you don't need to worry about rotating the certificate in all the resources that use it
  5. It supports wildcards (e.g. *.example.org) as well as multiple subject alternative names (SANs) in the certificate

All of this makes ACM a superior choice for AWS Cloud resources.

Summary: what am I supposed to do then?

The conclusion we can draw from all this is as follows:

If you do not follow this advice you may be ok, but "it all depends" on the nitty gritty details.

Fixing wrong Route53 contact details is easy (once you know how). Photo credit: https://www.pexels.com/photo/adventure-asphalt-clouds-country-416974/

The symptoms

I recently received an email from Amazon Route53 asking me to verify the contact details for one of our registered DNS domains (replaced here with example.org). And sure enough, I saw wrong Route53 domain contact details in the email:

Subject: Confirm that contact information for example.org is correct.
Message body:

Dear AWS customer,

This message is a reminder to help you keep the contact information for your domain registration up to date. WHOIS records include the following information:

  Domains:
  example.org

  Registrant:
  Name: On behalf of example.org owner
  Organization Name: Identity Protection Service
  Address: PO Box 786
  City: Hayes
  State/Province: Middlesex
  Country: GB
  Postal Code: UB3 9TR

  Administrative Contact:
  Name: On behalf of example.org owner
  Organization Name: Identity Protection Service
  Address: PO Box 786
  City: Hayes
  State/Province: Middlesex
  Country: GB
  Postal Code: UB3 9TR
  Phone: +44.1483307527
  Fax: +44.1483304031
  Email: [email protected]

  Technical Contact:
  Name: On behalf of example.org owner
  Organization Name: Identity Protection Service
  Address: PO Box 786
  City: Hayes
  State/Province: Middlesex
  Country: GB
  Postal Code: UB3 9TR
  Phone: +44.1483307527
  Fax: +44.1483304031
  Email: [email protected]

The contact information provided above is what is shown in the public WHOIS. If your domain is privacy protected, the information will be different than the contact data you have submitted. To verify your contact information, please open the Amazon Route 53 console to view your current contact information and make any necessary corrections:
https://console.aws.amazon.com/route53/home#DomainDetail:example.org

If your information is accurate, you do not need to take any action.

Important
Under the terms of your registration agreement, providing false contact data can be grounds for the cancellation of your domain name registration.

Regards,
Amazon Route 53 

These wrong Route53 domain contact details originate from the whois data for the registered domain. One might reasonably expect the whois data to match the contact details of the AWS Route53 domain. However, in this case the Route 53 contact details were correct, but the whois data seemed to be out of sync with them.

What is the underlying cause?

In retrospect the problem was simple: AWS Route 53 Privacy Protection was enabled for this particular domain. That would have been immediately obvious if it were not for two things

Fortunately I was able to resolve this issue with help from AWS support in a few days. As usual, I needed to get past first level support, but then a more knowledgeable person provided the solution.

Wrong Route53 contact details: how to fix?

Wrong Route53 domain contact details are easy to fix. To show the correct contact information for a domain in whois queries you need to turn off Privacy Protection. Alternatively, just keep on using Privacy Protection and ignore the domain contact detail verification emails from AWS Route53. As a bonus, you should receive less spam in your inbox with Privacy Protection turned on.
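
If you prefer the command line over the console, the following AWS CLI sketch should show the registered contact details and, if you really want to, turn off privacy protection. Note that the route53domains API lives in us-east-1; treat the exact flags as a starting point rather than gospel:

# Show the contact details Route 53 has on file for the domain
aws route53domains get-domain-detail --region us-east-1 --domain-name example.org

# Optionally turn off privacy protection for all three contact types
aws route53domains update-domain-contact-privacy --region us-east-1 \
  --domain-name example.org \
  --no-admin-privacy --no-registrant-privacy --no-tech-privacy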

I hope this information helps other poor souls who have never heard of Privacy Protection before and get confused like I did. If you enjoyed this blog post you may be interested in our other AWS blog posts as well.

Keycloak realm SMTP server settings can be managed with Ansible, but nobody (else) figured out that proper documentation might be helpful.

Introduction

Ansible has reasonably good support for managing various aspects of Keycloak. You can use the community.general.keycloak_realm module to handle realm management, including Keycloak realm SMTP server settings. However, in true Ansible fashion the documentation looks good but does not help you much. In fact, the documentation only mentions that the smtp_server parameter is a dictionary, but that's it. To be honest, that's about as helpful as the description of the RealmRepresentation for Keycloak. This article aims to fill this documentation void. I also want to share some general advice for similar cases that pop up when you manage Keycloak programmatically.

Getting the JSON for Keycloak realm SMTP server settings

First you need to figure out the correct JSON payload to pass to Keycloak. To do that, create a test realm with SMTP settings, then use kcadm.sh to show the resulting JSON object:

$ kcadm-wrapper.sh get realms/test --no-config --server http://localhost:8080 --realm master --user keycloak --password secret
--- snip ---
  "smtpServer" : {
    "password" : "**********",
    "starttls" : "true",
    "port" : "587",
    "auth" : "true",
    "host" : "smtp.example.org",
    "replyTo" : "[email protected]",
    "from" : "[email protected]",
    "fromDisplayName" : "Keycloak",
    "user" : "my-smtp-user"
  },
--- snip ---

Managing Keycloak realm SMTP settings with Ansible

Now we have the correct JSON format for configuring Keycloak realm SMTP servers. We then pass the equivalent values to the keycloak_realm module:

- name: "Ensure realm test"
  community.general.keycloak_realm:
    auth_client_id: "admin-cli"
    auth_keycloak_url: "http://localhost:8080/auth"
    auth_realm: "master"
    auth_username: "keycloak"
    auth_password: "secret"
    enabled: true
    state: "present"
    realm: "test"
    id: "test"
    display_name: "Test realm"
    smtp_server:
      host: "smtp.example.org"
      port: "587"
      starttls: "true"
      auth: "true"
      user: "my-smtp-user"
      password: "secret"
      from: "[email protected]"
      fromDisplayName: "Keycloak"
      replyTo: "[email protected]"

Notice how the parameters in the smtp_server dictionary are named exactly as in the JSON payload (e.g. fromDisplayName). This is crucial because the Keycloak Admin REST API silently ignores parameters it does not recognize. It will also happily create a partial SMTP server configuration for you. The only exception is if you're lucky enough to forget an essential parameter (e.g. "host"). Also note that all parameters in smtp_server have string values - even those that are really booleans. That is a feature of the Keycloak Admin REST API (see the JSON payload above), not a bug.

Other use-cases

While this article is about SMTP server settings, the basic process is applicable to many other Keycloak resources. The methods described here help you manage Keycloak with Puppet, Terraform and kcadm.sh commands as well.
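
For example, here is a rough kcadm.sh equivalent of the Ansible task above: write the smtpServer object into a partial realm representation and apply it with kcadm.sh update. The file name, credentials and email addresses below are placeholders:

# Partial realm representation containing only the SMTP settings
cat > /tmp/smtp.json <<'EOF'
{
  "smtpServer": {
    "host": "smtp.example.org",
    "port": "587",
    "starttls": "true",
    "auth": "true",
    "user": "my-smtp-user",
    "password": "secret",
    "from": "keycloak@example.org",
    "fromDisplayName": "Keycloak",
    "replyTo": "noreply@example.org"
  }
}
EOF

# Apply the settings to the "test" realm
kcadm.sh update realms/test -f /tmp/smtp.json \
  --no-config --server http://localhost:8080 --realm master \
  --user keycloak --password secret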

Introduction

Prometheus is a widely used, cloud-native open source monitoring solution. Its alerting component is called Alertmanager, which can send alerts to email, Slack and elsewhere. Both Prometheus and Alertmanager also fit very well into the infrastructure as code model. Twilio SMS is an SMS sending service with a consumption-based pricing model and an extensive API. With some effort you can integrate Twilio SMS and Prometheus together, as described in this article.

Our particular challenge was that we did not notice emails or instant messages quickly enough: we work in pomodoros and keep a tight focus on development work. As a result we can't react to every beep, 99% of which are not urgent. In addition, most of the day we keep our mobiles silent and disconnected from the Internet - except for SMS messages, (important) phone calls and the times between the 25-minute pomodoros.

This is where Twilio SMS and Prometheus Alertmanager integration comes in: it allows us to get critical alerts as SMS without having to pay attention to the constant flood of low priority email and IM messages.

Promtotwilio: the glue between Twilio SMS and Prometheus

We were lucky in that a person (Gael Gillard from Belgium, to be exact) had already written a simple Go web application called promtotwilio that forwards Alertmanager alerts to Twilio SMS. Promtotwilio acts as an Alertmanager webhook and handles parsing the alert, converting it to simplified SMS format and sending SMS messages via Twilio SMS API.

Promtotwilio had a few limitations that we needed to fix (custom listen port, multiple receivers), but otherwise it was a solid piece of work. Promtotwilio takes all its parameters via environment variables. We saved these into an environment variable file for systemd (/etc/promtotwilio.conf) with appropriately locked down permissions:

SID="8a31e06d74cd8bb08e7fc7b71ad918832f"
TOKEN="6f87636daece98c2f9a953d5bda3df2f"
SENDER="+12345678901"
RECEIVER="+358409898981,+358406060601"
PORT=9191

You can obtain the SID and TOKEN from Twilio. The SENDER must match your "Twilio phone number", which is basically a virtual phone number the SMS messages appear to come from.

NOTE 1: If your Twilio SMS account is in trial mode you need to separately add and verify each phone number listed in RECEIVERS.

NOTE 2: customizing the PORT and using multiple RECEIVERS requires patches to promtotwilio (PRs here and here). We use our own podman-builder to create patched promtotwilio builds.

As we don't run promtotwilio as a container we created a simple systemd service unit ("/etc/systemd/system/promtotwilio.service") for it:

[Unit]
Description=Send Prometheus alerts as SMS via Twilio SMS API
After=network.target

[Service]
Type=simple
User=root
EnvironmentFile=/etc/promtotwilio.conf
ExecStart=/usr/local/bin/promtotwilio

[Install]
WantedBy=multi-user.target

To enable and start the unit we did:

systemctl daemon-reload
systemctl enable promtotwilio
systemctl start promtotwilio

If your environment file does not have any errors then promtotwilio should now be running:

root@prometheus.puppeteers.in:~# systemctl status promtotwilio
● promtotwilio.service - Send Prometheus alerts as SMS via Twilio SMS API
     Loaded: loaded (/etc/systemd/system/promtotwilio.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-02-16 10:11:37 UTC; 2h 49min ago
   Main PID: 2227451 (promtotwilio)
      Tasks: 6 (limit: 4556)
     Memory: 3.7M
     CGroup: /system.slice/promtotwilio.service
             └─2227451 /usr/local/bin/promtotwilio

Feb 16 10:11:37 prometheus.puppeteers.in systemd[1]: Started Send Prometheus alerts as SMS via Twilio SMS API.

Configuring Alertmanager routing

Using Twilio SMS and Prometheus together can become fairly costly quite quickly if you have lots of alert noise (one SMS costs roughly $0.08). Therefore you should send only critical alerts as SMS. You can accomplish this by setting appropriate alert labels in /etc/alertmanager/alert.rules:

groups:
- name: robot.rules
  rules:
  - alert: RobotTestFailure
    expr: robot_failed_total > 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Robot - "{{ $labels.test_app }}" has failures
      description: 'Test "{{ $labels.test_name }}", ID: "{{ $labels.test_id }}"'

Here we set the label "severity" to the value "critical" for this particular Robot Framework-based test.

Once you have labeled your alert rules in Prometheus you can route alerts based on those labels:

route:
  group_by:
  - alertname
  - cluster
  - service
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: email
  routes:
  - match_re:
      severity: ".*"
    receiver: email
    continue: true
  - match:
      severity: critical
    receiver: sms
    continue: true
receivers:
- name: email
  email_configs:
  - to: [email protected]
    from: [email protected]
    smarthost: mail.example.org:25
    auth_username: [email protected]
    auth_identity: [email protected]
    auth_password: supersecret
- name: sms
  webhook_configs:
  - url: http://127.0.0.1:9191/send

The receivers section defines the potential alerting targets. In this case we have two, email and sms. The webhook URL for sms should point to a running promtotwilio instance.

The routes section defines the rules by which those receivers are selected. With the above config you route alerts of any severity to email. The continue: true ensures that Alertmanager does not stop after sending the email, but instead moves on to the next match. Then, if severity == critical, it will additionally send an SMS.

To make these changes stick you need to restart prometheus and alertmanager services:

systemctl restart prometheus alertmanager

If all goes well you should be able to receive alerts as SMS now.
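
To verify the whole chain without waiting for a real incident, you can fire a synthetic critical alert with amtool; the alert name and labels below are made up, and we assume Alertmanager listens on its default port 9093:

amtool alert add alertname=TestSMSAlert severity=critical \
  --annotation='summary=Test alert for SMS routing' \
  --alertmanager.url=http://127.0.0.1:9093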

Resources for Twilio SMS and Prometheus integration

Testing Alertmanager webhooks with curl is not easy, but not difficult either. Doing so can save you a lot of time. Photo credit.

Introduction

Prometheus is a Cloud-native metrics platform that is very easy to manage with infrastructure as code tools. Prometheus is often coupled with Alertmanager, which handles alerting and alert routing. AlertManager has good support for various alert transports (e.g. email or Slack), but its alerting capabilities can be extended with custom webhooks. When AlertManager is configured to use a webhook in an alert route, it forwards alerts to an HTTPS/HTTP endpoint instead of handling the transport of the alert itself. AlertManager's official documentation gives a rough idea of what kind of payload to use and how to configure the webhook in alertmanager.yaml. What is not documented is how you can send test alerts to the webhook without having to trigger real (or fake) alerts in Prometheus. While using fake alerts does work, it is a slow, tedious and invasive process. This article shows one way of testing AlertManager webhooks with curl. In our case the target was to speed up testing of changes to promtotwilio.

Alert payload format

The JSON-formatted alert payload sent by AlertManager is documented in the official webhook documentation. However, the sample payload is not usable as-is, so here's a JSON payload with real data:

{
  "receiver": "sms",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "ProbeFailure",
        "instance": "https://server.example.org",
        "job": "http_checks",
        "monitor": "master",
        "severity": "critical"
      },
      "annotations": {
        "description": "Instance https://server.example.org has been down for over 5m. Job: http_checks",
        "summary": "BlackBox Probe Failure: https://server.example.org"
      },
      "startsAt": "2023-02-06T13:08:45.828Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://prometheus.example.org:9090/graph?g0.expr=probe_success+%3D%3D+0\\u0026g0.tab=1",
      "fingerprint": "1a30ba71cca2921f"
    }
  ],
  "groupLabels": {
    "alertname": "ProbeFailure"
  },
  "commonLabels": {
    "alertname": "ProbeFailure",
    "instance": "https://server.example.org",
    "job": "http_checks",
    "monitor": "master",
    "severity": "critical"
  },
  "commonAnnotations": {
    "description": "Instance https://server.example.org has been down for over 5m. Job: http_checks",
    "summary": "BlackBox Probe Failure: https://server.example.org"
  },
  "externalURL": "http://prometheus.example.org:9093",
  "version": "4",
  "groupKey": "{}/{severity=\"critical\"}:{alertname=\"ProbeFailure\"}",
  "truncatedAlerts": 0
}

Save this payload to a file or create your own (see below).

Getting a real alert JSON payload

If your webhook wants to do some fancy stuff, the sample data above may not be enough. Should that be the case, you can capture a real alert payload from your own environment using tcpdump. Here the webhook service is running on the loopback interface on port 9191 on the same host as AlertManager:

tcpdump -A -i lo port 9191

Once tcpdump is running, just trigger a (fake) alert from AlertManager and wait for the JSON payload to appear in ASCII format. You also get the HTTP headers as a bonus, though they should be very similar to the ones shown below. For clarity you may want to pass the JSON payload through a JSON pretty-printer to make it more readable - it is all on one line by default. Save your payload into a JSON file.
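
For example, assuming you saved the raw payload into payload.json, either of these should produce a readable sample.json:

jq . payload.json > sample.json

# or, if you don't have jq installed
python3 -m json.tool payload.json > sample.json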

This strategy won't work if traffic between AlertManager and the webhook service is encrypted. If that is the case, you can always make the webhook print out the real AlertManager payloads it receives and get the payload that way.

HTTP header format

Alertmanager sets multiple HTTP headers when connecting to the webhook:

  • Host: the IP address and port of the webhook service
  • User-Agent: the Alertmanager version, for example Alertmanager/0.23.0
  • Content-Type: application/json

Testing AlertManager webhooks with curl

Once you have the payload and headers you can send a message to the webhook service with curl:

curl \
-i \
-H "Host: 127.0.0.1:9191" \
-H "User-Agent: Alertmanager/0.23.0" \
-H "Content-Type: application/json" \
--request POST \
--data @sample.json \
http://127.0.0.1:9191/send

The path in the webhook service URL is arbitrary. In the case of promtotwilio the POST requests must go to /send. Your webhook may behave differently.

The --data option reads a file when the option's value is prefixed with a "@". In this case curl reads a file called sample.json and passes it as data.

We got tired of how cumbersome topping up Edenred cards is, so we automated the top-up process. This article describes how the automation was built with CSV files and a software robot. The article also contains links to our GitHub site, from which you can download the code for your own use.

Ways to top up Edenred cards

Edenred is one of the largest Finnish employee benefit providers. Edenred offers physical and virtual cards onto which employees' lunch, recreational and commuting benefits can be loaded. There are numerous ways to top up Edenred cards, but all of them require a significant amount of manual work from the employee and/or the employer. Even though employees can automate their own balance top-up requests, the employer still has to remember to approve them. This article presents a way to fully automate Edenred top-ups with a CSV file and a software robot built on Robot Framework. Only the payment of the balance order invoices sent by Edenred has to be done by hand, although that too could be automated if desired.

The Edenred logo

Using CSV files is useful for employers who want to offer Edenred benefits to their employees without the extra bureaucracy around balance orders and the possibility of human error. This model works particularly well when the amounts loaded onto the employees' cards stay mostly the same from month to month.

The CSV file used for Edenred top-ups

An employer can load balance onto employees' Edenred cards with a CSV file. Below is the documentation from the sample CSV file provided by Edenred:

# VERSIO: 1.0.0.7,,,,,,,,,,,,,,,,,
# ,,,,,,,,,,,,,,,,,
#TYHJÄT RIVIT JA RIVIT JOTKA ALKAVAT #-MERKILLÄ JÄTETÄÄN HUOMIOIMATTA,,,,,,,,,,,,,,,,,
#TÄHDELLÄ * MERKITYT KENTÄT OVAT PAKOLLISIA,,,,,,,,,,,,,,,,,
#TÄHDELLÄ ** MERKITYÍSTÄ KENTISTÄ TOINEN TULEE TÄYTTÄÄ,,,,,,,,,,,,,,,,,
# ,,,,,,,,,,,,,,,,,
# TOIMINTO KOODIT:,,,,,,,,,,,,,,,,,
#    N = TILAA UUSI KORTTI,,,,,,,,,,,,,,,,,
#    R = TILAA KORTTI UUDELLEEN,,,,,,,,,,,,,,,,,
#    U = PÄIVITÄ TYÖNTEKIJÄN TIEDOT (PL. HENKILÖTUNNUS JA ASIAKASNUMERO),,,,,,,,,,,,,,,,,
#    D = POISTA KORTINHALTIJA,,,,,,,,,,,,,,,,,
#    H = MUUTA KORTINHALTIJA POISSA TILAPÄISESTI TILAAN,,,,,,,,,,,,,,,,,
#    L = LATAA SALDOA,,,,,,,,,,,,,,,,,
# ,,,,,,,,,,,,,,,,,
# TYÖNTEKIJÄN TYÖSUHTEEN MUOTO:,,,,,,,,,,,,,,,,,
#    F = KOKOAIKAINEN,,,,,,,,,,,,,,,,,
#    P = OSA-AIKAINEN,,,,,,,,,,,,,,,,,
#    T = MÄÄRÄAIKAINEN,,,,,,,,,,,,,,,,,
#    I = PASSIIVINEN,,,,,,,,,,,,,,,,,
# ,,,,,,,,,,,,,,,,,
#TOIMINTOKOODI*,HENKILÖTUNNUS*,ETUNIMI*,SUKUNIMI*,OSOITE*,POSTINUMERO*,KAUPUNKI*,PUHELINNUMERO**,SÄHKÖPOSTIOSOITE**,TYÖNTEKIJÄNUMERO,KUSTANNUSPAIKKA,OSASTO,KERROS,ALUE,TYÖSUHTEEN MUOTO,LOUNAS_LATAUS,VIRIKE_LATAUS,TRANSPORT_LATAUS

As you can see above, CSV files can be used for much more than just loading balance.

When loading balance, the first two fields are mandatory:

The last three fields define the amount of balance to load, one field per benefit type:

The other fields in the CSV are probably unnecessary for balance top-ups.

Note: the CSV file uses the legacy ISO-8859-1 character encoding, which you should take into account when opening it. On Windows, for example, the Notepad++ text editor can handle text files in ISO-8859-1 encoding correctly. If your text editor cannot handle that encoding, there is a risk that Unicode characters end up in the CSV file and the balance top-up may fail. Typically this happens with the Scandinavian letters ("ä", "ö" and "å"), because their codes differ between the UTF-8 and ISO-8859-1 encodings.
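
If you generate the CSV file on Linux, you can convert it from UTF-8 to ISO-8859-1 with iconv before uploading it. A minimal sketch (the file names are just examples):

# Convert a UTF-8 CSV into the ISO-8859-1 encoding expected by Edenred
iconv -f UTF-8 -t ISO-8859-1 edenred-utf8.csv > edenred-latin1.csv

# Verify the resulting encoding
file edenred-latin1.csv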

Different benefits are loaded with different CSV files

In Edenred's ordering tool (ticket.edenred.fi) you select, right at the start, the product you will be working with. The product can be either "Edenred-kortti" (lunch and recreational benefit) or "Edenred Työmatka-kortti" (commuting benefit):

The product menu of ticket.edenred.fi

Once a product is selected, all actions in the ticket.edenred.fi service apply to that card type. For CSV uploads this means that lunch and recreational benefits are loaded with one CSV file and commuting benefits with another.

Defining the lunch and recreational benefit

Lunch and recreational benefits are loaded with a line like the following:

L,301280-012X,Matti,Mallikas,Esimerkki,20100,Turku,0401231234,[email protected],,,,,,F,150,33,

Above, Matti Mallikas gets 150€ of lunch benefit and 33€ of recreational benefit. The last field, which corresponds to the commuting benefit, is left empty. A similar line must be added for every employee for whom either benefit is loaded.

Defining the commuting benefit

The commuting benefit is loaded as follows:

L,301280-012X,Matti,Mallikas,Esimerkki,20100,Turku,0401231234,[email protected],,,,,,F,,,55

Above, Matti Mallikas gets 55€ of commuting benefit. The fields that define the lunch and recreational benefits, i.e. the third-to-last and second-to-last fields, are left empty. A separate line is created in the CSV file for each employee who receives the commuting benefit.

Edenred top-up with a CSV file

Before getting to the automation, I will briefly walk through the manual top-up process. Right after logging in to the ticket.edenred.fi service, select the correct product (see above), i.e. "Edenred-kortti" (lunch and recreational benefit) or "Edenred Työmatka-kortti" (commuting benefit). Then click "Työntekijöiden hallinta" ("Employee management") and then "Päivitys CSV-tiedostolla" ("Update with a CSV file"):

ticket.edenred.fi: update with a CSV file

At the bottom of the page you can select the CSV file to upload. Remember to upload the correct type of CSV file (lunch and recreational, or commuting) and click "Tarkasta" ("Check").

Selecting the CSV file

If the CSV file contained no obvious errors, you get to choose the date on which the benefits are loaded and then confirm your order. Do not select "haluan laskun heti" ("I want the invoice immediately") if you want an e-invoice from Edenred.

If the CSV upload succeeded, your order should appear in the "Tiedostohistoria" ("File history") table on the CSV upload page with the status "onnistunut" ("successful").

Topping up Edenred cards with Robot Framework

The manual Edenred CSV top-up is fairly laborious and, worst of all, you have to remember to do it around the 18th of every month so that the benefits reach the employees' cards around the turn of the month. As we are automation specialists, we decided to automate the entire process, which lets us avoid human errors altogether.

Edenred does not provide any programming interface (API), so we decided to use the open source Robot Framework as our automation tool. It is an excellent fit for automating web user interfaces and, more broadly, for tasks that require robotic process automation. In our Edenred balance top-up automation, Robot performs exactly the same actions in the ticket.edenred.fi service as a human would, only much faster and more reliably. Robot Framework can be installed on almost any desktop operating system (Linux, Windows, MacOS). We use Linux as the operating system for our top-up robot, because it is the easiest platform to build automation on.

Our Edenred top-up robot is in practice a Linux virtual machine with a graphical user interface, VNC for remote access, a web browser, and Robot Framework with the necessary add-ons (see puppet-robot). The Edenred top-up is scheduled for the 18th of the month, when Robot runs the Edenred top-up script (see robot-edenred) with different CSV files and parameters for the lunch and recreational benefit and for the commuting benefit. If the process fails, we get an alert from the Alertmanager of our Prometheus monitoring system. A script we created for this purpose (see robot-collector) acts as the "interpreter" between Robot and Prometheus: it converts the relevant parts of the Robot report (output.xml) into a file format compatible with the Prometheus Node Exporter Textfile Collector. After every balance order our top-up robot also verifies that the order is in the "onnistunut" ("successful") state. If problems are found, we get an alert about that as well.

If automating Edenred balance top-ups interests you but you do not want to build it yourself, contact us. The surest way to reach us is by email or text message.

Causes for the Terraform AWS UnauthorizedOperation errors

Terraform is an infrastructure as code tool you can use to configure Cloud resources in AWS. When using the Terraform AWS provider you frequently run into various UnauthorizedOperation errors when creating, modifying or deleting resources. That happens unless you do what you should not do and let Terraform use an AWS API key tied to an IAM user with full admin privileges. In almost all cases Terraform UnauthorizedOperation errors are caused by missing IAM permissions. Getting to the root cause can sometimes be a bit tedious, because Terraform may not show it in its normal output.

A happy young couple who have just resolved all their Terraform AWS provider UnauthorizedOperation errors. Photo credit: https://www.pexels.com/it-it/foto/foto-di-un-uomo-e-di-una-donna-che-guardano-il-cielo-732894

Visible errors

You can resolve most access denied errors easily because they're not trying to hide. Typically Terraform prints out a message like this:

│ Error: reading inline policies for IAM role foobar, error: AccessDenied:
| User: arn:aws:iam::012345678901:user/terraform is not authorized to
| perform: iam:ListRolePolicies on resource: role foobar because no
| identity-based policy allows the iam:ListRolePolicies action
│       status code: 403, request id: 3bf72b74-74e5-4f54-883b-cac962f6e21a

In this case it is enough to grant Terraform the iam:ListRolePolicies permission.

Hidden errors

Sometimes you get an UnauthorizedOperation error with no hint as to what permission is missing:

│ Error: reading EC2 Instance (i-0123456789abcdef0): 
| UnauthorizedOperation: You are not authorized to perform this operation.
│       status code: 403, request id: 307459e3-156d-4d1b-b8a8-7a30b52acb93

To figure out what IAM permission is missing you need to run Terraform in debug mode:

$ TF_LOG=debug terraform apply

2023-01-11T14:12:28.597+0200 [DEBUG] provider.terraform-provider-aws_v4.21.0_x5: [aws-sdk-go] DEBUG: Response ec2/DescribeInstances Details:
---[ RESPONSE ]--------------------------------------
HTTP/1.1 403 Forbidden
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Content-Type: text/xml;charset=UTF-8
Date: Wed, 11 Jan 2023 12:12:27 GMT                                                                                            
Server: AmazonEC2                                   
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: accept-encoding                                                                                                          
X-Amzn-Requestid: 41f54e26-be21-4d8e-9918-58461e8f5642
                                                               
                                                                                                                               
-----------------------------------------------------: timestamp=2023-01-11T14:12:28.596+0200
2023-01-11T14:12:28.597+0200 [DEBUG] provider.terraform-provider-aws_v4.21.0_x5: [aws-sdk-go] <?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>UnauthorizedOperation</Code><Message>You are not authorized to perform this operation.</Message></Error></Errors><RequestID>41f54e26-be21-4d8e-9918-58461e8f5642</RequestID></Response>: timestamp=2023-01-11T14:12:28.596+0200
2023-01-11T14:12:28.597+0200 [DEBUG] provider.terraform-provider-aws_v4.21.0_x5: [aws-sdk-go] DEBUG: Validate Response ec2/DescribeInstances failed, attempt 0/25, error UnauthorizedOperation: You are not authorized to perform this operation.
        status code: 403, request id: 41f54e26-be21-4d8e-9918-58461e8f5642: timestamp=2023-01-11T14:12:28.596+0200

From the output it is clear that the missing IAM permission is ec2:DescribeInstances. Because TF_LOG=debug produces tons of output you may want to use the -target parameter to reduce the scope of your terraform apply (or plan).
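
A rough sketch of how to narrow this down in practice: limit the scope with -target (the resource address below is hypothetical) and grep the debug log for the failing API call:

TF_LOG=debug terraform plan -target=aws_instance.example 2>&1 \
  | tee tf-debug.log \
  | grep -B 2 'UnauthorizedOperation'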

Encoded errors

Sometimes UnauthorizedOperation errors come in an encoded form. Why, I can only guess. In any case the output looks like this:

│ Error: attaching EBS Volume (vol-abcdef0123456789a) to EC2 Instance
| (i-0123456789abcdef0): UnauthorizedOperation: You are not authorized to
| perform this operation. Encoded authorization failure message:
| <really-long-random-looking-string>

│       status code: 403, request id: 9fe9c991-21ed-4b27-9277-771e3fc1318c  

This type of error message can be decoded with the AWS CLI:

$ aws sts decode-authorization-message --encoded-message '<really-long-random-looking-string>' --query DecodedMessage --output text
--- snip ---
"action": "ec2:AttachVolume"
--- snip ---

In this case the problem is a missing ec2:AttachVolume IAM permission.

Summary of Terraform AWS UnauthorizedOperation errors

IAM permissions are the main cause of Terraform UnauthorizedOperation errors in AWS. Terraform shows some errors immediately, but AWS hides or encodes others. In almost all cases you can find out the missing permission with the simple tricks shown in this article.

You can easily get lost in the jungle of AWS S3 bucket access management. Make it a lot easier and disable S3 bucket ACLs.

Introduction

AWS recommends that you disable S3 bucket ACLs for all new buckets. To understand why, some background information is needed.

AWS S3 provides two ways to manage access to S3 buckets and objects:

  • IAM policies (and S3 bucket policies)
  • Access control lists (ACLs)

AWS combines IAM policies and ACLs to figure out the effective access control rules for objects in an S3 bucket. The combination of legacy (ACL) and current (IAM) access management is what makes S3 access control and authorization quite complex.

AWS recommends that you disable S3 bucket ACLs to reduce access management complexity. This is definitely sane advice. While there is plenty of documentation on that topic, none of it really explains what happens when you disable S3 bucket ACLs.

UPDATE: since April 2023 bucket ACLs are disabled by default on all new S3 buckets.

Why are ACLs a bad idea?

Suppose you have an AWS S3 bucket "Foo" on AWS account "A" with ACLs enabled. You then grant permissions for AWS account "B" to upload objects to the "Foo" bucket. When "B" uploads objects there, the object owner will be "B", not "A". The uploader ("B") is also able to modify the ACLs of the files it has uploaded. The bucket owner ("A") is able to delete and archive any objects in the bucket, including those uploaded by "B". This is cool, but also quite complex and prone to error. Moreover, object-level ACLs are difficult to see or review with the AWS CLI or the AWS Console - you could be talking about thousands of files, each with its own ACLs.

On top of object-level ACLs you have bucket-level ACLs. Those are easier to visualize as there is only one per bucket. However, you can also replace them very easily with IAM policies.

What does "disable S3 bucket ACLs" really mean?

When you disable S3 bucket ACLs you're not deleting them. Instead, several things happen:

What makes this a bit confusing is that ACLs are still there - you can query and view them - but they have no effect anymore.

How to disable S3 bucket ACLs?

When you create a new S3 bucket with the AWS Console the ACLs are disabled by default. Older buckets may have ACLs enabled. The same may be true for buckets you have created with infrastructure as code tools like Terraform.

To disable S3 bucket ACLs manually you have to take two steps:

If you are using Terraform all you need to do is create an aws_s3_bucket_ownership_controls resource and link it with the bucket:

resource "aws_s3_bucket" "test" {
  bucket = "s3test.example.org"
}

resource "aws_s3_bucket_ownership_controls" "test" {
  bucket = aws_s3_bucket.test.id

  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}
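
If you manage a bucket by hand instead of with Terraform, the AWS CLI equivalent should look roughly like this (the bucket name is just an example):

aws s3api put-bucket-ownership-controls \
  --bucket s3test.example.org \
  --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'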

It's time for a brief rant about Microsoft Teams. Before we start I'll state that Teams is probably an ok platform for collaboration when you work for just one organization. But when you need to work with multiple organizations, each with their own Teams, you immediately run into a world of hurt. There are two reasons:

In our case when we want to talk to a customer, we have to launch Teams and select that customer's workspace. Then, if we want to talk to another customer, we have to switch to another workspace. In other words, we don't have any way to keep track of all the discussions in a single Teams client. Nor can we use a third-party, multi-protocol IM application like Pidgin or Adium to consolidate most of our customer chats into one application using (more) open protocols such as Slack, IRC and XMPP. We do use the Signal client also, but at least that's open source and has a decent interface. So, the only reasonable option seems to be to suggest some other system (e.g. Signal or Slack) instead.

Now, this is all just a symptom of the instant messenger hell we're living in, Teams just being one of the worst offenders right now, usability-wise. Several commercial entities (Facebook Messenger, WhatsApp, Teams, etc.) are trying to grab as much market share as possible, and their customers - in particular power users - pay the price in the form of a crappy user experience and a gazillion IM clients. In the end all of those commercial entities will fail to gain market leadership, because users are already stuck with multiple different IM systems and can't migrate to any single one in unison.

Converting a hash into JSON in Puppet is simple. The hash in the image has nothing to do with data structures, but was added for SEO purposes. Photo credit: https://www.pexels.com/it-it/foto/secco-foglia-illegale-bocciolo-9259842/

This article shows you how to convert a hash into JSON in Puppet using a simple ERB template that gets its data from Hiera.

Suppose you have this data in Hiera:

myhash:
  mykey:
    foo: one
    bar: two

Converting a hash into a JSON file on the target node is surprisingly easy. First look up the data:

$myhash = lookup('myhash', Hash)

Then create a simple ERB template (here: mymodule/templates/myhash.erb):

<% require 'json' -%>
<% require 'pp' -%>
<%= JSON.pretty_generate(@myhash) %>

Then create a file resource using this template:

file { '/tmp/myhash.json':
  ensure  => file,
  content => template('mymodule/myhash.erb'),
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
}

The end result after running Puppet will be nicely formatted JSON:

{
  "mykey": {
    "foo": "one",
    "bar": "two"
  }
}

That's all it takes.

For more details about Puppet see our introductory article.

You are 100% sure that all your Terraform resources are using terraform-provider-azurerm, yet Terraform attempts to download the deprecated "azure" provider:

$ terraform init
Initializing modules...

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/local versions matching "2.2.3"...
- Finding latest version of hashicorp/azure...
- Finding hashicorp/azurerm versions matching "3.17.0"...
- Installing hashicorp/local v2.2.3...
- Installed hashicorp/local v2.2.3 (signed by HashiCorp)
- Installing hashicorp/azurerm v3.17.0...
- Installed hashicorp/azurerm v3.17.0 (signed by HashiCorp)
╷
│ Error: Failed to query available provider packages
│ 
│ Could not retrieve the list of available versions for provider hashicorp/azure: provider registry registry.terraform.io does not have a provider named
│ registry.terraform.io/hashicorp/azure
│ 
│ Did you intend to use terraform-providers/azure? If so, you must specify that source address in each module which requires that provider. To see which
│ modules are currently depending on hashicorp/azure, run the following command:
│     terraform providers
╵

You grep the state file and find no references to the "azure" provider. You assume that the cause is some nested module that depends on it, but no, that's not it. You run "terraform providers" and see that indeed, the "azure" provider is required:

$ terraform providers

Providers required by configuration:
.
├── provider[registry.terraform.io/hashicorp/local] 2.2.3
├── provider[registry.terraform.io/hashicorp/azurerm] 3.17.0
├── provider[registry.terraform.io/hashicorp/azure]
└── module.automation
    ├── provider[registry.terraform.io/hashicorp/azurerm]
    └── provider[registry.terraform.io/hashicorp/local]

Providers required by state:

    provider[registry.terraform.io/hashicorp/azurerm]

    provider[registry.terraform.io/hashicorp/local]

At this point you become desperate: if you did not explicitly define the "azure" provider anywhere, why is it haunting you? Then you notice that there's a typo in one of your resources:

resource "azure_private_dns_zone" "internal_example_org" {
  name                = "internal.example.org"
  resource_group_name = azurerm_resource_group.default.name
}

Indeed: "azure_private_dns_zone" instead of "azurerm_private_dns_zone". That is all it takes for Terraform's dynamic provider dependency loading magic to break things for you in a way that is not immediately obvious.

Hopefully this helps some other poor soul who accidentally types "azure" instead of "azurerm".

Azure Private DNS and Azure VPN Gateway can work together, but the path to getting there is not particularly well lit. Photo credit: Mo Eid (https://www.pexels.com/it-it/foto/luce-paesaggio-uomo-persona-8347501/)

What is Azure Private DNS?

Azure Private DNS is a DNS service for Azure virtual networks. You can register a private DNS zone to Azure Private DNS and then link that zone with one or more virtual networks. If you enable DNS auto-registration for a virtual network, new resources (e.g. virtual machines and VPN Gateways) will automatically add their IP addresses to Azure Private DNS. Resources in the linked virtual networks can use entries in Azure Private DNS to locate each other, instead of relying on IP addresses or the default Azure DNS domain. You can make Azure Private DNS and Azure VPN Gateway work together in several ways, some of which this article will describe.

For example, suppose you register a DNS zone called private.example.org in Azure Private DNS. You then create two virtual machines, server_a and server_b, which will automatically add their private IPs to Azure Private DNS as A records. For example:

Now both VMs are able to resolve each other's IP address using their private DNS names. As Azure Private DNS uses the default Azure DNS server (168.63.129.16) transparently, you don't need to modify the virtual network's or virtual machine's DNS settings at all.

What is Azure VPN Gateway?

Azure VPN Gateway is designed for encrypting traffic between on-premise resources and Azure. It is basically Azure's multi-protocol VPN server with various high-availability options. We outline some other VPN server options in this blog post.

For the purposes of this blog post we assume that Azure VPN Gateway uses a point-to-site ("P2S") configuration to provide remote users with access to Azure resources through a secure VPN connection. We also assume that VPN Gateway uses the OpenVPN protocol.

Azure Private DNS can also work with OpenVPN and Wireguard. The former is able to push DNS settings to VPN clients, but the latter requires manual client configuration.

Limitations in private DNS

Azure Private DNS and Azure VPN Gateway do not, by default, work together. As we mentioned above, Azure Private DNS uses the default Azure DNS transparently. This has the poorly documented side-effect that Azure VPN Gateway can't route DNS requests from VPN clients to Azure.

VPN Gateway that uses the OpenVPN protocol can push DNS server settings as DHCP options down to VPN clients. It gets the DNS server information from its virtual network's DNS server settings. With default DNS settings the VPN Gateway pushes nothing to the VPN clients.

With vanilla Azure Private DNS you don't have any dedicated DNS servers you could add to the virtual network's DNS settings. Due to this reason VPN clients can't resolve any names from Azure Private DNS without extra services.

Allowing VPN clients to use Azure Private DNS

Azure Private DNS resolver

Azure Private DNS and Azure VPN Gateway require additional services to work together. Azure's solution is called Azure Private DNS Resolver - a managed DNS server cluster. You can add the cluster member IP addresses to the virtual network's DNS server settings. VPN Gateway will (in case of OpenVPN) push those DNS servers to VPN clients. While this approach is decent, it has a pretty high price tag, about $173/month at the time of writing this article.

Custom DNS server cluster with DNS records in Azure Private DNS

You can also set up two custom DNS servers inside the virtual network and configure those as the virtual network's DNS servers. As they are inside the virtual network, they are able to resolve Azure Private DNS zones. This allows the VPN Gateway to push DNS servers to VPN clients. You do not need to host any DNS records on these instances, as the backend would be Azure Private DNS. Essentially your DNS servers would be simple DNS forwarders. A decent Linux-based DNS server cluster would cost you about $30/month, but it has higher up-front and maintenance costs compared to Azure Private DNS Resolver.

Setting up Bind as a DNS forwarder is rather easy. This tutorial, though written for Ubuntu 14.04, still works at least on Red Hat Enterprise Linux 8.
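
Once the forwarders are in place, a quick sanity check from a VM in the linked network or from a VPN client could look like this; the DNS server IP and the record name are made-up examples:

# Query the custom DNS forwarder directly for a private zone record
dig @10.0.0.4 server_a.private.example.org +short

# Verify that the forwarder also resolves public names
dig @10.0.0.4 www.example.org +short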

Custom DNS server cluster with DNS records

Alternatively, you could skip Azure Private DNS altogether and manage the private DNS records on your own DNS servers. The up-front cost is probably the highest among all these options, but you'd get maximum flexibility. This option also costs about $30/month.

Single DNS server

If you want to save more money you can add two private IP addresses to a single DNS server VM. You can then add those two IPs to the virtual network's DNS server settings. With this approach you could squeeze the price to maybe $25/month. You could have the DNS records in Azure Private DNS or stored inside the DNS server itself.

Related articles

Writing Ansible modules is easier than you may think. Many times it is easier than trying to hack your way through a problem with raw Ansible yaml code. Photo credit: Harrison Haines (https://www.pexels.com/it-it/foto/internet-connessione-tablet-app-5247937/)

What are Ansible modules?

Ansible modules provide the infrastructure as code building blocks for your Ansible roles, plays and playbooks. Modules manage things such as packages, files and services. The scope of a module is typically quite narrow: it does one thing but attempts to do it well. Writing custom Ansible modules is not particularly difficult. The first step is to solve the problem with raw Python; then you can convert that Python code into an Ansible module.

Some problems can't be solved elegantly with existing modules

The default modules get you quite far. However, occasionally you may end up with tasks that are quite difficult to do with Ansible yaml code. In these cases the Ansible code you write becomes very ugly or very difficult to understand, or both. Writing custom Ansible modules can greatly simplify things if this happens.

Example of modifying trivial JSON with raw Ansible

Here is an example of how to modify a JSON file with Ansible. The file looks like this:

{
  "alt_domains": ["foo.example.org", "bar.example.org"]
}

What Ansible needs to do is add entries to and remove entries from the alt_domains list. The task sounds simple, but the solution in raw Ansible is very ugly:

- name: load current alt_domains file
  include_vars:
    file: "{{ alt_domains_file }}"
    name: alt_domains
- name: set default value for alt_domain_present
  ansible.builtin.set_fact:
    alt_domain_present: false
# The lookup returns data in this format: {'key': 'alt_domains', 'value': ['foobar.example.org', 'foobar.example.org']}
- name: check if current alt_domain already exists in alt_domains
  ansible.builtin.set_fact:
    alt_domain_present: true
  loop: "{{ query('ansible.builtin.dict', alt_domains) }}"
  when: alt_domain in item.value
- name: add alt_domain to alt_domains
  set_fact:
    alt_domains: "{{ alt_domains | default({}) | combine({\"alt_domains\": [\"{{ alt_domain | mandatory }}\"]}, list_merge=\"append\") }}"

Most would probably agree that the code above is already very nasty. That said, it does not yet even handle removal of entries from the list or writing the results back to disk. If you had to modify non-trivial JSON files, code like the above would make your head explode. There may be other ways to solve this particular problem in raw Ansible. If there are, I was unable to find any easily.

The solution: writing custom Ansible modules

With Ansible you occasionally end up in a hairy situation where you find yourself hacking your way through a problem. It is in those cases where writing a custom Ansible module probably makes most sense. To illustrate the point here's a rudimentary but fully functional implementation for managing alt_domains file such as above:

#!/usr/bin/python
import json

from ansible.module_utils.basic import AnsibleModule

def read_config(module):
  try:
    with open(module.params.get('path'), 'r') as alt_domains_file:
      have = json.load(alt_domains_file)
  except FileNotFoundError:
    have = { "alt_domains": [] }

  return have

def write_config(module, have):
  with open(module.params.get('path'), 'w') as alt_domains_file:
    json.dump(have, alt_domains_file, indent=4, sort_keys=True)
    alt_domains_file.write("\n")

def run_module():
  module_args = dict(
    domain=dict(type='str', required=True),
    path=dict(type='str', required=True),
    state=dict(type='str', required=True, choices=['present', 'absent'])
  )

  result = dict(
    changed=False
  )

  module = AnsibleModule(
    argument_spec=module_args,
    supports_check_mode=True
  )

  if module.check_mode:
      module.exit_json(**result)

  have = read_config(module)
  want = module.params.get('domain')
  state = module.params.get('state')

  if state == 'present' and want in have['alt_domains']:
    result.update(changed=False)
  elif state == 'present' and not (want in have['alt_domains']):
    result.update(changed=True)
    have['alt_domains'].append(want)
    write_config(module, have)
  elif state == 'absent' and want in have['alt_domains']:
    result.update(changed=True)
    have['alt_domains'].remove(want)
    write_config(module, have)
  elif state == 'absent' and not (want in have['alt_domains']):
    result.update(changed=False)
  else:
    module.fail_json(msg="Unhandled exception: domain == %s, state == %s!" % (want, state))

  module.exit_json(**result)

def main():
  run_module()

if __name__ == '__main__':
  main()

This Python code could use some polishing (e.g. proper check_mode support). Yet it is still a lot more readable and understandable than the hackish raw Ansible yaml implementation would be. You also get variable validation for free without having to resort to strategically placed ansible.builtin.assert calls.
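
A handy way to iterate on a module like this is to run it directly with a JSON arguments file, as suggested in the Ansible developer guide; the module path below (library/alt_domains.py) is just an assumed location:

# Run the module stand-alone with test arguments
cat > /tmp/args.json <<'EOF'
{"ANSIBLE_MODULE_ARGS": {"domain": "foo.example.org", "path": "/tmp/alt_domains.json", "state": "present"}}
EOF

python3 library/alt_domains.py /tmp/args.json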

Summary: do not be afraid of writing Ansible modules

Sometimes you may find yourself in a world of hurt while solving a seemingly easy problem with raw Ansible yaml code. That is when you should stop and consider writing an Ansible module instead. Writing a custom Ansible module can make your code much more understandable, flexible and of better quality.

More about Ansible quality assurance from Puppeteers

Open source maturity model from Mindtrek 2022. Applicable to the European Commission's digital sovereignty journey as well. Photo: Samuli Seppänen, 2022

What is software sovereignty

Software sovereignty is a subset of digital sovereignty. In essence, digital sovereignty means controlling your data, hardware and software. In Europe digital sovereignty has been driven by the EU. The reason is the reliance on services from big, global US-led vendors such as Amazon, Microsoft and Google. This poses a risk to the EU, just as reliance on Chinese manufacturing does.

These worries are compounded by the threats to democracy posed by the rise of authoritarianism (e.g. Russia and China), Trump's rise to power and the MAGA movement in the US, and the rise of far-right nationalistic parties in various European countries. Without digital sovereignty in general, or software sovereignty in particular, somebody could "pull the plug" and you would lose access to your own data, hardware and software. Moreover, if you do not control your data, hardware and software, your capability to innovate is severely hindered.

Open source and software sovereignty

Open source software is a part of the "software" side of digital sovereignty. If you are not able to see and modify the source code of the applications you run, you need to rely on somebody else to do it. If you are using closed source (proprietary) software, the vendor might never implement the features you'd like it to.

Big organizations may be able to get commercial vendors to customize their software for them. Small actors, such as individuals and small companies, are essentially at the mercy of the vendor. The vendor may implement or drop features, or change the prices or the pricing model, at will. Software as a service (SaaS) is in this regard the worst, as the vendor manages everything, including the configuration of the application. You can typically customize closed-source self-hosted applications to a greater degree than software as a service.

This is where open source comes in. Being open, it allows anyone with the proper skillset to inspect and modify software to suit their needs. This characteristic of open source helps avoid vendor lock-in, even when using commercially supported open source software. Even in the SaaS context you're not out of luck, as you can typically migrate your data from a vendor-managed service to a self-hosted instance.

Perspectives from the Mindtrek 2022 event

At Mindtrek 2022 Miguel Diez Blanco and Gijs Hillenius from the Open Source Programme Office (OSPO) in the European Commission gave a presentation about their open source journey. Timo Väliharju from COSS ("The Finnish Centre for Open Systems and Solutions") gave some perspectives on open source in Europe through his experience in APELL ("Association Professionnelle Européenne du Logiciel Libre"). What follows is essentially a summary of their analysis of the state of open source and digital sovereignty within the EU and the European member states.

European commission's open source journey

The European Commission started their open source journey around the year 2000 by using Linux, Apache and PHP for setting up a wiki. Later they set up a lot more wikis. Open source was at that time also used on the infrastructure layer, and later it gradually crept up to the desktop. In 2007 they started to produce open source software themselves (see code.europa.eu). By 2014 the Commission had started contributing to other, external open source projects. So, over the years they climbed up the open source maturity ladder. The Commission's usage of open source software continues to increase. OSPO's goal is to lead by example: working towards open source and software sovereignty tends to encourage others to pick it up, too. Something that works for the EU is likely to work for a national government also.

Culture of sharing and open source

There are about three thousand developers (employees and contractors) in the European Commission. As seems to often be the case, many of these internal teams previously worked in isolation. The isolation is by accident, not by design, but it is still harmful for the introduction of open source and hence for achieving software sovereignty.

OSPO tackled the problem by encouraging the use of "inner source" by default. The term means using code developed in-house when possible. This did, however, require a culture of sharing first. While some software projects were good to share as-is, some had issues that the authors had to resolve first. Some projects were not useful outside of the team that had developed them, so the authors decided to keep them private. The cultural change took a couple of years. OSPO encouraged the change by providing really nice tools for the teams that decided to join. That is, they preferred a carrot to a stick.

Outreach to communities

Along with their internal open source journey, OSPO has also reached out to open source communities. They fund public bug bounty programs and organize hackathons for important open source projects. The hackathons help gauge the maturity of those open source projects. They also help OSPO find ways to help those projects become more mature.

OSPO also holds physical and virtual meetings between representatives of European countries once a year. The goal of these meetings is to increase open source usage and software sovereignty with data-based decisions.

Improving security of open source software

OSPO has gone beyond bug bounties in their attempts to improve the security of open source software. FOSSEPS stands for "Free and Open Source Software Solutions for European Public Services". One of its key objectives has been to improve the security of the open source software used by the Commission. OSPO achieved the goal by building an inventory of software used by the EU and using it to figure out which software required an audit. Once the audits were finished, they fixed the security issues they had identified.

The journey to software sovereignty continues

The European Commission's open source work is still ongoing. In the member states the status of digital sovereignty varies a lot. Some countries, like France and Germany, put a lot of emphasis on open source in their policies, but funding may at times be a bit thin. Other countries, for example Finland and Denmark, consider open source as "nice to have" instead of a "must have". On the commercial front the challenge is that European open source companies tend to be small. This is the reason why one of APELL's goals is to help them work together more efficiently.

Open source at Puppeteers

We, the Puppeteers, are an open source company. We do Cloud automation with infrastructure as code using open source tools such as Puppet, Terraform, Ansible, Packer and Podman. The majority of the code we write is available on GitHub and in various upstream open source projects. We provide our clients with high quality peer-reviewed code and help them avoid any form of vendor lock-in.

If you need help with your Cloud automation project do not hesitate to contact us!
