We recently hit an interesting issue with an "almost in production" Keycloak high-availability clustered domain mode setup that was behind an Amazon EC2 load balancer with sticky sessions enabled. The symptom was that changes from the domain controller did not propagate to the slave. This included:
- Changes made to realms and clients using the Admin REST API
- User sessions
This led us on a quest to truly understand how Infinispan caches work in the Wildfly application server and, consequently, in Keycloak. Before you read this article you should probably read all of these:
- Keycloak Operating Modes
- Keycloak Clustering
- Keycloak Server Change Configuration
- Wildfly High-Availability Guide
That said, after reading all of the above you're probably still thinking "Interesting... and how does Infinispan work and how is it integrated into Wildfly and Keycloak?". Hopefully this article will be able to answer that question to some degree.
As discussed in the official documentation, Keycloak has two types of caches. The first type caches the contents of the Keycloak database (e.g. realms, clients and users) to improve response times and reduce database load. These local caches are not replicated across Keycloak instances. However, when a Keycloak resource such as a client is changed on a node, the invalidation cache is updated. An invalidation cache is a type of clustered cache: its content gets replicated to the other nodes in the cluster, essentially sending them a message saying that a certain piece of data has changed and needs to be purged from local caches and memory. The invalidation message does not include the actual resource to be purged (e.g. the user), just the information about what has changed. This way sensitive data need not be replicated across the cluster.
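The invalidation mechanism described above can be sketched in a few lines of Python. This is only a toy model, not Keycloak or Infinispan code: the `Node` class, the `connect` helper and the dictionary standing in for the database are all made up for illustration. It also shows what the symptom from the beginning of this article looks like when invalidation messages never reach the other node.

```python
class Node:
    """One Keycloak instance with a local (non-replicated) cache."""

    def __init__(self, name, database):
        self.name = name
        self.database = database      # shared dict standing in for the database
        self.local_cache = {}
        self.peers = []               # nodes reachable over the cluster transport

    def read(self, key):
        # Serve from the local cache, falling back to the database on a miss.
        if key not in self.local_cache:
            self.local_cache[key] = self.database[key]
        return self.local_cache[key]

    def write(self, key, value):
        # Persist the change, then tell peers only WHICH key changed.
        self.database[key] = value
        self.local_cache.pop(key, None)
        for peer in self.peers:
            peer.on_invalidation(key)

    def on_invalidation(self, key):
        # The message carries no data; the next read reloads from the database.
        self.local_cache.pop(key, None)


def connect(*nodes):
    """Make every node a peer of every other node (a working transport)."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]


database = {"client:app": "old settings"}
master, slave = Node("master", database), Node("slave", database)
slave.read("client:app")                 # slave caches the old value

master.write("client:app", "new settings")
stale = slave.read("client:app")         # no transport yet: slave still has the old value

connect(master, slave)
master.write("client:app", "newer settings")
fresh = slave.read("client:app")         # invalidated, so reloaded from the database
```

Note that the second node only ever receives a key, never the changed object itself, which is the property that keeps sensitive data off the wire.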
The second type of Infinispan cache stores user sessions, offline tokens and login failures. The contents of these caches are not stored in a database, but may be replicated to all nodes in a cluster, or distributed to some of them. The data in them is relatively short-lived and the consequences of losing it are not horrible. Some users may have to re-authenticate, but things will otherwise be ok.
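To make the difference between "replicated to all nodes" and "distributed to some of them" more concrete, here is a rough Python sketch. It is an assumption-laden toy: the function names, the `num_owners` parameter and the hash-ranking scheme (a rendezvous-hashing-style stand-in) are made up for illustration and are not how Infinispan actually assigns entries to nodes.

```python
import hashlib


def rank(node, key):
    # Deterministic pseudo-random score for a (node, key) pair.
    return hashlib.sha256(f"{node}|{key}".encode()).hexdigest()


def owners(key, nodes, mode, num_owners=2):
    """Return the nodes that hold a copy of `key`."""
    if mode == "replicated":
        return sorted(nodes)                      # every node keeps a copy
    if mode == "distributed":
        ranked = sorted(nodes, key=lambda node: rank(node, key))
        return ranked[:num_owners]                # only a subset keeps a copy
    raise ValueError(f"unknown mode: {mode}")


def survives(key, nodes, mode, failed, num_owners=2):
    """True if at least one copy of `key` is on a node that has not failed."""
    return any(node not in failed for node in owners(key, nodes, mode, num_owners))


nodes = ["node1", "node2", "node3", "node4"]
session = "session-1234"   # hypothetical session id
holders = owners(session, nodes, "distributed")
```

In this model a session in a distributed cache is lost only if every node in `holders` fails at once, in which case the user simply has to log in again.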
To complicate things a bit more, Infinispan uses the JGroups library for initial (cluster) member discovery and for providing reliable data transport between cluster members. In other words, JGroups is responsible for joining and removing nodes from the cluster and for replicating data across the network. These two responsibilities are joined into a JGroups stack, which is then configured as a transport for Infinispan. This is somewhat confusing, as what the Infinispan configuration calls a "transport" is actually a combination of "transport" and "initial membership discovery" in JGroups.
Infinispan transports (JGroups stacks) are configured at the cache-container level. This means that several caches often share the same transport. In Keycloak you have cache-containers that include several local caches and cache-containers that include several distributed caches.
This is where things start to get interesting. In order for Infinispan caching to work you need to pick a JGroups stack that your environment supports. For example, Amazon EC2 does not support UDP multicasting, so node discovery using the default "udp" stack will not work. Instead, you could use the "ec2" stack (see here), which uses TCP for transport and S3_PING for discovery. Or you could use TCP for transport and JDBC_PING for discovery.
In our particular case the problem was that we had configured the cluster members statically (no autodiscovery), but the transport protocol was not overridden: it was still the default (UDP multicast), which does not work in EC2. Changing to TCP transport and JDBC_PING-based discovery made the problem go away.
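The mistake is easier to see if you model a stack as two independent choices, transport and discovery, and check each one separately. The following Python sketch is a simplification built on assumptions: the `Stack` class and `multicast_dependencies` function are invented for illustration, the set of multicast-dependent protocols only reflects their default behaviour, and TCPPING is used here as a stand-in for "statically configured members".

```python
from dataclasses import dataclass

# Simplifying assumption: these JGroups protocols rely on IP multicast
# in their default configuration; the others used below do not.
NEEDS_MULTICAST = {"UDP", "PING", "MPING"}


@dataclass(frozen=True)
class Stack:
    transport: str   # e.g. "UDP" or "TCP"
    discovery: str   # e.g. "PING", "TCPPING", "S3_PING", "JDBC_PING"


def multicast_dependencies(stack):
    """List the parts of the stack that will not work without multicast."""
    parts = {"transport": stack.transport, "discovery": stack.discovery}
    return [f"{role}: {protocol}" for role, protocol in parts.items()
            if protocol in NEEDS_MULTICAST]


# Static member list, but the transport was left at the default.
broken = Stack(transport="UDP", discovery="TCPPING")
# The combination that worked for us in EC2.
fixed = Stack(transport="TCP", discovery="JDBC_PING")
```

Here `multicast_dependencies(broken)` still reports the transport even though discovery is fine, which is exactly the trap: fixing discovery alone is not enough.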
In Keycloak/Wildfly, Infinispan is just one subsystem among many to configure. The best documentation for all the Infinispan configuration options is in the Wildfly Full Model Reference. The structure of the configuration options maps directly to the paths you'd give to the Wildfly CLI to modify the Wildfly XML configuration. As mentioned all over the web, you should never modify the Keycloak/Wildfly configuration XML files directly, or your changes might get overwritten.
By the way, you can dig up lots of useful information about Keycloak caches and other topics from the Keycloak community resources, in particular the mailing list archives.