If you’re at all like me, every now and then you find yourself thrown out of your comfort zone when you should actually be well inside it.
The pattern usually goes something like this:
- It’s something simple. I’ll fix it in a couple of minutes and document it for others. I know my stuff.
- Hmm, this seems to be a bit stubborn. I need to take a closer look.
- Still doesn’t work. I need to run this in a debugging setup and break it into pieces to see where the problem is.
- I don’t understand. Everything should be ok. It should just work. I’ve read everything about this, googled and used every debugging method and tool under the sun. It must be a bug.
- (After hours or days of banging my head against the wall.) This is a simple configuration error. I didn’t know my stuff.
This happened to me (again) while trying to figure out why Apache Mellon refused to work and entered a redirect loop. Redirect loops are not unheard of in SAML setups, but I had never encountered one with Mellon. And I’ve set up Mellon many, many times.
When you set up Mellon to work with your IdP, you basically just export the IdP metadata, the SP metadata, and the SP certificate and private key from Keycloak. Yes, we’re talking about Keycloak and Mellon here. You then place the files in a directory and create a Mellon configuration file that refers to them. Here’s a simple example:
<Location />
Require valid-user
AuthType Mellon
MellonEnable auth
MellonSPMetadataFile /etc/httpd/mellon/sp-metadata.xml
MellonIdPMetadataFile /etc/httpd/mellon/idp-metadata.xml
MellonSPPrivateKeyFile /etc/httpd/mellon/client-private-key.pem
MellonSPCertFile /etc/httpd/mellon/client-cert.pem
</Location>
Here the MellonSPMetadataFile directive points to an SP metadata file. Let’s take a look:
<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" entityID="mellon.example.com">
<SPSSODescriptor AuthnRequestsSigned="true" WantAssertionsSigned="false"
protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol urn:oasis:names:tc:SAML:1.1:protocol http://schemas.xmlsoap.org/ws/2003/07/secext">
<KeyDescriptor use="signing">
<dsig:KeyInfo xmlns:dsig="http://www.w3.org/2000/09/xmldsig#">
<dsig:X509Data>
<dsig:X509Certificate><cert data></dsig:X509Certificate>
</dsig:X509Data>
</dsig:KeyInfo>
</KeyDescriptor>
<SingleLogoutService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" Location="https://mellon.example.com/mellon/myapp/postResponse"/>
<NameIDFormat>urn:oasis:names:tc:SAML:2.0:nameid-format:transient
</NameIDFormat>
<AssertionConsumerService
Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" Location="https://mellon.example.com/mellon/myapp/postResponse"
index="1" isDefault="true" />
</SPSSODescriptor>
</EntityDescriptor>
What this SPSSODescriptor tells us is that this particular SP has an entityID of 'mellon.example.com' and its AssertionConsumerService endpoint is found at https://mellon.example.com/mellon/myapp/postResponse, amongst other more or less esoteric things. The AssertionConsumerService endpoint wants an HTTP POST request or will feel bad. Cool.
This is so simple I do it with my left hand while playing Resident Evil Biohazard with the rest of my body.
Or not. Did you notice where I screwed up? Hint: half of it is implicit.
The default value for the MellonEndPointPath directive is '/mellon/'. It’s not in the Mellon configuration because it’s, er, the default, but you can see it if you run Mellon with diagnostics enabled. (It's also in the documentation, but who reads that?) What I did here was instruct my IdP to POST to the supposed endpoint at:
https://mellon.example.com/mellon/myapp/postResponse
But this is not the endpoint! Mellon has set it to be:
https://mellon.example.com/mellon/postResponse
What happens is that Keycloak authenticates a session and happily sends the browser to the endpoint given to it. Mellon sees a POST to /mellon/myapp/postResponse, refuses to co-operate and redirects to Keycloak. Keycloak sees the incoming GET and, after some introspection, finds out there’s already a session and redirects back to Mellon. Now we have a loop. The obvious fix is to correct the endpoint locations in the SP metadata. Alternatively, set MellonEndPointPath so that it matches the paths in the metadata.
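A check like this is easy to automate. Here is a minimal Python sketch that reads an SP metadata document and flags every Location that does not sit directly under the Mellon endpoint path. The function name find_mismatched_endpoints, the SAMPLE metadata and the rule that handlers live exactly one level below the endpoint path (as in /mellon/postResponse above) are my assumptions for illustration. This is not something that ships with Mellon or Keycloak.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Trimmed-down, hypothetical SP metadata carrying the same mistake as above.
SAMPLE = """<EntityDescriptor xmlns="urn:oasis:names:tc:SAML:2.0:metadata" entityID="mellon.example.com">
  <SPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <AssertionConsumerService
        Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
        Location="https://mellon.example.com/mellon/myapp/postResponse"
        index="1" isDefault="true"/>
  </SPSSODescriptor>
</EntityDescriptor>"""


def find_mismatched_endpoints(metadata_xml, endpoint_path="/mellon/"):
    """Return (element name, Location) pairs not directly under endpoint_path."""
    # Normalise so that "/mellon" and "/mellon/" are treated the same.
    prefix = endpoint_path.rstrip("/") + "/"
    root = ET.fromstring(metadata_xml)
    mismatches = []
    for element in root.iter():
        location = element.get("Location")
        if location is None:
            continue
        path = urlparse(location).path
        # Assumption: handlers sit directly under the endpoint path,
        # e.g. /mellon/postResponse, with nothing in between.
        handler = path[len(prefix):] if path.startswith(prefix) else None
        if not handler or "/" in handler:
            # Strip the "{namespace}" part that ElementTree puts in front of tags.
            mismatches.append((element.tag.split("}")[-1], location))
    return mismatches


if __name__ == "__main__":
    for name, location in find_mismatched_endpoints(SAMPLE):
        print(f"{name}: {location} is not directly under the endpoint path")
```

Calling find_mismatched_endpoints(SAMPLE, "/mellon/myapp") returns an empty list, which corresponds to the second fix: moving the endpoint path instead of editing the metadata.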
The takeaway here is to always triple-check the most basic things about your Mellon setup. And do this when you are sober, have time to focus and otherwise feel energetic. My experience is that Mellon with Keycloak works quite well, including authorization. If you have weird problems, chances are it’s something simple and you didn’t pay attention. But that’s life. Just pay attention.
Another takeaway is to use a configuration management system, like Puppet, to enforce a known working configuration. Manual setups for complex systems are slow and prone to all kinds of creative human errors, which waste your time and probably someone’s money. We don't wanna be creative, we wanna be industrial. It’s much better to develop and test a working configuration outside of production and then distribute and enforce it across the universe.
As always, if you have problems, feel free to contact us. If your problems are not in the computing domain, we probably can’t help. Sorry.