As mentioned in our earlier blog post, we created an "email to XMPP bridge" called milter-xmpp for forwarding alert emails from monitoring systems to an XMPP chatroom. We knew that the bridge would stop working if its connection to the XMPP server ever died, but we wanted to know how quickly that would happen. It took some days, but eventually the connection died.
The fix ended up being quite trivial, but to test it we needed a way to break the connection on demand. Fortunately the good folks at StackExchange already had a good solution: use the "ss" tool ("another utility to investigate sockets"), which provides output similar to netstat:
$ ss -n|grep -E '(NetidState|5222)'
NetidState Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp ESTAB 29011 0 18.104.22.168:38702 22.214.171.124:5222
What makes "ss" interesting is that you can also kill those connections (as root). For example, to kill the above connection you'd use:
$ ss -n -K dst 22.214.171.124 dport = 5222
If the command succeeds, it prints the matching socket, i.e. the same output the "ss -n" command above would give.
With "ss" you can easily make sure that your code is actually able to recover from broken connections without having to wait for days or weeks for a real failure.
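Such a check can also be scripted. The following is a minimal sketch and not part of milter-xmpp: the function names count_conns and break_and_wait, the 30-second default timeout and the one-second polling interval are all assumptions made for illustration. It has to run as root, and it relies on the kernel supporting socket closing via "ss -K"; sockets the kernel cannot close are silently skipped, which is why the sketch checks that the kill actually printed something.

```shell
#!/bin/sh
# Sketch only: count_conns, break_and_wait and the defaults below are
# hypothetical names and values, not part of milter-xmpp.

# Print the number of established TCP connections to address $1, port $2.
count_conns() {
    # -H: no header, -t: TCP only, -n: numeric addresses and ports
    ss -Htn state established dst "$1" dport = "$2" | wc -l
}

# Kill all TCP connections to $1:$2, then poll until a new one shows up.
# Returns 0 if a connection is seen within $3 seconds (default 30),
# 1 if none appears in time, and 2 if "ss -K" did not close anything.
break_and_wait() {
    addr=$1
    port=$2
    timeout=${3:-30}

    # "ss -K" prints the sockets it managed to close; empty output means
    # there was nothing to kill or the kernel refused to close it.
    killed=$(ss -Htn -K dst "$addr" dport = "$port")
    [ -n "$killed" ] || return 2

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ "$(count_conns "$addr" "$port")" -gt 0 ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

With the functions loaded into a root shell, "break_and_wait 22.214.171.124 5222 60 && echo reconnected" would break the connection shown earlier and report whether a replacement appeared within a minute. The sketch only checks that some established connection to the server exists again; comparing the local port before and after would be a stricter test.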
The fix we made to milter-xmpp is unlikely to handle corner cases such as the alertbot dropping from the alerts chatroom, but it probably covers 90% or more of the issues that happen in real life.