Introduction
Amazon RDS is a relational database service available on Amazon Web Services. It is essentially a managed database server with a volume for the data, and both cost money. Having a large dataset is not the only reason to have a big volume: the volume size also determines how many I/O operations per second (IOPS) the volume can sustain. If the load on your RDS instance is high, you may very well run out of IOPS, which can result in a disaster. Scaling up the volume size to avoid such disasters is easy, but reducing RDS storage size is much more difficult, because downsizing RDS storage is not officially supported by AWS. If the load on your RDS instance later goes down (e.g. due to refactoring), you may end up with a database whose oversized volume costs lots of money for no reason.
Reduce RDS storage manually
The basic process for reducing RDS storage is the following:
- Create a new RDS instance with a smaller volume
- Stop all applications that use the old RDS instance
- Dump all databases from the old RDS instance
- Restore all databases to the new RDS instance
- Update all applications to use the new RDS instance
- Hope for the best
This is neither a fast nor an easy process. If you want to try it out yourself, please refer to this article.
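To make the core dump-and-restore step more concrete, here is a minimal sketch in Python. The endpoint names, master user and database list are placeholders, and the exact flags you need may differ; the script simply wraps the standard PostgreSQL client tools (pg_dump, createdb, pg_restore) with subprocess and leaves password lookup to ~/.pgpass or PGPASSWORD.

```python
"""Minimal sketch of the manual dump-and-restore step (placeholders only)."""
import subprocess

OLD_HOST = "old-db.abcdefgh.eu-west-1.rds.amazonaws.com"  # placeholder endpoint
NEW_HOST = "new-db.abcdefgh.eu-west-1.rds.amazonaws.com"  # placeholder endpoint
USER = "master"                                           # placeholder user
DATABASES = ["app_db"]                                    # placeholder database list

def run(cmd):
    """Echo a command and abort on a non-zero exit code."""
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

for db in DATABASES:
    dump_file = f"{db}.dump"
    # Dump one database from the old instance in custom format.
    run(["pg_dump", "-h", OLD_HOST, "-U", USER, "-Fc", "-f", dump_file, db])
    # Create the empty database on the new instance, then restore into it.
    run(["createdb", "-h", NEW_HOST, "-U", USER, db])
    run(["pg_restore", "-h", NEW_HOST, "-U", USER, "-d", db, "--no-owner", dump_file])
```

Custom-format dumps (-Fc) are used here because pg_restore can then restore large databases in parallel with -j if needed.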
That said, we recommend automating the process. The easiest way might be to use our rds-resize.py script, described below.
Reduce RDS storage in an automated way
When we had to resize RDS, one of the requirements was that the downsizing process must be testable outside of a production environment. The process also had to be reliable and produce consistent results. Therefore our only option was to automate it, which we would have done anyway given our policy of always building infrastructure as code.
So, to reduce RDS storage size we wrote the rds-resize script. It automates all the steps above except starting and stopping the applications that use RDS. It still has some rough edges and only supports PostgreSQL, but it seems to do its job very reliably: we base this claim on a dozen or so RDS downscalings done on live systems, some of which are critical production systems. At a high level the script takes the following steps (sketches of the key steps follow the list):
- Verify that the databases are not in use. If they are, exit without any action.
- Create a new RDS instance. This step may be skipped when testing dump and restore procedures.
- Dump globals (e.g. roles) from the old RDS instance
- Restore globals to the new RDS instance
- Restore credentials on the new RDS instance. They cannot be dumped and restored due to security restrictions on RDS.
- Dump databases from the old RDS instance
- Restore databases to the new RDS instance
- Sanity check the new database. This includes comparing table and session counts between the old and new databases; if the counts do not match, something might be wrong.
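The first and last steps are plain SQL checks. A minimal sketch of how they could look with psycopg2 is shown below; the connection strings are placeholders and the exact queries rds-resize.py runs may differ.

```python
"""Sketch of the "database in use" check and the table count comparison."""
import sys
import psycopg2  # PostgreSQL driver

def count_sessions(dsn, dbname):
    """Return the number of sessions connected to dbname, excluding our own."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT count(*) FROM pg_stat_activity "
            "WHERE datname = %s AND pid <> pg_backend_pid()",
            (dbname,),
        )
        return cur.fetchone()[0]

def count_tables(dsn):
    """Return the number of tables outside the system schemas."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT count(*) FROM pg_catalog.pg_tables "
            "WHERE schemaname NOT IN ('pg_catalog', 'information_schema')"
        )
        return cur.fetchone()[0]

OLD_DSN = "host=old-db.example dbname=app_db user=master"  # placeholder
NEW_DSN = "host=new-db.example dbname=app_db user=master"  # placeholder

# First step: refuse to do anything if the old database is still in use.
if count_sessions(OLD_DSN, "app_db") > 0:
    sys.exit("Old instance still has active sessions, aborting.")

# Last step: after the restore, compare table counts between old and new.
if count_tables(OLD_DSN) != count_tables(NEW_DSN):
    sys.exit("Table counts differ between old and new instance, check manually.")
```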
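Creating the smaller replacement instance can be scripted with boto3. All identifiers, sizes and versions below are placeholders rather than what rds-resize.py necessarily uses.

```python
"""Sketch of creating the smaller replacement RDS instance with boto3."""
import boto3

rds = boto3.client("rds")

# Create the new instance with a smaller volume (AllocatedStorage is in GiB).
rds.create_db_instance(
    DBInstanceIdentifier="myapp-db-small",         # placeholder
    DBInstanceClass="db.t3.medium",                # placeholder
    Engine="postgres",
    EngineVersion="15.4",                          # placeholder
    AllocatedStorage=50,                           # the new, smaller size
    MasterUsername="master",                       # placeholder
    MasterUserPassword="change-me-please",         # placeholder
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder
)

# Wait until the instance is available before dumping or restoring anything.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="myapp-db-small")
```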
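For the globals and credentials steps, a hedged sketch could look like the following. pg_dumpall is run with --no-role-passwords because the RDS master user is not allowed to read password hashes, which is also why credentials have to be restored separately; get_role_passwords() is a hypothetical helper standing in for whatever secrets store you use.

```python
"""Sketch of the globals and credentials steps (placeholders only)."""
import subprocess

OLD_HOST = "old-db.example"  # placeholder endpoint
NEW_HOST = "new-db.example"  # placeholder endpoint
USER = "master"              # placeholder user

def run(cmd):
    subprocess.run(cmd, check=True)

def get_role_passwords():
    """Hypothetical helper: in a real setup, read these from your secrets store."""
    return [("app_user", "change-me")]  # placeholder

# Dump roles and role memberships from the old instance; password hashes are
# skipped because RDS does not allow the master user to read them.
run(["pg_dumpall", "-h", OLD_HOST, "-U", USER,
     "--globals-only", "--no-role-passwords", "-f", "globals.sql"])

# Restore the globals to the new instance.
run(["psql", "-h", NEW_HOST, "-U", USER, "-d", "postgres", "-f", "globals.sql"])

# Re-set role passwords on the new instance. A real script should avoid
# passing passwords on the command line.
for role, password in get_role_passwords():
    run(["psql", "-h", NEW_HOST, "-U", USER, "-d", "postgres",
         "-c", f'ALTER ROLE "{role}" WITH PASSWORD \'{password}\''])
```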
The rds-resize Git repository includes a Podman container configuration, which lets you launch the RDS resizing environment with minimal effort, assuming you have Podman installed. This is the case on RHEL 8 and 9 as well as on Fedora. Docker will likely work as a substitute for Podman, but we have not tested it.
If you encounter any issues with rds-resize.py, please open a GitHub issue or, better yet, create a pull request. Happy resizing!