Downsize AWS RDS Volume

May 8, 2023 

It is trivial to downsize an AWS RDS instance. It is equally trivial to upsize AWS RDS volumes. However, to downsize an AWS RDS volume you need to jump through a lot of hoops.

So, why would you ever want to downsize an AWS RDS volume? One reason might be that you increased the volume size simply to get more burst credits for the volume, then did some refactoring that reduced the need for those credits and now want to save money. Or maybe you genuinely had a huge amount of data on your RDS instance volume, removed some of it, but got stuck with a huge volume.

One way to shrink an RDS volume is to use Boto3 in Python to spin up a new RDS instance. Basically you create a copy of the RDS instance, but with a smaller volume. Here is an example of using Python to shrink an RDS instance that uses PostgreSQL as a backend.

Initial Setup

The Python script uses a number of libraries which are imported at the top of the script. You can store all the configuration items in a YAML file, but be sure to .gitignore it as it will contain secrets.

from subprocess import Popen, PIPE
import json
import yaml
import psycopg2
import boto3
import botocore.exceptions
import sys
import os
import logging

def run_process(cmd: list[str], use_shell: bool = False):
    logging.debug(f'launching process with cmd: {cmd}')
    if use_shell:
        cmd = ' '.join(cmd)
    p = Popen(cmd, stdout=PIPE, stderr=PIPE, shell=use_shell)
    stdout, stderr = p.communicate()
    if stdout:
        logging.info(stdout)
    if stderr:
        logging.error(stderr)
    exit_code = p.wait()
    if exit_code:
        logging.warning(f'process command {cmd} exited with {exit_code}')
    logging.debug('process closed')

The run_process function is a small helper that runs an external command (such as psql or pg_dump), captures its output, and logs it with Python's logging library.
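
The functions below also rely on a handful of module-level names: config, psql_admin, psql_password, master_rds_address and the boto3 rds client. A minimal sketch of setting these up from the YAML configuration file mentioned earlier (the exact keys and file name are assumptions, not part of the original script):

# Minimal sketch of the module-level setup; the config.yaml keys below are
# assumptions, adjust them to match your own configuration file.
logging.basicConfig(level=logging.INFO)

with open('config.yaml') as f:
    config = yaml.safe_load(f)

psql_admin = config['psql_admin']        # PostgreSQL admin user
psql_password = config['psql_password']  # PostgreSQL admin password
master_rds_address = ''                  # resolved from the master instance in create_rds()

# pg_dump, pg_restore and psql pick up the password from the environment
os.environ['PGPASSWORD'] = psql_password

# boto3 RDS client; AWS credentials come from the usual credential chain
rds = boto3.client('rds', region_name=config.get('aws_region', 'eu-west-1'))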

Dump Databases

The first step is to manually stop all services that have a connection to the databases. Then the script connects to each database and dumps all data that is of interest to you. The dumps will be restored later onto the new RDS instance we will be creating.

You can check if a database is in use with something like:

def check_db_in_use(db_name: str) -> bool:
    # Connect to the database
    conn = psycopg2.connect(
        host=master_rds_address,
        database='postgres',
        user=psql_admin,
        password=psql_password
    )

    cur = conn.cursor()

    # Execute a query to check if the database is in use
    cur.execute(f"""
        SELECT count(*) FROM pg_stat_activity
        WHERE datname = '{db_name}' AND state = 'active';
    """)
    result = cur.fetchone()[0]

    # Close the cursor and connection
    cur.close()
    conn.close()

    # Return True if there are active connections
    return result > 0

The command to dump a PostgreSQL database looks something like pg_dump -U {admin} -h {host} -F c -f {dump_file} {db_name}. The -F c tells it to dump in the custom format. You can dump a database with a function like this:

# Example of a database dump function
def dump_db(db_name: str, dump_file: str = ''):
    if not dump_file:
        dump_file = f'./dump/{db_name}.dump'
    cmd = [
        'pg_dump', '-U', psql_admin, '-h', master_rds_address, '-F', 'c', '-f', dump_file, db_name
    ]
    run_process(cmd)
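
To dump everything of interest you also need the list of databases on the master instance. A small helper for this, reusing the same connection globals, could query pg_database (the list_databases name is not from the original script, just a sketch):

def list_databases() -> list[str]:
    # List all non-template databases on the master instance; the rdsadmin
    # database is internal to RDS and cannot be dumped, so it is skipped
    conn = psycopg2.connect(
        host=master_rds_address,
        database='postgres',
        user=psql_admin,
        password=psql_password
    )
    cur = conn.cursor()
    cur.execute("""
        SELECT datname FROM pg_database
        WHERE NOT datistemplate AND datname <> 'rdsadmin';
    """)
    databases = [row[0] for row in cur.fetchall()]
    cur.close()
    conn.close()
    return databases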

You will also need to dump the globals, which contain information such as which roles get which permissions. The command generally looks like pg_dumpall -U {admin} -h {host} -f {dump_file} --no-role-passwords -g. The --no-role-passwords works around permission problems when dumping account passwords, which will need to be set again later, and -g dumps only global objects. The code would look something like:

def dump_globals(dump_file: str = './dump/globals.sql'):
    cmd = [
        'pg_dumpall', '-U', psql_admin, '-h', master_rds_address, '-f', dump_file, '--no-role-passwords', '-g'
    ]
    run_process(cmd)

Downsize AWS RDS volume

This function is in charge of creating a new RDS instance based on the one we want to shrink. Technically speaking this is not downsizing, because that is simply not possible at this point; we create a new, smaller instance instead. You will notice a dictionary called config, which is loaded from the configuration file as sketched earlier.

The general logic is as follows:

  • Get the master RDS instance we are cloning with rds.describe_db_instances
  • Create a new dictionary with all the properties you want to copy
  • Change the allocated_storage and max_allocated_storage to the smaller target sizes
  • Create the new RDS instance with rds.create_db_instance
  • Get the new instance address and return it so it can be referenced later

def create_rds(db_identifier: str) -> str:
    global master_rds_address
    allocated_storage = config['allocated_storage']
    max_allocated_storage = config['max_allocated_storage']
    master_db_identifier = config['master_rds_identifier']
    result = rds.describe_db_instances(DBInstanceIdentifier=master_db_identifier)
    master_db_stats = result['DBInstances'][0]
    master_rds_address = master_db_stats['Endpoint']['Address']
    vpc_security_group_ids = []
    for group in master_db_stats['VpcSecurityGroups']:
        vpc_security_group_ids.append(group['VpcSecurityGroupId'])
    db_subnet_group_name = master_db_stats['DBSubnetGroup']['DBSubnetGroupName']
    new_db_stats = {
        'DBName': master_db_stats['DBName'],
        'AllocatedStorage': allocated_storage,
        'MaxAllocatedStorage': max_allocated_storage,
        'DBInstanceIdentifier': db_identifier,
        'DBInstanceClass': master_db_stats['DBInstanceClass'],
        'MasterUserPassword': psql_password,
        'DBSubnetGroupName': db_subnet_group_name,
        'Engine': master_db_stats['Engine'],
        'EngineVersion': master_db_stats['EngineVersion'],
        'MasterUsername': master_db_stats['MasterUsername'],
        'AvailabilityZone': master_db_stats['AvailabilityZone'],
        'PreferredMaintenanceWindow': master_db_stats['PreferredMaintenanceWindow'],
        'BackupRetentionPeriod': master_db_stats['BackupRetentionPeriod'],
        'VpcSecurityGroupIds': vpc_security_group_ids,
        'AutoMinorVersionUpgrade': master_db_stats['AutoMinorVersionUpgrade'],
        'CopyTagsToSnapshot': master_db_stats['CopyTagsToSnapshot'],
        'DeletionProtection': master_db_stats['DeletionProtection'],
        'EnableCloudwatchLogsExports': master_db_stats['EnabledCloudwatchLogsExports']
    }
    rds.create_db_instance(**new_db_stats)

    # Wait for the instance to become available
    waiter = rds.get_waiter('db_instance_available')
    waiter.wait(DBInstanceIdentifier=db_identifier)
    result = rds.describe_db_instances(DBInstanceIdentifier=db_identifier)
    new_db_stats = result['DBInstances'][0]
    return new_db_stats['Endpoint']['Address']

Restore Globals

Next we run the command psql -U {admin} -h {host} -d postgres < {dump_file} to restore the globals from the dump file created above. Notice that run_process is called with use_shell set to True: the < redirection needs a shell and does not work with the normal argument-list form of Popen.

Code should look something like:

def restore_globals(new_rds_address: str, restore_file: str = './dump/globals.sql'):
    if not os.path.exists(restore_file):
        logging.error(f'file {restore_file} does not exist!')
        return
    cmd = [
        'psql', '-U', psql_admin, '-h', new_rds_address,
        '-d', 'postgres', '<', restore_file
    ]
    run_process(cmd, True)

Restore Databases

Then we use the command pg_restore -U {admin} -h {host} -F c --create -d {database} {dump_file} to restore each database from the dump file created earlier. With --create, pg_restore connects to the given database (postgres in the code below), creates the dumped database and restores into it.

The code:

def restore_db(db_name: str, new_rds_address: str, restore_file: str = ''):
    if not restore_file:
        restore_file = f'./dump/{db_name}.dump'
    if not os.path.exists(restore_file):
        logging.error(f'file {restore_file} does not exist!')
        return
    cmd_restore_db = [
        'pg_restore', '-U', psql_admin, '-h', new_rds_address, '-F', 'c', '--create', '-d', 'postgres', restore_file
    ]
    run_process(cmd_restore_db)

Restore Service Passwords

The last couple of steps are to restore the passwords, since the globals were dumped without them, and to allow the users to log in again. The LOGIN privilege is removed by default for security reasons and needs to be re-enabled. The code:

def restore_password(user: str, password: str, new_rds_address: str):
    sql_reset = f"ALTER USER {user} WITH PASSWORD '{password}';"
    sql_login = f"ALTER ROLE {user} LOGIN;"
    cmd = [
        'psql', '-U', psql_admin, '-h', new_rds_address,
        '-d', 'postgres', '-c', sql_reset
    ]
    cmd2 = [
        'psql', '-U', psql_admin, '-h', new_rds_address,
        '-d', 'postgres', '-c', sql_login
    ]
    run_process(cmd)
    run_process(cmd2)
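
Tying the pieces together, the overall flow could look roughly like the sketch below. The instance identifier, the service_users config key and the list_databases helper are assumptions from the earlier examples; create_rds is called first here because it also resolves master_rds_address, which the dump functions need.

if __name__ == '__main__':
    # Create the new, smaller instance; this also sets master_rds_address
    new_rds_address = create_rds('shrunk-rds-instance')

    # Dump globals and every database of interest from the master instance
    dump_globals()
    databases = list_databases()
    for db_name in databases:
        if check_db_in_use(db_name):
            logging.warning(f'database {db_name} still has active connections')
        dump_db(db_name)

    # Restore everything onto the new, smaller instance
    restore_globals(new_rds_address)
    for db_name in databases:
        restore_db(db_name, new_rds_address)

    # Re-set the service account passwords and re-enable login
    for user, password in config.get('service_users', {}).items():
        restore_password(user, password, new_rds_address)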

Downsize AWS RDS volume: final thoughts

After the changes are made you will want to verify that the data was restored properly and test your services against the new RDS instance. If everything looks good, the old RDS instance can eventually be retired.
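
One quick sanity check is to compare, for each database, the number of user tables (or row counts of key tables) between the old and the new instance before pointing any services at it. A rough sketch, reusing the hypothetical list_databases helper from above:

def count_tables(host: str, db_name: str) -> int:
    # Count user tables in one database on the given instance
    conn = psycopg2.connect(host=host, database=db_name,
                            user=psql_admin, password=psql_password)
    cur = conn.cursor()
    cur.execute("""
        SELECT count(*) FROM pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
    """)
    count = cur.fetchone()[0]
    cur.close()
    conn.close()
    return count

def verify_restore(new_rds_address: str):
    # Compare table counts between the old and the new instance
    for db_name in list_databases():
        old_count = count_tables(master_rds_address, db_name)
        new_count = count_tables(new_rds_address, db_name)
        if old_count != new_count:
            logging.error(f'{db_name}: table count mismatch ({old_count} vs {new_count})')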

If you're interested in the above script let us know and we'll be happy to push it to our GitHub.

Samuli Seppänen
Ricky Cousins