The first time I inherited a system with hard-coded database passwords in environment variables, I found the same credential pasted in four places: a Terraform variable, a Lambda config, a CI secret, and, naturally, a Slack message from 2021. Rotating it meant touching all four without breaking anything. We didn't rotate it for two years.

AWS Secrets Manager exists precisely to kill that pattern. But the part people get wrong is rotation: dropping a secret into Secrets Manager doesn't rotate it. You have to understand the four-step rotation contract, and that's what trips most teams up.

What rotation actually does

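Secrets Manager rotation is driven by a Lambda function. On each scheduled rotation, Secrets Manager invokes it four times, once per step, passing the step name in the event's Step field along with the SecretId and a ClientRequestToken that identifies the new version. Your function must implement all four steps, each idempotent: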

  1. createSecret: generate a new credential and store it as the AWSPENDING version.
  2. setSecret: apply the pending credential to the actual resource (e.g. ALTER USER ... PASSWORD).
  3. testSecret: verify the new credential works by actually using it.
  4. finishSecret: move the AWSCURRENT label to the pending version.

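The version labels are the whole trick: consumers keep reading the AWSCURRENT version until finishSecret moves that label to the new one, at which point the old version is relabeled AWSPREVIOUS. Whether the old password still works on the database between setSecret and finishSecret depends on the rotation strategy, which is the next decision.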
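To make the contract concrete, here is a minimal sketch of a dispatcher for the four steps. The Secrets Manager calls (get_secret_value, put_secret_value, describe_secret, update_secret_version_stage) are the real boto3 operations; everything else is my assumption for illustration: the function names, generating the password locally with secrets.token_urlsafe, and passing the database-specific setSecret and testSecret logic in as callbacks.

```python
import json
import secrets


def create_secret(client, arn, token):
    """createSecret: store a new password as AWSPENDING, once per token."""
    try:
        client.get_secret_value(
            SecretId=arn, VersionId=token, VersionStage="AWSPENDING"
        )
        return  # a retry of the same rotation already created it
    except client.exceptions.ResourceNotFoundException:
        pass
    current = client.get_secret_value(SecretId=arn, VersionStage="AWSCURRENT")
    new_secret = json.loads(current["SecretString"])
    new_secret["password"] = secrets.token_urlsafe(24)  # assumption: local generation
    client.put_secret_value(
        SecretId=arn,
        ClientRequestToken=token,
        SecretString=json.dumps(new_secret),
        VersionStages=["AWSPENDING"],
    )


def finish_secret(client, arn, token):
    """finishSecret: move AWSCURRENT to the pending version if needed."""
    versions = client.describe_secret(SecretId=arn)["VersionIdsToStages"]
    current = next(
        (vid for vid, stages in versions.items() if "AWSCURRENT" in stages), None
    )
    if current == token:
        return  # already finished on an earlier attempt
    kwargs = {"SecretId": arn, "VersionStage": "AWSCURRENT", "MoveToVersionId": token}
    if current is not None:
        kwargs["RemoveFromVersionId"] = current
    client.update_secret_version_stage(**kwargs)


def rotate(event, client, set_secret, test_secret):
    """Dispatch one invocation to the step named in the event.

    set_secret and test_secret are hypothetical callbacks holding the
    database-specific logic (for example ALTER USER, then a trial login).
    """
    arn = event["SecretId"]
    token = event["ClientRequestToken"]
    step = event["Step"]
    if step == "createSecret":
        create_secret(client, arn, token)
    elif step == "setSecret":
        set_secret(client, arn, token)
    elif step == "testSecret":
        test_secret(client, arn, token)
    elif step == "finishSecret":
        finish_secret(client, arn, token)
    else:
        raise ValueError(f"unknown rotation step: {step}")
```

In a real function the Lambda entry point would build a boto3 Secrets Manager client and hand it to rotate. The early returns are what make each step safe to retry: a second createSecret or finishSecret for the same token does nothing.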

The single-user vs two-user trade-off

For RDS and a few other databases, AWS ships managed rotation Lambdas with two strategies, and choosing the wrong one can cause outages:

Strategy          | How it works                                                 | Watch out
Single user       | Rotates the password of one user in place                    | Existing connections stay open, but new connections made with the old password fail until the client picks up the new one; fine for low-churn clients with retries
Alternating users | Maintains two users, rotates the inactive one, then switches | Zero-downtime, but needs a separate superuser secret to create and update the clone, and double the users to manage

If your app pools connections and can't tolerate a single failed auth, use the alternating-users strategy. The single-user one is simpler but assumes your clients reconnect cleanly.
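The switching logic behind alternating users is small enough to sketch. This is an illustration under assumptions, not AWS's code: the "_clone" suffix and the 63-character limit (the Postgres identifier length) are values I picked for the example, and alternate_username is a hypothetical helper name.

```python
CLONE_SUFFIX = "_clone"          # assumption: naming scheme for the second user
MAX_POSTGRES_NAME_LENGTH = 63    # Postgres truncates longer identifiers


def alternate_username(current: str) -> str:
    """Return the user that is NOT currently active, so it can be rotated."""
    if current.endswith(CLONE_SUFFIX) and len(current) > len(CLONE_SUFFIX):
        # The clone is active, so rotate the original user.
        return current[: -len(CLONE_SUFFIX)]
    candidate = current + CLONE_SUFFIX
    if len(candidate) > MAX_POSTGRES_NAME_LENGTH:
        raise ValueError(f"{candidate!r} exceeds {MAX_POSTGRES_NAME_LENGTH} characters")
    return candidate
```

Each rotation changes the password of the inactive user and then points the secret at it, so the user that clients are connected as is never modified mid-flight.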

Wiring it up with the CLI

For an RDS Postgres credential, you create the secret, then attach a rotation schedule pointing at the rotation Lambda deployed in your account. The RotationRules accept either a ScheduleExpression, written as rate() or cron(), or a simple day count in AutomaticallyAfterDays:

aws secretsmanager create-secret \
  --name prod/checkout/db \
  --secret-string '{"username":"app","password":"REPLACE_ME","host":"checkout.abc123.us-east-1.rds.amazonaws.com","port":5432,"dbname":"checkout"}'

aws secretsmanager rotate-secret \
  --secret-id prod/checkout/db \
  --rotation-lambda-arn arn:aws:lambda:us-east-1:111122223333:function:SecretsManagerRDSPostgreSQLRotationSingleUser \
  --rotation-rules '{"ScheduleExpression":"rate(30 days)"}'

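By default, that rotate-secret call triggers an immediate rotation as well as setting the recurring schedule (pass --no-rotate-immediately to skip it), so it doubles as your verification that the whole pipeline works end to end. The call returns before the rotation finishes, so confirm the outcome afterwards instead of trusting the exit code.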
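One way to confirm a rotation actually completed is to inspect the version labels that describe_secret returns. The sketch below is a pure function over that response; RotationEnabled and VersionIdsToStages are real response fields, while rotation_state and its three return values are names I made up for the example.

```python
def rotation_state(describe_response: dict) -> str:
    """Classify a describe_secret response as 'disabled', 'pending', or 'idle'.

    'pending' means some version still carries AWSPENDING without also being
    AWSCURRENT: a rotation is either in flight or stuck after a failed step.
    """
    if not describe_response.get("RotationEnabled", False):
        return "disabled"
    stages = describe_response.get("VersionIdsToStages", {})
    pending = {vid for vid, labels in stages.items() if "AWSPENDING" in labels}
    current = {vid for vid, labels in stages.items() if "AWSCURRENT" in labels}
    return "pending" if pending - current else "idle"


# Usage against the real API (needs AWS credentials, so left as a comment):
#   import boto3
#   client = boto3.client("secretsmanager")
#   print(rotation_state(client.describe_secret(SecretId="prod/checkout/db")))
```

If the state stays "pending" for more than a few minutes, the rotation Lambda's CloudWatch logs are the place to look for the step that failed.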

Retrieving it from code, and caching

The mistake I see most: calling get_secret_value on every request. That adds latency and can hit API throttling. Use the AWS-provided caching library so you fetch once and re-fetch only after a refresh interval has passed:

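import json

import boto3
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig

# One client and one cache per process, created at import time and reused.
client = boto3.client("secretsmanager")
cache = SecretCache(
    # Re-fetch a cached secret once it is older than 300 seconds.
    config=SecretCacheConfig(secret_refresh_interval=300),
    client=client,
)


def get_db_creds():
    """Return the current credentials as a dict (username, password, host, ...)."""
    raw = cache.get_secret_string("prod/checkout/db")
    return json.loads(raw)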

The cache picks up the new AWSCURRENT version on its next refresh, so rotation and the application stay decoupled. The refresh interval is also the longest the cache can keep handing out a password that has been rotated away, so keep it short and treat an authentication failure as a cue to fetch the secret again and retry.
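Because a cached password can be stale right after a rotation, I like to pair the cache with a single retry on authentication failure. This is a rough sketch with hypothetical names: AuthFailed stands in for whatever error your database driver raises on a bad password, and the three callables are placeholders you would wire to the cache, to a direct Secrets Manager fetch, and to your driver's connect function.

```python
from typing import Any, Callable, Dict

Creds = Dict[str, Any]


class AuthFailed(Exception):
    """Stand-in for the driver's authentication error (hypothetical)."""


def connect_with_fresh_creds(
    cached_creds: Callable[[], Creds],
    fresh_creds: Callable[[], Creds],
    connect: Callable[[Creds], Any],
) -> Any:
    """Try the cached credentials first; on auth failure, retry once with fresh ones."""
    try:
        return connect(cached_creds())
    except AuthFailed:
        # The cached password was probably rotated away: bypass the cache once.
        # A second failure propagates, since it is no longer a staleness problem.
        return connect(fresh_creds())
```

The retry is deliberately limited to one attempt so that a genuinely wrong credential fails fast instead of hammering the database.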

Locking down access

Rotation only helps if the blast radius is small. A few non-negotiables I enforce:

  • Encrypt with a customer-managed KMS key, not the default aws/secretsmanager key, so you control the key policy and audit decrypts separately.
  • Scope IAM to specific secret ARNs with a path prefix like prod/checkout/*, never secretsmanager:GetSecretValue on *.
  • Make sure a CloudTrail trail captures read management events. GetSecretValue is logged as a management event, not a data event, so a trail set to write-only will miss it.
  • Use VPC endpoints (com.amazonaws.<region>.secretsmanager) so retrieval traffic stays on the AWS network instead of crossing the public internet.
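As an illustration of the IAM and KMS points, here is a sketch that builds a read-only policy scoped to one path prefix. The action names and the kms:ViaService condition key are real; the function name, the Sid values, and the account ID and key ARN in the usage example are placeholders I made up.

```python
import json


def secret_read_policy(region: str, account_id: str, prefix: str, kms_key_arn: str) -> dict:
    """Build an IAM policy allowing reads of secrets under one path prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSecretsUnderPrefix",
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": f"arn:aws:secretsmanager:{region}:{account_id}:secret:{prefix}/*",
            },
            {
                "Sid": "DecryptOnlyViaSecretsManager",
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                "Resource": kms_key_arn,
                # Only allow decrypts that Secrets Manager makes on the caller's behalf.
                "Condition": {
                    "StringEquals": {
                        "kms:ViaService": f"secretsmanager.{region}.amazonaws.com"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    policy = secret_read_policy(
        region="us-east-1",
        account_id="111122223333",  # placeholder account
        prefix="prod/checkout",
        kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # placeholder
    )
    print(json.dumps(policy, indent=2))
```

Generating the policy from a function keeps the prefix and the key ARN in one place, which makes it harder for a wildcard to creep in during a copy-paste.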

Takeaways

  • Storing a secret isn't rotating it: rotation is a four-step Lambda contract (create, set, test, finish) built on the AWSCURRENT/AWSPENDING version labels.
  • Pick single-user rotation for simplicity, alternating-users for true zero-downtime with pooled connections.
  • Cache retrieved secrets with the official caching library; never call GetSecretValue per request.
  • Shrink blast radius with customer-managed KMS keys, ARN-scoped IAM, CloudTrail logging of GetSecretValue, and VPC endpoints.