Managing Terraform state with S3 and DynamoDB locking, The Cloud Ledger

The first time two engineers ran terraform apply against the same workspace within a minute of each other, we ended up with a half-applied plan and a state file that disagreed with reality. I spent that afternoon reconciling resources by hand. That incident is why I no longer ship any Terraform project without a remote backend and a lock table from day one.

This is the setup I reach for on every AWS account now: an S3 bucket for the state file and a DynamoDB table for locking. It is boring, cheap, and it has never failed me since.

Why local state breaks down fast

Terraform's default backend writes terraform.tfstate to disk. That works for a solo experiment, but the moment a second person or a CI runner enters the picture you hit three problems: the state isn't shared, no lock is shared between machines so concurrent applies can corrupt it, and secrets in the state sit unencrypted in your repo or on your laptop. A remote backend on S3, with encryption enabled and a DynamoDB lock table, solves all three.

State is the single source of truth Terraform uses to map your configuration to real infrastructure. Treat it with the same care you'd give a production database.

Provisioning the backend resources

There is a bootstrapping chicken-and-egg here: you need the bucket and table before Terraform can use them as a backend. I keep this in a small separate stack with a local backend, apply it once, then never touch it.

resource "aws_s3_bucket" "tf_state" {
  bucket = "acme-tf-state-prod"
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_dynamodb_table" "tf_lock" {
  name         = "acme-tf-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend expects a string partition key with this exact name

  attribute {
    name = "LockID"
    type = "S"
  }
}
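
For completeness, here is a minimal sketch of the settings block that could sit at the top of that bootstrap stack. The version constraints and the region are my assumptions, not requirements from this setup; the only firm constraint I'm relying on is that the standalone aws_s3_bucket_versioning and aws_s3_bucket_server_side_encryption_configuration resources need a 4.x or newer AWS provider.

```hcl
# Bootstrap stack only: its own state stays on local disk,
# because the S3 bucket it creates does not exist yet.
terraform {
  required_version = ">= 1.0" # assumed floor, adjust to your tooling

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # split-out bucket sub-resources arrived in 4.0
    }
  }

  # Explicit local backend; omitting the block has the same effect.
  backend "local" {
    path = "terraform.tfstate"
  }
}

provider "aws" {
  region = "us-east-1" # assumed to match the region used by the real stacks
}
```

Because this stack's state is local, keep the resulting terraform.tfstate somewhere safe; it describes only the bucket and the lock table, so it is small and rarely changes.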

Versioning is the part people skip and regret. If a bad apply mangles state, you can roll back to a prior object version in seconds. PAY_PER_REQUEST billing on the lock table costs effectively nothing for normal usage since locks are tiny and infrequent.

Wiring the backend into your real stack

In your actual infrastructure project, point the backend at those resources. Note the backend block cannot use variables, so the values are hardcoded or supplied via -backend-config at init time.

terraform {
  backend "s3" {
    bucket         = "acme-tf-state-prod"
    key            = "network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "acme-tf-lock"
    encrypt        = true
  }
}
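
If you'd rather not hardcode everything, the -backend-config route can be sketched as a partial configuration. The file name prod.s3.tfbackend is a hypothetical choice of mine; the values are the same ones used above. The backend block keeps only what is specific to the stack:

```hcl
# backend.tf in the network stack: only the stack-specific key lives here.
terraform {
  backend "s3" {
    key     = "network/terraform.tfstate"
    encrypt = true
  }
}
```

The shared settings go in a separate file that is passed in at init time:

```hcl
# prod.s3.tfbackend (hypothetical file name): settings shared by every stack
bucket         = "acme-tf-state-prod"
region         = "us-east-1"
dynamodb_table = "acme-tf-lock"
```

Running terraform init -backend-config=prod.s3.tfbackend merges the two. The trade-off is that every engineer and CI job has to pass the same file, so I'd only split things this way when the bucket or table genuinely differs per environment.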

The key is where I enforce separation. Each logical stack gets its own key prefix so a network change can't lock or clobber the database stack:

network/terraform.tfstate
data/terraform.tfstate
app/terraform.tfstate

How locking actually behaves

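When you run a command that mutates state, Terraform writes an item to the DynamoDB table whose LockID is built from the bucket name and the state key, which is why separate keys get separate locks. A second apply against the same key sees that item and refuses to proceed: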

$ terraform apply
Error: Error acquiring the state lock
Lock Info:
  ID:        9f3c1a2b-...
  Operation: OperationTypeApply
  Who:       jenkins@build-07
  Created:   2026-06-24 14:02:11 UTC

If a runner dies mid-apply and leaves a stale lock, you can clear it with terraform force-unlock 9f3c1a2b-..., but only do so after you've confirmed no apply is actually running. I treat force-unlock as a break-glass action, never a reflex.

Locking down access

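The state bucket holds plaintext secrets (RDS passwords, generated keys) inside the state JSON. Block all public access at the account level, and scope the IAM policy so CI can read and write only its own key prefix. Because the objects are encrypted with SSE-KMS, reading state also requires permission to use the KMS key; with a customer managed key, that means granting kms:Decrypt and kms:GenerateDataKey only to the roles that need state. A least-privilege policy for a CI role looks like granting s3:ListBucket on the bucket, s3:GetObject and s3:PutObject on arn:aws:s3:::acme-tf-state-prod/app/*, plus dynamodb:GetItem, PutItem, and DeleteItem on the lock table.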
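
Here is a rough sketch of how that access model could look in Terraform. I'm assuming it lives in the bootstrap stack next to the bucket and table, so it can reference aws_s3_bucket.tf_state and aws_dynamodb_table.tf_lock directly. The resource names, the policy name, and the sid values are hypothetical, and I've included s3:ListBucket on the bucket because the S3 backend documentation lists it alongside the object permissions.

```hcl
# Account-wide S3 public access block (applies to every bucket in the account).
resource "aws_s3_account_public_access_block" "this" {
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Policy for a CI role that manages only the "app" stack.
data "aws_iam_policy_document" "ci_app_state" {
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.tf_state.arn]
  }

  statement {
    sid       = "ReadWriteAppState"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["${aws_s3_bucket.tf_state.arn}/app/*"]
  }

  statement {
    sid       = "StateLocking"
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = [aws_dynamodb_table.tf_lock.arn]
  }

  # Not shown: if the bucket uses a customer managed KMS key, the role also
  # needs kms:Decrypt and kms:GenerateDataKey on that key.
}

resource "aws_iam_policy" "ci_app_state" {
  name   = "ci-app-terraform-state" # hypothetical policy name
  policy = data.aws_iam_policy_document.ci_app_state.json
}
```

Attaching the policy to the actual CI role is left out, since how that role is defined varies from account to account.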

Takeaways

Bootstrap the S3 bucket and DynamoDB lock table in a separate one-time stack to avoid the chicken-and-egg problem.
Always enable bucket versioning and KMS encryption; versioning is your rollback path when state goes bad.
Give each logical stack its own state key so concurrent work on different components never collides.
Treat force-unlock as break-glass only, and scope IAM access to the exact key prefix each role needs.