When I joined a four-person startup, the existing VPC had been copied from a Fortune 500 reference architecture: six subnet tiers, transit gateways, three NAT gateways at roughly $32/month each before data-processing charges, and not one person who could explain why. We were paying an enterprise networking tax to run a single API and a database. Small teams need a VPC that's secure, debuggable by one tired engineer at 2am, and cheap. Here's the shape I keep coming back to.

Start from what you actually need

A typical small-team workload is: some compute that serves traffic, a database that must never be public, and outbound internet access for the compute to pull packages and call APIs. That maps to a deliberately boring layout:

  • One VPC, e.g. 10.0.0.0/16. That is far more addresses than you'll ever use, which is the point.
  • Public subnets for the load balancer and NAT only.
  • Private subnets for compute and the database.
  • Two Availability Zones. Not one (no failover), not three (you're not at that scale, and NAT-per-AZ gets expensive).

Resist the urge to subnet for problems you don't have. You can always add subnets later; you cannot easily shrink an over-engineered VPC someone has come to depend on.
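
To make the layout concrete, here is a minimal sketch of the address plan using Terraform's built-in cidrsubnet function. The local names and the +100 offset for public subnets are my own assumptions, chosen only so the public and private ranges can never collide; any non-overlapping indices would work.

```hcl
# Sketch only: local names and the +100 public offset are illustrative assumptions.
locals {
  vpc_cidr = "10.0.0.0/16"
  az_count = 2

  # 8 extra prefix bits turn the /16 into /24 subnets (256 addresses each).
  private_cidrs = [for i in range(local.az_count) : cidrsubnet(local.vpc_cidr, 8, i)]
  public_cidrs  = [for i in range(local.az_count) : cidrsubnet(local.vpc_cidr, 8, i + 100)]
}

output "address_plan" {
  value = {
    private = local.private_cidrs # ["10.0.0.0/24", "10.0.1.0/24"]
    public  = local.public_cidrs  # ["10.0.100.0/24", "10.0.101.0/24"]
  }
}
```

Four /24s out of a /16 leaves almost the entire range free, so later additions never force a renumbering.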

The NAT gateway cost trap

NAT Gateways are the most common silent cost in a small VPC: each one carries an hourly charge plus a per-GB data-processing charge. For high availability you'd run one per AZ, but at small scale a single NAT in one AZ is often the right trade: you accept that a failure of that AZ breaks outbound traffic until it recovers or you stand up a replacement, in exchange for halving the fixed hourly charge. Better still, take AWS service calls off the NAT path entirely with VPC endpoints, which keep that traffic on the AWS network:

  • Gateway endpoints for S3 and DynamoDB are free; always add them.
  • Interface endpoints (for ECR, Secrets Manager, etc.) are billed per hour per AZ plus a per-GB charge, but can still be cheaper than routing that traffic through NAT.
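
The single-NAT trade and the free DynamoDB endpoint might look like the sketch below. It assumes a hypothetical aws_subnet.public (count = 2) with an internet gateway already attached, one aws_route_table.private per private subnet, a declared var.region, and AWS provider v5 or later (for domain = "vpc" on the Elastic IP). None of those names come from a real setup beyond the ones used elsewhere in this post.

```hcl
# Sketch only: aws_subnet.public and the resource names here are assumptions.
resource "aws_eip" "nat" {
  domain = "vpc"
}

# One NAT gateway, placed in the first public subnet only.
resource "aws_nat_gateway" "single" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
}

# Both private route tables send internet-bound traffic to the same NAT.
resource "aws_route" "private_default" {
  count                  = 2
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.single.id
}

# Free: keeps DynamoDB traffic off the NAT gateway.
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.private[*].id
}
```

Moving to one NAT per AZ later is a small change: give aws_nat_gateway a count and point each route at the gateway in its own AZ.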

A minimal, readable Terraform

I keep the whole network in one small module so any engineer can read it top to bottom. The free S3 gateway endpoint is the highest-value line here:

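variable "region" { type = string }

data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

# One route table per private subnet; the S3 endpoint below attaches to both.
resource "aws_route_table" "private" {
  count  = 2
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# Free: keeps S3 traffic off the NAT gateway entirely.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.private[*].id
}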
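
One way to consume this from the rest of the stack is to expose only the IDs other code needs and call the module once. This is a sketch under assumptions: the ./modules/network path, the output names, and the us-east-1 value are all hypothetical.

```hcl
# modules/network/outputs.tf (hypothetical path and output names)
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# Root module (separate file): region is the only input; the value is an example.
module "network" {
  source = "./modules/network"
  region = "us-east-1"
}
```

Callers then reference module.network.vpc_id and module.network.private_subnet_ids and never need to know how the subnets were carved up.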

Security groups over NACLs

For a small team, do nearly all your access control with security groups and leave NACLs at their default allow-all. Security groups are stateful, can reference each other by ID, and are far easier to reason about. The pattern that scales well: the database SG allows inbound TCP 5432 (PostgreSQL) only from the app SG, by ID, not by CIDR.

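# Database ingress: PostgreSQL port, reachable only from members of the app SG.
resource "aws_security_group_rule" "db_from_app" {
  description              = "PostgreSQL from app tier"
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db.id
  source_security_group_id = aws_security_group.app.id
}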

Now scaling the app tier never requires touching the database's network rules, and there are no IP ranges to keep in sync.
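
For completeness, the two groups that rule refers to could be defined as in the sketch below. The name prefixes and descriptions are assumptions. The explicit egress rule is there because Terraform removes the default allow-all egress rule from security groups it creates, so without it the app tier could not reach the database or the internet.

```hcl
# Sketch only: name prefixes and descriptions are illustrative assumptions.
resource "aws_security_group" "app" {
  name_prefix = "app-"
  description = "Application tier"
  vpc_id      = aws_vpc.main.id
}

resource "aws_security_group" "db" {
  name_prefix = "db-"
  description = "Database tier"
  vpc_id      = aws_vpc.main.id
}

# Outbound access for the app tier, declared explicitly.
resource "aws_security_group_rule" "app_egress" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.app.id
}
```

The database group gets no egress rule at all: because security groups are stateful, replies to connections from the app tier are allowed automatically.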

Takeaways

  • One VPC, two AZs, public/private split; add complexity only when a real requirement forces it.
  • NAT Gateways are the silent cost; one NAT at small scale is a reasonable trade, and free S3/DynamoDB gateway endpoints cut its traffic.
  • Control access with stateful security groups referencing each other by ID; leave NACLs at default.
  • Keep the whole network in one readable Terraform module so a single on-call engineer can understand it.