The hidden cost of NAT Gateways (and how to cut it)
NAT Gateways quietly bleed money on data processing. Here’s how to find and fix it.
The line item that made me do a double-take on our bill wasn't EC2 or RDS; it was "NAT Gateway", at over $4,000 a month. We had three of them (one per AZ, as the docs suggest), and the data processing charges dwarfed the hourly cost. NAT Gateways are one of the sneakiest costs in AWS because they bill twice: an hourly rate plus a per-GB charge on everything that flows through them.
Here's where that money actually goes and the three changes that cut ours by more than half.
How a NAT Gateway bills
A NAT Gateway exists so resources in private subnets can reach the internet (and AWS public endpoints) without a public IP. It charges on two axes:
- Hourly: about $0.045/hour in us-east-1, roughly $33/month per gateway, regardless of traffic.
- Data processing: about $0.045 per GB that passes through it, on top of normal data transfer charges.
That per-GB processing fee is the killer. Every gigabyte your private instances pull from S3, ECR, or any other AWS service over the public path pays the NAT processing charge on top of any regular transfer charges. At roughly $45 per TB, tens of terabytes a month push the processing fee alone into the thousands of dollars.
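To see how the two axes combine, here is a minimal Python sketch of the NAT-only portion of a monthly bill. It assumes the approximate us-east-1 rates quoted above and the common 730-hour month; `nat_monthly_cost` is a hypothetical helper, and regular data transfer is deliberately left out.

```python
HOURS_PER_MONTH = 730  # common convention for monthly AWS estimates


def nat_monthly_cost(
    gateways: int,
    gb_processed: float,
    hourly_rate: float = 0.045,  # assumed us-east-1 rate, USD per gateway-hour
    per_gb_rate: float = 0.045,  # assumed us-east-1 rate, USD per GB processed
) -> dict[str, float]:
    """Estimate NAT Gateway charges only; data transfer is billed separately."""
    hourly = gateways * hourly_rate * HOURS_PER_MONTH
    processing = gb_processed * per_gb_rate
    return {
        "hourly": round(hourly, 2),
        "processing": round(processing, 2),
        "total": round(hourly + processing, 2),
    }


if __name__ == "__main__":
    # Hypothetical workload: three gateways processing 80,000 GB in a month.
    print(nat_monthly_cost(gateways=3, gb_processed=80_000))
```

With those illustrative inputs, the three gateways' hourly charges come to under $100 while processing is about $3,600, which is why the per-GB fee is the one worth attacking.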
Cut #1: VPC endpoints for AWS traffic
The biggest win is realizing that most "internet" traffic from private subnets isn't internet traffic at all: it's traffic to AWS services such as S3, ECR, DynamoDB, and CloudWatch. Route that through VPC endpoints and it never touches the NAT Gateway, so it skips the processing fee entirely.
There are two kinds, and the distinction matters for cost:
| Type | Services | Approx. cost (us-east-1) |
|---|---|---|
| Gateway endpoint | S3, DynamoDB | Free |
| Interface endpoint (PrivateLink) | ECR, CloudWatch, SQS, etc. | ~$0.01/hr per AZ + ~$0.01/GB |
Gateway endpoints for S3 and DynamoDB are free, so there is no reason not to add them. They removed the single largest chunk of our NAT traffic: ECR image pulls (whose layers are downloaded from S3) and S3 reads from a data pipeline.
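```hcl
# S3 gateway endpoint: free, and wired in through the private route table.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

# ECR registry (Docker) interface endpoint: one network interface in each
# listed subnet. With no security_group_ids set, the VPC's default security
# group is attached, so it must allow inbound HTTPS (443) from the private subnets.
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  private_dns_enabled = true
}
```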
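Interface endpoints aren't free: each one bills hourly in every AZ where it has a network interface, plus a per-GB fee, so do the math. If a service pushes many GB a day through the NAT, its interface endpoint is far cheaper than NAT processing; for light traffic it may not pay off. ECR needs a closer look: image layers are downloaded from S3, so the free S3 gateway endpoint already captures most of the pull volume, while the ecr.dkr endpoint (plus an ecr.api endpoint, which pulls also call) carries the smaller registry traffic.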
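The break-even point is simple arithmetic. Here is a minimal Python sketch, assuming approximate us-east-1 rates ($0.01 per endpoint per AZ-hour, $0.01/GB through the endpoint, $0.045/GB of NAT processing) and a 730-hour month; the function name is hypothetical. It only credits the per-GB difference, because the NAT Gateway usually stays in place for other traffic, so its hourly charge isn't saved.

```python
HOURS_PER_MONTH = 730


def interface_endpoint_break_even_gb(
    azs: int,
    endpoint_hourly: float = 0.01,  # assumed USD per endpoint per AZ-hour
    endpoint_per_gb: float = 0.01,  # assumed USD per GB through the endpoint
    nat_per_gb: float = 0.045,  # assumed USD per GB of NAT processing
) -> float:
    """Monthly GB above which an interface endpoint beats NAT processing."""
    fixed_monthly = azs * endpoint_hourly * HOURS_PER_MONTH
    saving_per_gb = nat_per_gb - endpoint_per_gb
    return fixed_monthly / saving_per_gb


if __name__ == "__main__":
    gb = interface_endpoint_break_even_gb(azs=3)
    print(f"break-even: {gb:.0f} GB/month")
```

Under those assumptions, an endpoint spread across three AZs pays for itself at roughly 626 GB a month (about 21 GB a day); below that, leaving the traffic on the NAT is cheaper.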
Cut #2: right-size your AZ redundancy
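The standard advice is one NAT Gateway per AZ for high availability, and for production that's correct: a single NAT Gateway is a single point of failure (if its AZ goes down, private subnets in every other AZ lose outbound access), and traffic from the other AZs pays cross-AZ data transfer to reach it. But for dev and staging, three gateways is overkill. We collapsed non-prod to a single NAT Gateway and accepted the reduced redundancy, removing two gateways' worth of hourly charges per non-prod VPC. The per-GB processing fee doesn't shrink, since the same traffic now flows through the one remaining gateway, and traffic from the other AZs picks up cross-AZ transfer charges.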
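A quick way to sanity-check the non-prod trade-off is a minimal sketch like the one below. It assumes the $0.045/hour NAT rate above and a cross-AZ transfer rate of $0.02/GB (our assumption: $0.01/GB billed on each side of the AZ boundary; check current pricing for your region). Processing is left out on purpose because consolidating gateways doesn't change how many GB get processed. The function name is hypothetical.

```python
HOURS_PER_MONTH = 730


def collapse_net_savings(
    gateways_removed: int,
    cross_az_gb: float,
    nat_hourly: float = 0.045,  # assumed USD per gateway-hour
    cross_az_per_gb: float = 0.02,  # assumption: $0.01/GB on each side of the AZ hop
) -> float:
    """Net monthly USD saved by consolidating NAT Gateways in one VPC.

    Per-GB NAT processing is omitted: the same traffic is still processed,
    just by the remaining gateway.
    """
    hourly_saved = gateways_removed * nat_hourly * HOURS_PER_MONTH
    added_transfer = cross_az_gb * cross_az_per_gb
    return hourly_saved - added_transfer


if __name__ == "__main__":
    # Hypothetical non-prod VPC: 3 gateways -> 1, with 500 GB/month now crossing AZs.
    print(round(collapse_net_savings(gateways_removed=2, cross_az_gb=500), 2))
```

The saving is modest per VPC and shrinks as cross-AZ volume grows; with these rates it goes negative above roughly 3,300 GB a month of cross-AZ traffic, which is worth checking before consolidating a busy environment.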
Cut #3: find what's actually flowing
You can't cut what you can't see. I enabled VPC Flow Logs and queried them to find the top talkers through the NAT: which destinations and which instances drove the data processing charges.
```shell
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0abc123 \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name /vpc/flowlogs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/flowlogs
```
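Once records arrive, ranking top talkers is a small aggregation. The sketch below is a minimal Python version, assuming the default version 2 flow log format (14 space-separated fields) exported as plain text lines, and assuming you already know the NAT gateway's network interface ID (it appears under `NatGatewayAddresses` in `aws ec2 describe-nat-gateways` output). `top_talkers` and the sample ENI ID are hypothetical. A NAT gateway's interface can log a flow on both sides of the address translation, so treat the totals as a ranking rather than an exact match for billed GB.

```python
from collections import Counter
from typing import Iterable

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def top_talkers(
    lines: Iterable[str], nat_eni: str, n: int = 5
) -> list[tuple[tuple[str, str], int]]:
    """Sum bytes per (srcaddr, dstaddr) pair seen on the NAT gateway's ENI."""
    totals: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip header rows or lines in a custom format
        rec = dict(zip(FIELDS, parts))
        if rec["interface_id"] != nat_eni:
            continue  # traffic on some other interface
        if rec["bytes"] == "-":
            continue  # NODATA/SKIPDATA records carry no byte count
        totals[(rec["srcaddr"], rec["dstaddr"])] += int(rec["bytes"])
    return totals.most_common(n)


if __name__ == "__main__":
    sample = [
        "2 123456789012 eni-0nat0example 10.0.1.5 198.51.100.7 44321 443 6 10 5000 1700000000 1700000060 ACCEPT OK",
        "2 123456789012 eni-0nat0example 10.0.2.9 203.0.113.4 50000 443 6 4 1000 1700000000 1700000060 ACCEPT OK",
    ]
    for (src, dst), total in top_talkers(sample, nat_eni="eni-0nat0example"):
        print(f"{src} -> {dst}: {total} bytes")
```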
Flow Logs revealed a surprise: a logging agent shipping gigabytes to a third-party SaaS endpoint over the public internet. We couldn't route that through a VPC endpoint (it's external), but we batched and compressed the payloads, cutting that traffic by 70%. The lesson: not all NAT traffic is AWS-bound, so measure before assuming endpoints will fix everything.
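One way to measure before assuming is to tag each destination as AWS-owned or external. AWS publishes its address ranges as `ip-ranges.json`, where each IPv4 entry has an `ip_prefix` and a `service` label (`AMAZON` is the catch-all, while labels such as `S3` or `DYNAMODB` mark specific services). The minimal sketch below assumes that shape and handles IPv4 only; the prefixes in it are documentation ranges standing in for real data, and `load_prefixes` and `classify` are hypothetical helpers.

```python
import ipaddress
import json

# Hypothetical excerpt shaped like AWS's ip-ranges.json. The prefixes are
# RFC 5737 documentation ranges, NOT real AWS addresses.
SAMPLE_IP_RANGES = json.loads("""
{"prefixes": [
  {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "AMAZON"},
  {"ip_prefix": "198.51.100.0/25", "region": "us-east-1", "service": "S3"},
  {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "AMAZON"}
]}
""")


def load_prefixes(ip_ranges: dict) -> list[tuple[ipaddress.IPv4Network, str]]:
    """Parse the IPv4 'prefixes' list into (network, service) pairs."""
    return [
        (ipaddress.IPv4Network(p["ip_prefix"]), p["service"])
        for p in ip_ranges["prefixes"]
    ]


def classify(dst: str, prefixes: list[tuple[ipaddress.IPv4Network, str]]) -> str:
    """Label a destination IP with the AWS service label it falls under.

    'AMAZON' is a catch-all, so any other matching label is preferred
    (alphabetically first if several match). Unmatched IPs are 'EXTERNAL'.
    """
    addr = ipaddress.IPv4Address(dst)
    services = {service for net, service in prefixes if addr in net}
    specific = sorted(services - {"AMAZON"})
    if specific:
        return specific[0]
    return "AMAZON" if services else "EXTERNAL"


if __name__ == "__main__":
    prefixes = load_prefixes(SAMPLE_IP_RANGES)
    for ip in ["198.51.100.10", "198.51.100.200", "192.0.2.1"]:
        print(ip, classify(ip, prefixes))
```

A match on `AMAZON` or `EC2` doesn't mean a VPC endpoint can carry that traffic: third-party SaaS products hosted on EC2 fall inside those ranges too. Only specific labels such as `S3` or `DYNAMODB` point straight at a gateway endpoint.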
Takeaways
- NAT Gateways bill hourly and per GB processed; the per-GB fee is usually what dominates the bill.
- Add free S3 and DynamoDB gateway endpoints immediately; use interface endpoints for ECR/CloudWatch when traffic justifies the hourly fee.
- Keep one NAT per AZ in prod for HA, but collapse non-prod to a single gateway to cut redundant hourly cost.
- Turn on VPC Flow Logs to find the real top talkers; some NAT traffic is external and needs compression, not endpoints.