For years my service-to-service security model was basically "the VPC is the perimeter." Inside the VPC, security groups were broad, mutual TLS was aspirational, and any compromised pod could happily talk to the payments service on its internal port. Security review after security review flagged the same thing: we had network reachability, not authorization.

VPC Lattice is the first AWS primitive that let me push identity into the network layer without bolting a service mesh onto every cluster. Here's how I've been using it to move toward zero trust.

What Lattice actually changes

VPC Lattice introduces the service network as a first-class object. You register services into it, associate VPCs, and share it across accounts; Lattice then handles the routing, the load balancing, and, crucially, the authorization between callers and services, independent of IP-level reachability.

The shift in mindset:

Old model                      | Lattice model
Security group allows CIDR/SG  | Auth policy allows IAM principal
Reachability == permission     | Reachability != permission
VPC peering / TGW mesh         | Service network association
mTLS managed per app           | Identity carried by SigV4

The key insight is that a request now carries a signed IAM identity, and the service can decide whether that identity is allowed, regardless of where the packet originated.

Wiring up a service network

You create the network with an auth type of AWS_IAM, associate your VPCs, then register each service in the network:

resource "aws_vpclattice_service_network" "core" {
  name      = "core-svc-net"
  auth_type = "AWS_IAM"
}

resource "aws_vpclattice_service_network_vpc_association" "app" {
  vpc_identifier             = aws_vpc.app.id
  service_network_identifier = aws_vpclattice_service_network.core.id
  security_group_ids         = [aws_security_group.lattice.id]
}

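resource "aws_vpclattice_service" "payments" {
  name      = "payments"
  auth_type = "AWS_IAM"
}

# Register the service in the network. Without this association,
# clients in the associated VPCs cannot reach it through Lattice.
resource "aws_vpclattice_service_network_service_association" "payments" {
  service_identifier         = aws_vpclattice_service.payments.id
  service_network_identifier = aws_vpclattice_service_network.core.id
}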

Once associated, any workload in the app VPC can resolve the payments service through its Lattice-managed domain name, but the auth policy decides if the call succeeds.
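The service above still needs a listener and a target group before it can route anything. The following is a minimal sketch of that wiring, plus an output exposing the Lattice-managed domain name. The aws_vpc.payments reference, the target port 8080, and all resource names are assumptions for illustration, not my actual config.

```hcl
# Sketch only: aws_vpc.payments, port 8080, and the names are hypothetical.
resource "aws_vpclattice_target_group" "payments" {
  name = "payments-tg"
  type = "IP"

  config {
    port           = 8080
    protocol       = "HTTP"
    vpc_identifier = aws_vpc.payments.id
  }
}

# HTTPS listener that forwards all traffic to the target group.
resource "aws_vpclattice_listener" "payments_https" {
  name               = "https"
  protocol           = "HTTPS"
  port               = 443
  service_identifier = aws_vpclattice_service.payments.id

  default_action {
    forward {
      target_groups {
        target_group_identifier = aws_vpclattice_target_group.payments.id
        weight                  = 100
      }
    }
  }
}

# The Lattice-managed domain name that callers resolve.
output "payments_lattice_domain" {
  value = aws_vpclattice_service.payments.dns_entry[0].domain_name
}
```

Targets still have to be registered with the target group (for example, pod IPs), which is left out here.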

The auth policy is where zero trust lives

This is the part that replaces "the network is open inside the VPC." The policy below allows only a specific role to invoke the payments service, and only over POST:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::111122223333:role/checkout-service"
    },
    "Action": "vpc-lattice-svcs:Invoke",
    "Resource": "*",
    "Condition": {
      "StringEquals": { "vpc-lattice-svcs:RequestMethod": "POST" }
    }
  }]
}

Anything that doesn't match is denied by default. A compromised analytics pod in the same VPC cannot call payments, because it cannot sign requests as the checkout-service role. That is the difference between reachability and authorization made concrete.
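A policy like the one above could be attached in Terraform roughly as follows. This is a sketch that assumes the aws_vpclattice_service.payments resource defined earlier; the account ID and role name are the same placeholders used in the JSON.

```hcl
# Sketch: attach the auth policy to the payments service.
# A service holds a single auth policy, so this replaces any existing one.
resource "aws_vpclattice_auth_policy" "payments" {
  resource_identifier = aws_vpclattice_service.payments.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::111122223333:role/checkout-service" }
      Action    = "vpc-lattice-svcs:Invoke"
      Resource  = "*"
      Condition = {
        StringEquals = { "vpc-lattice-svcs:RequestMethod" = "POST" }
      }
    }]
  })
}
```

Because the service network in this example also uses AWS_IAM, it would presumably need its own, usually coarser, auth policy as well. That is not shown here and is worth verifying against the Lattice documentation.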

Once identity is enforced at the network layer, "is this in my VPC?" stops being a meaningful security boundary. The principal making the call is the boundary.

Operational trade-offs I hit

It isn't free of friction:

  • SigV4 signing required. Callers must sign requests. For EKS, I run the AWS SigV4 proxy as a sidecar so app code stays unaware. That's an extra hop and a bit of latency (single-digit milliseconds in my tests).
  • Per-request and per-GB pricing. Lattice bills for processing; high-throughput chatty services add up, so I keep east-west bulk traffic that doesn't need identity off Lattice.
  • Observability. Enable access logs to CloudWatch or S3 early. The authDeniedReason field, which tells you whether the service-level or the network-level policy rejected a request, is invaluable during rollout.
  • Cross-account. Sharing the service network via AWS RAM works well, but get your resource policies straight before onboarding other teams.
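The logging and cross-account items above might look roughly like this in Terraform. It is a hedged sketch built on the aws_vpclattice_service_network.core resource from earlier; the bucket name, share name, and consumer account ID 444455556666 are hypothetical, and the permissions needed for log delivery are not shown.

```hcl
# Sketch only: bucket name, share name, and account ID are hypothetical.
resource "aws_s3_bucket" "lattice_logs" {
  bucket = "example-lattice-access-logs"
}

# Deliver access logs for the whole service network to S3.
resource "aws_vpclattice_access_log_subscription" "core" {
  resource_identifier = aws_vpclattice_service_network.core.arn
  destination_arn     = aws_s3_bucket.lattice_logs.arn
}

# Share the service network with another account through AWS RAM.
# Assumes the consumer account is in the same AWS Organization.
resource "aws_ram_resource_share" "core" {
  name                      = "core-svc-net-share"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "core" {
  resource_arn       = aws_vpclattice_service_network.core.arn
  resource_share_arn = aws_ram_resource_share.core.arn
}

resource "aws_ram_principal_association" "consumer" {
  principal          = "444455556666"
  resource_share_arn = aws_ram_resource_share.core.arn
}
```

Subscribing at the service-network level covers every service in the network, so a newly onboarded service is logged without extra config.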

I rolled it out service by service, starting with the most sensitive internal API. For the first week I left the auth policy permissive, with access logs on, to catch unexpected callers before switching to the restrictive policy.
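One way to stage that kind of rollout in Terraform is a toggle between a permissive and a strict policy. This is a sketch, not the exact setup described here: the enforce_payments_policy variable and the local names are hypothetical, and the permissive statement simply admits any signed (non-anonymous) caller so that access logs reveal who is calling.

```hcl
# Hypothetical toggle: false = observe with a permissive policy, true = enforce.
variable "enforce_payments_policy" {
  type    = bool
  default = false
}

locals {
  # Permissive: any authenticated principal may invoke the service.
  payments_permissive_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = "vpc-lattice-svcs:Invoke"
      Resource  = "*"
      Condition = {
        StringNotEqualsIgnoreCase = { "aws:PrincipalType" = "anonymous" }
      }
    }]
  })

  # Strict: only the checkout-service role, and only POST.
  payments_strict_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::111122223333:role/checkout-service" }
      Action    = "vpc-lattice-svcs:Invoke"
      Resource  = "*"
      Condition = {
        StringEquals = { "vpc-lattice-svcs:RequestMethod" = "POST" }
      }
    }]
  })
}

# A service holds one auth policy, so this would be the only
# aws_vpclattice_auth_policy attached to payments.
resource "aws_vpclattice_auth_policy" "payments_rollout" {
  resource_identifier = aws_vpclattice_service.payments.arn
  policy = (
    var.enforce_payments_policy
    ? local.payments_strict_policy
    : local.payments_permissive_policy
  )
}
```

Flipping the variable to true after the observation window swaps the policy in place, so the change is a single reviewed diff.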

Takeaways

  • VPC Lattice decouples network reachability from authorization, letting IAM principals gate service-to-service calls.
  • The auth policy granting vpc-lattice-svcs:Invoke is where zero trust is actually enforced; anything not explicitly allowed is denied.
  • Use a SigV4 proxy sidecar so application code doesn't need to learn request signing.
  • Watch per-request pricing and turn on access logs from day one to debug denied calls during rollout.