Lambda cold starts: measuring what actually matters
Cold starts are real but often misunderstood. How to measure and when to care.
I spent a week shaving 200ms off a Lambda's init time, felt great about it, and then discovered cold starts affected roughly 0.3% of our invocations and none of them were on the latency-critical path. I'd optimized a number that looked scary on a dashboard instead of the number our users felt. Cold starts are real, but most teams measure them wrong and then panic at the wrong percentile.
What a cold start actually is
A cold start is the one-time cost of standing up a new execution environment: downloading your code, starting the runtime, and running your initialization code (everything outside the handler). Once warm, that environment is reused for subsequent invocations with no init cost. So a cold start penalizes the first request to a fresh environment, not every request.
It breaks into two parts you control differently:
- Platform init: runtime bootstrap, code download, VPC ENI attach. Largely AWS's job, though package size and VPC config affect it.
- Function init: your top-level imports, SDK client creation, config loading. Entirely yours to optimize.
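To see the function-init half from inside your own code, one option is to time the module-scope work and flag the first invocation in each environment. This is a minimal sketch, not an official API: the names `_INIT_SECONDS` and `_is_cold` are made up for illustration, and platform init is invisible from here.

```python
import time

# Module scope runs once per execution environment: this is "function init".
_INIT_STARTED = time.perf_counter()

# ... heavy imports, SDK clients, and config loading would go here ...

_INIT_SECONDS = time.perf_counter() - _INIT_STARTED
_is_cold = True  # flips to False after the first invocation in this environment


def handler(event, context):
    """Report whether this invocation was the first in its environment."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    return {
        "cold_start": cold,
        # Function-init time only; platform init is not visible from here.
        "function_init_ms": round(_INIT_SECONDS * 1000, 3),
    }
```

Logging `cold_start` alongside your request metadata lets you tie cold starts to specific routes rather than to the function as a whole.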
Measure the right percentile
The mistake I made was looking at average Duration. Cold starts are a tail phenomenon: they don't move the mean, they fatten p99. And the metric that matters isn't the cold start duration in isolation; it's how often a user-facing request hits one. The init time itself appears in the REPORT line as Init Duration, which Logs Insights exposes as @initDuration. Pull it from Logs Insights:
    filter @type = "REPORT"
    | stats
        count(*) as invocations,
        sum(ispresent(@initDuration)) as cold_starts,
        avg(@initDuration) as avg_init_ms,
        pct(@initDuration, 99) as p99_init_ms
      by bin(1h)
If cold_starts / invocations is tiny and those cold invocations aren't synchronous user requests, you're done. Don't optimize further.
The metric that matters is cold-start rate on the latency-critical path, not cold-start duration. A 1.5s cold start on a 0.05%-of-traffic async worker is not a problem.
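If you have exported log lines instead of a Logs Insights console, the same tally can be reproduced offline. The sketch below assumes REPORT lines in the usual tab-separated shape, where only cold invocations carry an `Init Duration` field; the sample lines and the `cold_start_stats` helper are made up for illustration.

```python
import re

# Matches the "Init Duration: 123.45 ms" field that only cold invocations carry.
_INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_start_stats(report_lines):
    """Return (invocations, cold_starts, cold_rate, max_init_ms) for REPORT lines."""
    invocations = 0
    init_times = []
    for line in report_lines:
        if not line.startswith("REPORT"):
            continue  # skip START/END and application log lines
        invocations += 1
        match = _INIT_RE.search(line)
        if match:
            init_times.append(float(match.group(1)))
    cold = len(init_times)
    rate = cold / invocations if invocations else 0.0
    return invocations, cold, rate, max(init_times, default=0.0)


# Made-up sample lines in the shape Lambda writes to CloudWatch Logs.
sample = [
    "REPORT RequestId: a1\tDuration: 12.30 ms\tBilled Duration: 13 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 80 MB\tInit Duration: 412.50 ms",
    "REPORT RequestId: a2\tDuration: 9.10 ms\tBilled Duration: 10 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 80 MB",
    "END RequestId: a2",
    "REPORT RequestId: a3\tDuration: 8.70 ms\tBilled Duration: 9 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 81 MB",
    "REPORT RequestId: a4\tDuration: 8.90 ms\tBilled Duration: 9 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 81 MB",
]

print(cold_start_stats(sample))  # (4, 1, 0.25, 412.5)
```

The rate is the number to watch. To restrict it to the critical path, filter the input to the functions or routes that serve synchronous user requests before counting.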
The levers, ranked by effort vs. payoff
| Lever | Typical effect | Cost |
|---|---|---|
| Shrink deployment package | Lower platform init | Free |
| Lazy-load heavy SDK clients | Lower function init | Free |
| More memory (= more CPU) | Faster init & exec | Higher per-ms price |
| Provisioned Concurrency | ~Zero cold starts | Pay for idle capacity |
| SnapStart (Java) | Large init cut | Java-only constraints |
Counterintuitively, raising memory often lowers total cost, because CPU scales with memory and a faster function bills fewer milliseconds. Always test this; don't assume more memory means a bigger bill.
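The arithmetic behind that claim is simple enough to check by hand: compute cost is memory (GB) times billed time (seconds) times a per-GB-second price. The sketch below uses an assumed price and hypothetical durations, so treat the output as an illustration of the method rather than a quote.

```python
# Assumed price per GB-second of compute; check the current rate for your
# region and architecture before trusting any result.
PRICE_PER_GB_SECOND = 0.0000166667


def compute_cost(memory_mb, billed_ms, invocations):
    """Compute charge only (no per-request fee) for a batch of invocations."""
    gb_seconds = (memory_mb / 1024) * (billed_ms / 1000) * invocations
    return gb_seconds * PRICE_PER_GB_SECOND


# Hypothetical measurements: doubling memory cut duration from 800 ms to 350 ms.
small = compute_cost(memory_mb=512, billed_ms=800, invocations=1_000_000)
large = compute_cost(memory_mb=1024, billed_ms=350, invocations=1_000_000)

print(f"512 MB:  ${small:.2f}")  # 512 MB:  $6.67
print(f"1024 MB: ${large:.2f}")  # 1024 MB: $5.83
```

Doubling memory pays for itself only when duration drops by more than half. Measure your own function at each setting and plug in the real billed durations.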
The cheap code-level fix
Most function-init time hides in eager top-level work. Reuse clients across invocations by creating them once at module scope, but defer truly heavy or rarely-used imports into the handler:
    import os

    import boto3

    # Created once per environment, reused while warm.
    ddb = boto3.resource("dynamodb")
    TABLE = ddb.Table(os.environ["TABLE_NAME"])

    # _read and _export are this module's own helpers, defined elsewhere.
    def handler(event, context):
        # Heavy, rarely-needed dependency: import only when this path runs.
        if event.get("export"):
            import pandas as pd  # not paid on the hot path
            return _export(pd, event)
        return _read(event)
The DynamoDB resource and table handle live at module scope so they survive across warm invocations; pandas only loads on the export path, so the common read path never pays for it.
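The same idea extends to SDK clients that only some paths need: build them on first use and cache them for the life of the environment. This is a minimal sketch using `functools.lru_cache`; `ReportClient` is a stand-in class invented here so the example runs without AWS access, and in a real function the cached body might be `boto3.client(name)`.

```python
import functools

creations = []  # records each construction, only to make the behavior visible


class ReportClient:
    """Stand-in for an SDK client that is expensive to construct."""

    def __init__(self, name):
        creations.append(name)
        self.name = name


@functools.lru_cache(maxsize=None)
def get_client(name):
    # First call per name builds the client; later calls in the same warm
    # environment return the cached instance.
    return ReportClient(name)


def handler(event, context):
    if event.get("export"):
        return get_client("s3").name  # built only when the export path runs
    return "read"
```

The trade-off is that the first request on the rare path pays the construction cost at request time instead of at init, which is usually the right trade when that path is off the hot path.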
When Provisioned Concurrency is worth it
If you have a synchronous, user-facing function with steady, predictable traffic and a strict latency SLA, Provisioned Concurrency keeps N environments warm so requests never cold-start. The trade-off is you pay for that capacity whether or not it's used. I reserve it for exactly those endpoints and let everything else cold-start. For spiky-but-predictable patterns, schedule the provisioned count with Application Auto Scaling rather than pinning it flat.
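As a rough sketch of the scheduled approach, the helpers below build the arguments for the Application Auto Scaling `register_scalable_target` and `put_scheduled_action` calls. The function name, alias, capacities, and cron times are all hypothetical, and the client is passed in so the argument-building can be checked without AWS credentials. Provisioned Concurrency attaches to an alias or version, so the resource ID includes one.

```python
def scalable_target(function_name, alias, min_capacity, max_capacity):
    """Arguments for register_scalable_target on a function alias."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def scheduled_action(function_name, alias, action_name, schedule, capacity):
    """Arguments for put_scheduled_action that pin capacity at a given time."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": action_name,
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "Schedule": schedule,
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


def apply_schedule(client, function_name, alias):
    """client is expected to be boto3.client("application-autoscaling")."""
    client.register_scalable_target(**scalable_target(function_name, alias, 0, 20))
    # Hypothetical weekday pattern (UTC): scale up before the morning peak...
    client.put_scheduled_action(
        **scheduled_action(function_name, alias, "morning-up", "cron(45 7 ? * MON-FRI *)", 20)
    )
    # ...and release the capacity in the evening.
    client.put_scheduled_action(
        **scheduled_action(function_name, alias, "evening-down", "cron(0 19 ? * MON-FRI *)", 0)
    )
```

Scaling up a little before the expected peak gives the environments time to initialize, so the capacity is already warm when traffic arrives.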
Takeaways
- Cold starts hit only the first request to a fresh environment; they fatten p99, not the average.
- Measure cold-start rate on the critical path via the @initDuration field, not raw duration.
- Free wins first: smaller packages and lazy imports; right-size memory because more CPU can lower total cost.
- Reach for Provisioned Concurrency only on synchronous, latency-sensitive endpoints with predictable traffic.