Lambda cold starts: measuring what actually matters, The Cloud Ledger

I spent a week shaving 200ms off a Lambda's init time, felt great about it, and then discovered cold starts affected roughly 0.3% of our invocations and none of them were on the latency-critical path. I'd optimized a number that looked scary on a dashboard instead of the number our users felt. Cold starts are real, but most teams measure them wrong and then panic at the wrong percentile.

What a cold start actually is

A cold start is the one-time cost of standing up a new execution environment: downloading your code, starting the runtime, and running your initialization code (everything outside the handler). Once warm, that environment is reused for subsequent invocations with no init cost. So a cold start penalizes the first request to a fresh environment, not every request.

It breaks into two parts you control differently:

Platform init: runtime bootstrap, code download, VPC ENI attach. Largely AWS's job, though package size and VPC config affect it.
Function init: your top-level imports, SDK client creation, config loading. Entirely yours to optimize.
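
One way to see function init from the inside, as a minimal sketch assuming a Python runtime: module-level code runs once per execution environment, so a module-level flag separates the first (cold) invocation from later warm ones. The names `INIT_STARTED` and `_is_cold` are illustrative and not part of any Lambda API.

```python
import time

# Module scope: runs once per execution environment (function init).
INIT_STARTED = time.monotonic()
_is_cold = True  # flipped after the first invocation in this environment


def handler(event, context):
    global _is_cold
    cold, _is_cold = _is_cold, False
    # Tag the response so cold invocations can be counted downstream.
    return {
        "cold_start": cold,
        "env_age_s": round(time.monotonic() - INIT_STARTED, 3),
    }
```

Note that with pre-initialized environments the first invocation is still flagged here even though no caller waited for init, so treat the flag as "first use of this environment" rather than "user felt a delay".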

Measure the right percentile

The mistake I made was looking at average Duration. Cold starts are a tail phenomenon: they barely move the mean, but they fatten p99. And the metric that matters isn't cold-start duration in isolation; it's how often a user-facing request hits one. The init time itself appears in each cold invocation's REPORT log line as Init Duration, which Logs Insights exposes as @initDuration. Pull it from Logs Insights:

filter @type = "REPORT"
| stats
    count(*) as invocations,
    count(@initDuration) as cold_starts,
    count(@initDuration) / count(*) * 100 as cold_start_pct,
    avg(@initDuration) as avg_init_ms,
    pct(@initDuration, 99) as p99_init_ms
  by bin(1h)

If cold_starts / invocations is tiny and those cold invocations aren't synchronous user requests, you're done. Don't optimize further.
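
If the logs are exported rather than queried in place, the same ratio can be computed offline. The following is a rough sketch, assuming raw log lines in the standard REPORT format; the function name `cold_start_stats` and the sample lines (request IDs and numbers) are made up for illustration.

```python
import re

# Matches the optional "Init Duration: <n> ms" field of a Lambda REPORT line.
INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_start_stats(log_lines):
    """Return (invocations, cold_starts, rate) from raw log lines."""
    invocations = 0
    init_ms = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue  # skip START, END, and application log lines
        invocations += 1
        match = INIT_RE.search(line)
        if match:
            init_ms.append(float(match.group(1)))
    rate = len(init_ms) / invocations if invocations else 0.0
    return invocations, len(init_ms), rate


# Sample lines in the REPORT format; the request IDs and numbers are made up.
sample = [
    "REPORT RequestId: a1 Duration: 12.30 ms Billed Duration: 13 ms "
    "Memory Size: 512 MB Max Memory Used: 80 MB Init Duration: 412.55 ms",
    "REPORT RequestId: a2 Duration: 9.10 ms Billed Duration: 10 ms "
    "Memory Size: 512 MB Max Memory Used: 80 MB",
    "START RequestId: a3 Version: $LATEST",
    "REPORT RequestId: a3 Duration: 8.70 ms Billed Duration: 9 ms "
    "Memory Size: 512 MB Max Memory Used: 81 MB",
]
print(cold_start_stats(sample))
```

Only REPORT lines count as invocations, and only those carrying an Init Duration field count as cold, which mirrors what the query does with @initDuration.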

The metric that matters is cold-start rate on the latency-critical path, not cold-start duration. A 1.5s cold start on a 0.05%-of-traffic async worker is not a problem.
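
To make the tail-versus-mean point concrete, here is a small synthetic illustration. The latencies are assumed values, not measurements: 1,000 requests, a 20 ms warm path, and 500 ms of init on 1.5% of them.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


WARM_MS = 20        # assumed warm latency
COLD_MS = 20 + 500  # assumed warm latency plus 500 ms of init
latencies = [COLD_MS] * 15 + [WARM_MS] * 985  # 1.5% cold-start rate

mean_ms = sum(latencies) / len(latencies)
p50_ms = percentile(latencies, 50)
p99_ms = percentile(latencies, 99)
print(mean_ms, p50_ms, p99_ms)  # 27.5 20 520
```

The mean shifts by 7.5 ms while p99 jumps from 20 ms to 520 ms. Drop the cold count to 3 of 1,000 (0.3%) and p99 falls back to 20 ms, because the cold requests now sit beyond the 99th percentile; which percentile you watch decides whether you see them at all.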

The levers, ranked by effort vs. payoff

Lever	Typical effect	Cost
Shrink deployment package	Lower platform init	Free
Lazy-load heavy SDK clients	Lower function init	Free
More memory (= more CPU)	Faster init & exec	Higher per-ms price
Provisioned Concurrency	~Zero cold starts	Pay for idle capacity
SnapStart (Java, Python, .NET)	Large init cut	Supported runtimes only; snapshot constraints

Counterintuitively, raising memory can lower total cost, because CPU scales with memory and a faster function bills fewer milliseconds. Always test this; don't assume more memory means a bigger bill.
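
The arithmetic behind that claim, as a sketch. The price constant is an assumed x86 on-demand rate and the two durations are hypothetical measurements, so substitute your region's pricing and your own numbers.

```python
# Assumed x86 on-demand price per GB-second; check current pricing for your region.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb, billed_ms, price=PRICE_PER_GB_SECOND):
    """Compute cost of one invocation, ignoring the per-request fee."""
    return (memory_mb / 1024) * (billed_ms / 1000) * price


# Hypothetical measurements for the same function at two memory settings.
small = invocation_cost(memory_mb=512, billed_ms=800)
large = invocation_cost(memory_mb=1024, billed_ms=350)
print(f"512 MB: ${small:.10f}  1024 MB: ${large:.10f}")
print("larger setting is cheaper:", large < small)
```

Doubling memory doubles the per-millisecond price, so it pays off only when billed duration drops by more than half. That happens for CPU-bound work and not for functions that mostly wait on I/O, which is why this has to be measured.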

The cheap code-level fix

Most function-init time hides in eager top-level work. Reuse clients across invocations by creating them once at module scope, but defer truly heavy or rarely-used imports into the handler:

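import os
import boto3

# Created once per environment, reused while warm.
ddb = boto3.resource("dynamodb")
TABLE = ddb.Table(os.environ["TABLE_NAME"])

def _read(event):
    # Placeholder hot path (hypothetical "id" key): fetch a single item.
    return TABLE.get_item(Key={"id": event["id"]}).get("Item")

def _export(pd, event):
    # Placeholder export path: dump one scan page to CSV using the lazily imported pandas.
    return pd.DataFrame(TABLE.scan()["Items"]).to_csv(index=False)

def handler(event, context):
    # Heavy, rarely-needed dependency: import only when this path runs.
    if event.get("export"):
        import pandas as pd  # not paid on the hot path
        return _export(pd, event)
    return _read(event)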

The DynamoDB resource and table handle live at module scope, so they survive across warm invocations; pandas only loads on the export path, so the common read path never pays for it.
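
The same idea extends to SDK clients that only some code paths need. A minimal sketch, assuming a hypothetical `lazy` helper built on `functools.lru_cache`; neither `lazy` nor `s3_client` comes from boto3 or Lambda.

```python
import functools


def lazy(factory):
    """Wrap a zero-argument factory so it runs once, on first call."""
    return functools.lru_cache(maxsize=1)(factory)


@lazy
def s3_client():
    # Deferred: boto3 and the client are only loaded the first time this is called.
    import boto3
    return boto3.client("s3")


def handler(event, context):
    if event.get("archive"):  # hypothetical rarely-used path
        return s3_client().list_buckets()["Buckets"]
    return {"status": "ok"}
```

The trade-off is that the first request on the rare path pays the creation cost inside its own duration instead of during init, so this suits clients that most invocations never touch.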

When Provisioned Concurrency is worth it

If you have a synchronous, user-facing function with steady, predictable traffic and a strict latency SLA, Provisioned Concurrency keeps N environments initialized so requests served by them never cold-start. Traffic beyond N spills over to on-demand environments, which can still cold-start. The trade-off is that you pay for that capacity whether or not it's used. I reserve it for exactly those endpoints and let everything else cold-start. For spiky-but-predictable patterns, schedule the provisioned count with Application Auto Scaling rather than pinning it flat.
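
A sketch of what scheduling could look like with boto3's Application Auto Scaling client. The function name, alias, capacities, and cron times are all assumptions for illustration; the request builders are kept separate from the API calls so they can be inspected without touching AWS.

```python
def scheduled_scaling_requests(function_name, alias, peak, off_peak):
    """Build Application Auto Scaling requests for a Lambda alias."""
    target = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    register = {**target, "MinCapacity": off_peak, "MaxCapacity": peak}
    actions = [
        {
            **target,
            "ScheduledActionName": "scale-up-weekday-morning",
            "Schedule": "cron(0 8 ? * MON-FRI *)",  # 08:00 UTC, assumed peak start
            "ScalableTargetAction": {"MinCapacity": peak, "MaxCapacity": peak},
        },
        {
            **target,
            "ScheduledActionName": "scale-down-weekday-evening",
            "Schedule": "cron(0 20 ? * MON-FRI *)",  # 20:00 UTC, assumed peak end
            "ScalableTargetAction": {"MinCapacity": off_peak, "MaxCapacity": off_peak},
        },
    ]
    return register, actions


def apply_schedule(client, register, actions):
    """Send the requests; client is boto3.client("application-autoscaling")."""
    client.register_scalable_target(**register)
    for action in actions:
        client.put_scheduled_action(**action)


# Hypothetical function and alias names.
register, actions = scheduled_scaling_requests("checkout-api", "live", peak=20, off_peak=2)
```

Provisioned Concurrency attaches to a published version or alias rather than $LATEST, which is why the resource ID carries an alias.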

Takeaways

Cold starts hit only the first request to a fresh environment; they fatten p99, not the average.
Measure cold-start rate on the critical path via the @initDuration field, not raw duration.
Free wins first: smaller packages and lazy imports; right-size memory because more CPU can lower total cost.
Reach for Provisioned Concurrency only on synchronous, latency-sensitive endpoints with predictable traffic.