"We need a queue" is one of the most overloaded sentences in distributed systems. Sometimes the person means a work queue, sometimes a pub/sub fan-out, sometimes an ordered event log they will replay later. On AWS these are three different services, and picking the wrong one means rebuilding the integration six months later. I have made that mistake; this post is the decision guide I wish I'd had.

The short version: SQS for work queues, SNS or EventBridge for fan-out, Kafka or Kinesis for an ordered, replayable log. The rest is matching your delivery and ordering needs to those tools.

The four contenders

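| Service       | Model                  | Ordering            | Replay         | Best for                             |
|---------------|------------------------|---------------------|----------------|--------------------------------------|
| SQS           | queue (point-to-point) | FIFO option         | no             | decoupling work, buffering           |
| SNS           | pub/sub fan-out        | FIFO option         | no             | push to many subscribers             |
| EventBridge   | event bus + routing    | no                  | archive/replay | SaaS/event-driven integration        |
| Kinesis / MSK | ordered log (streams)  | per-partition/shard | yes            | high-throughput, replayable streams  |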

SQS: when you just need to decouple work

If a producer hands off a task and exactly one worker should process it, you want SQS. It is the cheapest, simplest, most operationally boring choice, and boring is good. Standard queues are at-least-once with best-effort ordering, so consumers should be idempotent. FIFO queues give strict ordering within a message group and exactly-once processing (duplicates sent within the five-minute deduplication window are dropped), at lower throughput. Pair either kind with a dead-letter queue so poison messages do not loop forever.

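# Dead-letter queue: receives messages that keep failing on the main queue.
resource "aws_sqs_queue" "dlq" {
  name = "jobs-dlq"
}

resource "aws_sqs_queue" "jobs" {
  name = "jobs"
  # How long a received message stays hidden from other consumers;
  # set it longer than the slowest expected job.
  visibility_timeout_seconds = 60
  # After 5 failed receives, SQS moves the message to the DLQ.
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 5
  })
}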
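
For the FIFO case, a minimal sketch might look like the following. The queue names are hypothetical, and content-based deduplication is an assumption that only fits producers whose message bodies are unique per task; otherwise producers pass their own deduplication ID. A FIFO queue's dead-letter queue must itself be FIFO.

```hcl
# Hypothetical names; a sketch, not a drop-in module.
resource "aws_sqs_queue" "orders_dlq" {
  name       = "orders-dlq.fifo" # FIFO queue names must end in ".fifo"
  fifo_queue = true
}

resource "aws_sqs_queue" "orders" {
  name       = "orders.fifo"
  fifo_queue = true
  # Assumption: message bodies are unique, so SQS can hash the body
  # to detect duplicates instead of requiring a deduplication ID.
  content_based_deduplication = true
  visibility_timeout_seconds  = 60
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 5
  })
}
```

Ordering applies per message group ID, which the producer sets on each send, so messages that must stay in sequence should share a group ID.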

SNS vs. EventBridge: two flavors of fan-out

Both deliver one message to many destinations, but they solve different problems. SNS is a lean, high-throughput pub/sub pipe: you publish to a topic and every subscriber gets a copy, with very low latency. The common pattern is SNS-to-SQS fan-out: one publish, many durable queues, each consumer working at its own pace.
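
The SNS-to-SQS pattern can be sketched in Terraform roughly as follows. The topic name and the two consumer names ("billing", "email") are made up for illustration. The queue policy is the part that is easiest to forget: without it SNS is not allowed to send to the queues.

```hcl
# Hypothetical topic and consumer names, for illustration only.
resource "aws_sns_topic" "order_events" {
  name = "order-events"
}

# One durable queue per consumer.
resource "aws_sqs_queue" "consumer" {
  for_each = toset(["billing", "email"])
  name     = "order-events-${each.key}"
}

# Allow the topic, and only this topic, to send to each queue.
resource "aws_sqs_queue_policy" "allow_sns" {
  for_each  = aws_sqs_queue.consumer
  queue_url = each.value.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "sns.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = each.value.arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = aws_sns_topic.order_events.arn }
      }
    }]
  })
}

# Subscribe each queue to the topic.
resource "aws_sns_topic_subscription" "fanout" {
  for_each  = aws_sqs_queue.consumer
  topic_arn = aws_sns_topic.order_events.arn
  protocol  = "sqs"
  endpoint  = each.value.arn
  # Deliver the published body as-is instead of wrapping it in the SNS JSON envelope.
  raw_message_delivery = true
}
```

Adding a consumer then means adding one name to the set, and the publisher does not change.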

EventBridge is a richer event router. It does content-based filtering with rule patterns, has dozens of native SaaS and AWS integrations, supports schema discovery, and can archive and replay events. It costs more per event and adds a little latency, but for "route this event to the right handlers based on its contents" it eliminates a lot of glue code.
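
As a rough illustration of content-based routing, the sketch below sends only large orders to a queue and archives everything on the bus for replay. The bus name, the `app.orders` source, the `OrderPlaced` detail type, the `total` field, and the threshold of 1000 are all assumptions invented for the example.

```hcl
# All names and event fields here are hypothetical.
resource "aws_cloudwatch_event_bus" "orders" {
  name = "orders"
}

# Match only OrderPlaced events whose detail.total is at least 1000.
resource "aws_cloudwatch_event_rule" "large_orders" {
  name           = "large-orders"
  event_bus_name = aws_cloudwatch_event_bus.orders.name
  event_pattern = jsonencode({
    source        = ["app.orders"]
    "detail-type" = ["OrderPlaced"]
    detail = {
      total = [{ numeric = [">=", 1000] }]
    }
  })
}

resource "aws_sqs_queue" "large_orders" {
  name = "large-orders"
}

# Let this rule, and only this rule, deliver to the queue.
resource "aws_sqs_queue_policy" "allow_eventbridge" {
  queue_url = aws_sqs_queue.large_orders.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = aws_sqs_queue.large_orders.arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = aws_cloudwatch_event_rule.large_orders.arn }
      }
    }]
  })
}

resource "aws_cloudwatch_event_target" "to_queue" {
  rule           = aws_cloudwatch_event_rule.large_orders.name
  event_bus_name = aws_cloudwatch_event_bus.orders.name
  arn            = aws_sqs_queue.large_orders.arn
}

# Keep seven days of events on the bus so they can be replayed later.
resource "aws_cloudwatch_event_archive" "orders" {
  name             = "orders-archive"
  event_source_arn = aws_cloudwatch_event_bus.orders.arn
  retention_days   = 7
}
```

The filtering lives in the rule pattern, so the consumer behind the queue never sees, or pays to discard, the events it does not care about.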

Use SNS when you know the subscribers and want speed. Use EventBridge when you want to route by event content and integrate across many systems without writing the plumbing.

Streams: when order and replay matter

SQS and SNS are transient: a message is gone from SQS once a consumer deletes it, and SNS pushes to subscribers rather than keeping a log to read back. When you need an ordered, durable log that multiple independent consumers can read at their own offset, and replay from the past, you want a stream. On AWS that is Kinesis Data Streams or Amazon MSK (managed Kafka).

  • Kinesis is serverless-ish, billed per shard or on-demand, and integrates tightly with Lambda and Firehose. Great when you want a managed stream without operating Kafka.
  • MSK gives you real Kafka: the ecosystem, Kafka Connect, exactly-once semantics, and portability off AWS. The cost is more to size, tune, and operate. Choose it when you already speak Kafka or need its connectors.

Ordering in both is per-partition (Kafka) or per-shard (Kinesis), so design your partition key so that messages that must stay ordered share a key.
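
On the Kinesis side, a minimal stream definition might look like this. The stream name, the consumer name, and the seven-day retention are assumptions for the example. The partition key itself does not appear in Terraform, because producers choose it per record when they write.

```hcl
# Hypothetical names and retention; adjust to your workload.
resource "aws_kinesis_stream" "events" {
  name = "events"
  # Hours that records stay readable. This is the replay window.
  retention_period = 168

  # On-demand mode: no shard count to manage up front.
  stream_mode_details {
    stream_mode = "ON_DEMAND"
  }
}

# A registered (enhanced fan-out) consumer gets its own read throughput
# and tracks its own position, independent of other readers.
resource "aws_kinesis_stream_consumer" "analytics" {
  name       = "analytics"
  stream_arn = aws_kinesis_stream.events.arn
}
```

If producers key records by, say, a customer ID, every record for that customer lands on one shard and is read back in the order it was written. Records for different customers carry no ordering guarantee relative to each other.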

A quick decision path

  1. Each task handled by exactly one worker? SQS (FIFO if order matters).
  2. Many consumers, you know who they are, want low latency? SNS (often SNS-to-SQS).
  3. Route events by content across many systems? EventBridge.
  4. Need ordering and replay at high throughput? Kinesis, or MSK if you need real Kafka.

Takeaways

  • SQS is the default for decoupling work; add a DLQ and use FIFO only when you truly need ordering.
  • SNS is fast pub/sub to known subscribers; EventBridge adds content-based routing, integrations, and replay.
  • Reach for Kinesis or MSK only when you need an ordered, replayable log, and design partition keys for ordering.
  • Match the tool to delivery, ordering, and replay needs up front; switching buses later is expensive.