Ingestion, retrieval, generation, and evaluation: a complete reference architecture.
The archive
All posts
2026
Canaries, gradual rollouts, and instant rollback: decoupling deploy from release.
A serverless, distributed SQL database: what it is and where it fits today.
A back-of-envelope model for what idle headroom really costs you per year.
Queue-backed inference for large payloads and bursty traffic, scaling to zero between bursts.
Wiring CloudWatch, X-Ray, and OpenTelemetry into one coherent view.
Service-to-service auth and access policies without managing a mesh.
Schedule-driven cleanup of dev environments and orphaned resources, on autopilot.
SQS, SNS, EventBridge, Kinesis, MSK: an updated map of which fits which job.
Multi-step agents that call tools: what works, what fails, and how to keep costs sane.
The Framework is huge. Here’s the subset that earns its keep when you’re small.
An updated, prioritized list of cost levers, highest impact first.
2025
Everything between “it works in the notebook” and a model serving real traffic.
Serverless SQL over your data lake: partitioning and formats that keep it fast and cheap.
AWS re:Invent 2025: my recap and the launches that matter
Field notes from Las Vegas: the keynotes, the standout announcements, and what I am taking back to production.
Long-running, fault-tolerant processes without managing servers or your own queue.
Cross-AZ, cross-region, and egress charges: the silent line items, mapped and tamed.
Routing easy requests to small models and hard ones to large ones: quality at a fraction of the cost.
Cache-aside, write-through, and the failure modes that bite under load.
Cutting through the findings firehose to the alerts that matter.
A 30-minute monthly ritual that catches waste before it compounds.
Tool use, action groups, and where managed agents fit versus rolling your own.
Building an evaluation harness so you can ship prompt and model changes with confidence.
At-least-once delivery means duplicates. Patterns to make handlers safe to retry.
Automatic tiering sounds free. Here’s the math on when it actually pays off.
From raw data in S3 to model-ready features, orchestrated and repeatable.
Federated, short-lived credentials for CI. Delete those access keys for good.
Managed Kafka or Kinesis Data Streams? Throughput, ops burden, and cost compared.
How much to commit, for how long, and how to avoid over-committing into a corner.
Autoscaling, multi-model endpoints, and serverless inference: paying for what you use.
Active-active, active-passive, or just backups? Matching resilience to actual requirements.
Automatic rotation for database credentials and API keys, with zero downtime.
Bringing cost into the engineering loop without turning everyone into an accountant.
A decision framework for container compute on AWS that doesn’t cargo-cult big-company stacks.
Per-token pricing vs GPU instances, where the crossover point actually sits.
Golden paths, self-service, and the AWS building blocks that make a platform team possible.
A structured walk through every major lever, from compute commitments to storage tiering.
2024
Data drift, model drift, and the metrics that tell you a model has quietly gone stale.
A decision framework for picking a database on AWS, by access pattern, not hype.
Two front doors for Lambda. Cost, features, and latency compared.
Turn last month’s surprise into next month’s forecast, with alerts before you blow the budget.
Ingest documents, chunk, embed, and query: a working RAG setup with managed pieces.
Point-to-point integrations with filtering and enrichment, minus the Lambda plumbing.
Block Public Access, policies, encryption, and the misconfigurations that cause breaches.
ARM-based instances are cheaper and faster for most workloads. Migrating is easier than you think.
When fine-tuning beats prompting, and how the Bedrock workflow actually looks.
Coordinate retries, timeouts, and human approval without writing your own state machine.
Cache keys, TTLs, and invalidation: squeezing hit rates out of not-quite-static content.
Without consistent tags, your cost reports are fiction. A tagging policy that sticks.
Storing and querying embeddings for semantic search and RAG, without a new database.
Remote state, locking, and the layout that prevents “who applied what” incidents.
Autoscaling Postgres/MySQL that scales to fractional capacity, and when it’s the wrong call.
Up to 90% off, if your workload tolerates interruption. Patterns that make it safe.
Package a model in a container, serve it from Lambda, pay only when it runs.
Subnets, routing, and endpoints: a pragmatic VPC layout you won’t outgrow next quarter.
Cold starts are real but often misunderstood. How to measure and when to care.
Log ingestion and custom metrics add up fast. A checklist to trim the bill.
SageMaker Feature Store solves a real problem, but only past a certain scale.
Allow, deny, boundaries, SCPs: the order AWS evaluates them in, and why your policy isn’t working.
A gentler path into single-table modeling, with access patterns front and center.
NAT Gateways quietly bleed money on data processing. Here’s how to find and fix it.
2023
When you don’t need a real-time endpoint, batch transform is cheaper and simpler.
Blue/green deployments on ECS that roll back automatically when health checks fail.
Account structure, SCPs, and guardrails you’ll wish you had from day one.
How commitment-based discounts actually work, and a buying strategy that hedges risk.
Three messaging services, three jobs. A decision guide with real examples.
A first SageMaker project that won’t surprise you with a four-figure bill.
Standard, IA, One Zone, Glacier: a plain-English map of when each one saves you money.
A repeatable process for finding over-provisioned instances and acting on it without breaking prod.