A vendor demo convinced my team we needed a feature store. Six weeks later we had a beautifully provisioned SageMaker Feature Store serving exactly one model that didn't have a training/serving skew problem in the first place. We'd bought a fire extinguisher because the brochure was shiny, not because anything was on fire.

Feature stores solve real problems. They're just not your problems until your team hits a specific shape of pain. Here's how I now decide.

What a feature store actually buys you

Strip away the marketing and a feature store does three concrete things:

  • Kills training/serving skew by computing a feature once and serving the identical value to both the training pipeline (offline) and the live model (online).
  • Enables reuse: a "30-day rolling spend" feature computed by one team is available to every other model without recomputation.
  • Provides point-in-time correctness for offline training, so you don't accidentally leak future data into historical examples.

If none of those three sentences describe a pain you currently feel, you don't need one yet.
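Point-in-time correctness is easiest to see in miniature. The sketch below uses only the standard library and a hypothetical in-memory history (entity ID mapped to time-sorted `(event_time, value)` pairs). For each label, it takes the latest feature value whose event_time is at or before the label's timestamp, never a later one. A real offline store does the same join in SQL or Spark over far more rows. `value_as_of` and `build_training_rows` are illustrative names, not any library's API.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Hypothetical feature history: entity id -> [(event_time, value), ...],
# kept sorted by event_time ascending.
FeatureHistory = Dict[str, List[Tuple[float, float]]]


def value_as_of(history: FeatureHistory, entity_id: str, as_of: float) -> Optional[float]:
    """Return the latest value with event_time <= as_of, or None if none existed yet."""
    rows = history.get(entity_id, [])
    times = [t for t, _ in rows]
    # bisect_right returns the first index whose time is strictly greater than
    # as_of, so the element just before it is the last value known at as_of.
    idx = bisect_right(times, as_of)
    if idx == 0:
        return None  # the feature had no value at that moment
    return rows[idx - 1][1]


def build_training_rows(
    history: FeatureHistory, labels: List[Tuple[str, float, int]]
) -> List[Tuple[str, Optional[float], int]]:
    """Join each (entity_id, label_time, label) with the feature value as of label_time."""
    return [
        (entity_id, value_as_of(history, entity_id, label_time), label)
        for entity_id, label_time, label in labels
    ]


if __name__ == "__main__":
    history = {"a1b2": [(100.0, 500.0), (200.0, 842.10)]}
    labels = [("a1b2", 150.0, 1), ("a1b2", 250.0, 0), ("a1b2", 50.0, 1)]
    print(build_training_rows(history, labels))
```

Running it prints `[('a1b2', 500.0, 1), ('a1b2', 842.1, 0), ('a1b2', None, 1)]`. The label at t=150 sees the t=100 value, not the t=200 value that hadn't happened yet. Counting a value stamped exactly at the label time is a choice; switch to `bisect_left` if a feature computed at time T should only be usable strictly after T.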

The honest decision tree

Ask yourself these in order. The first "no" is your answer.

  1. Do you serve real-time, low-latency predictions where features must be fetched in milliseconds? If you only batch-score nightly, a Parquet table in S3 + Athena is your feature store.
  2. Do multiple teams or models share features? If it's one model owned by one team, the reuse value is zero.
  3. Are you actually seeing skew bugs: production accuracy diverging from offline eval for no modeling reason? If not, you may be solving a hypothetical problem.

A feature store is infrastructure for an organization, not for a model. If you have one model and one notebook, you have a feature pipeline problem, not a feature store problem.
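For the batch-only case in question 1, "S3 + Athena is your feature store" just means reading features with an ordinary query. Here is a minimal sketch, assuming a Glue table of features already exists and a boto3 Athena client is passed in, so the function can be tested without AWS credentials. `run_athena_query` is a hypothetical helper built on the real `start_query_execution`, `get_query_execution`, and `get_query_results` calls.

```python
import time
from typing import Any, Dict, List, Optional


def run_athena_query(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> List[Dict[str, Optional[str]]]:
    """Run one Athena query and return its rows as dicts keyed by column name.

    `client` is expected to behave like boto3.client("athena"); it is injected
    so the function can be exercised with a fake client.
    """
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        state = client.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")

    rows: List[Dict[str, Optional[str]]] = []
    header: List[str] = []
    kwargs: Dict[str, str] = {"QueryExecutionId": qid}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells come back without a VarCharValue key; map them to None.
            values = [cell.get("VarCharValue") for cell in row["Data"]]
            if not header:
                # For a SELECT, the first row of the first page holds column names.
                header = [v or "" for v in values]
                continue
            rows.append(dict(zip(header, values)))
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

A call like `run_athena_query(boto3.client("athena"), "SELECT customer_id, spend_30d FROM customer_features", "features", "s3://my-athena-results/")` would return a list of dicts. The database, table, and bucket names are placeholders. Athena returns every value as a string, so cast before scoring.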

The AWS options, ranked by how much rope they give you

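Option                                 | Online latency     | Best when
-------------------------------------- | ------------------ | ----------------------------------------------------
S3 + Athena (DIY, offline only)        | n/a (batch)        | Batch scoring, no real-time need
DynamoDB as online store + S3 offline  | ~single-digit ms   | You want control and low cost
SageMaker Feature Store                | ~ms (online store) | Multiple teams that want managed offline+online sync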
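The middle row is the one you assemble yourself. Its online half can be as small as the sketch below, which assumes a boto3 DynamoDB `Table` resource with a hypothetical partition key `customer_id`. `write_features` and `read_features` are illustrative names. The main trap it handles is that the boto3 resource layer refuses Python floats, so values go in as `Decimal`.

```python
from decimal import Decimal
from typing import Any, Dict, Optional


def write_features(table: Any, customer_id: str, features: Dict[str, float], event_time: float) -> None:
    """Overwrite one customer's latest feature values in the online table.

    `table` is assumed to be boto3.resource("dynamodb").Table("...") whose
    partition key is "customer_id" (a hypothetical schema).
    """
    # boto3 rejects floats here; going through str() keeps the short decimal
    # form (842.10 becomes Decimal("842.1"), not a long binary expansion).
    item: Dict[str, Any] = {"customer_id": customer_id, "event_time": Decimal(str(event_time))}
    item.update({name: Decimal(str(value)) for name, value in features.items()})
    table.put_item(Item=item)


def read_features(table: Any, customer_id: str) -> Optional[Dict[str, float]]:
    """Fetch a customer's stored attributes (minus the key) at inference time.

    Returns None when the table has no item for this customer.
    """
    item = table.get_item(Key={"customer_id": customer_id}).get("Item")
    if item is None:
        return None
    return {name: float(value) for name, value in item.items() if name != "customer_id"}
```

A production version would likely add a `ConditionExpression` to `put_item` so a late-arriving older record cannot overwrite a newer one.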

SageMaker Feature Store gives you an online store (low-latency reads) and an offline store (S3, Parquet, Glue-cataloged) that stay in sync automatically. That sync is the thing you'd otherwise build yourself with DynamoDB streams and a backfill job, which is exactly the DIY middle row.
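In the DIY route, the sync half is typically a Lambda function subscribed to the table's DynamoDB stream. It turns each insert or update into an append-only row for the offline store. This minimal sketch assumes the stream is configured to include new images and that items only use string (`S`) and number (`N`) attributes. `handler` and `stream_to_offline_rows` are hypothetical names, and the Parquet/S3 sink is left out. Attributes are decoded by hand to stay dependency-free; real code might use boto3's `TypeDeserializer` instead.

```python
from typing import Any, Dict, List


def _from_attr(attr: Dict[str, str]) -> Any:
    """Decode the two DynamoDB attribute types this sketch expects: S and N."""
    if "S" in attr:
        return attr["S"]
    if "N" in attr:
        return float(attr["N"])
    raise ValueError(f"unsupported attribute type: {attr}")


def stream_to_offline_rows(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a DynamoDB Streams batch into append-only rows for the offline store.

    Assumes the stream view type includes new images (NEW_IMAGE or
    NEW_AND_OLD_IMAGES). REMOVE events are skipped: the offline log keeps history.
    """
    rows = []
    for record in event.get("Records", []):
        if record.get("eventName") not in ("INSERT", "MODIFY"):
            continue
        image = record["dynamodb"]["NewImage"]
        rows.append({name: _from_attr(attr) for name, attr in image.items()})
    return rows


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    rows = stream_to_offline_rows(event)
    # Hypothetical sink: a real job would write these rows to S3 as Parquet,
    # for example through a Firehose delivery stream or a batch writer.
    return {"rows": len(rows)}
```

A stream only sees writes made after it was enabled, and it retains records for 24 hours. That is why the backfill job is the other half of the work.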

What ingestion looks like

Writing a feature record to both stores is a single API call once the feature group exists:

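import time

import boto3

fs = boto3.client("sagemaker-featurestore-runtime")

# Writes to the online store and, if the feature group has one enabled,
# to the offline store too. Every value is passed as a string.
fs.put_record(
    FeatureGroupName="customer-features-v1",
    Record=[
        {"FeatureName": "customer_id", "ValueAsString": "a1b2"},
        {"FeatureName": "spend_30d",   "ValueAsString": "842.10"},
        {"FeatureName": "orders_30d",  "ValueAsString": "7"},
        # Unix seconds; assumes event_time is declared as a Fractional feature.
        {"FeatureName": "event_time",  "ValueAsString": str(time.time())},
    ],
)

# Online read at inference time; "Record" is missing if the ID is unknown.
rec = fs.get_record(
    FeatureGroupName="customer-features-v1",
    RecordIdentifierValueAsString="a1b2",
)
features = {f["FeatureName"]: f["ValueAsString"] for f in rec.get("Record", [])}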

The event_time field is non-negotiable: it's what makes point-in-time joins correct when you later build a training set from the offline store.
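Building the `Record` list by hand gets tedious, and the `event_time` format depends on how the feature was declared. Here is a small sketch of two hypothetical helpers, assuming `event_time` is a Fractional feature holding Unix seconds. They only shape the dicts that `put_record` expects and that `get_record` returns, so they run without AWS.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def to_record(features: Dict[str, Any], event_time: Optional[float] = None) -> List[Dict[str, str]]:
    """Build the Record list for put_record; every value is sent as a string."""
    if event_time is None:
        event_time = datetime.now(timezone.utc).timestamp()
    record = [{"FeatureName": name, "ValueAsString": str(value)} for name, value in features.items()]
    # Assumes event_time is declared as a Fractional feature (Unix seconds).
    record.append({"FeatureName": "event_time", "ValueAsString": str(event_time)})
    return record


def from_record(response: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Flatten a get_record response; None when no record was found."""
    record = response.get("Record")
    if not record:
        return None
    return {f["FeatureName"]: f.get("ValueAsString") for f in record}
```

With these, `fs.put_record(FeatureGroupName="customer-features-v1", Record=to_record({"customer_id": "a1b2", "spend_30d": "842.10"}))` replaces the hand-written list. If `event_time` is instead declared as a String feature, it must be an ISO-8601 timestamp, such as `datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")`, rather than a Unix number.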

The costs nobody demos

The online store behaves like a managed DynamoDB table. You pay for storage continuously and for every read and write (or for provisioned capacity, whether you use it or not), even for features that are rarely queried. The offline store accumulates in S3 and grows forever unless you set lifecycle rules. And the operational surface (feature group schemas, versioning, backfills) costs real engineering time. For a small team, the DIY route with DynamoDB as the online store is often 60-70% cheaper, and you understand every moving part. I only graduate to the managed product when a second team starts asking for someone else's features.
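Setting lifecycle rules on the offline store takes one S3 call. This sketch assumes a placeholder bucket and prefix where the feature group writes. It also assumes a retention policy of moving data to Standard-IA after 90 days and deleting it after two years; tune both to how far back you actually train. `offline_store_lifecycle` is a hypothetical helper that only builds the configuration dict.

```python
from typing import Any, Dict


def offline_store_lifecycle(
    prefix: str,
    transition_after_days: int = 90,
    expire_after_days: int = 730,
    storage_class: str = "STANDARD_IA",
) -> Dict[str, Any]:
    """Build a LifecycleConfiguration for s3.put_bucket_lifecycle_configuration."""
    if expire_after_days <= transition_after_days:
        raise ValueError("expiration must come after the storage-class transition")
    return {
        "Rules": [
            {
                "ID": "feature-store-offline-retention",
                "Filter": {"Prefix": prefix},  # only objects under the offline store prefix
                "Status": "Enabled",
                "Transitions": [{"Days": transition_after_days, "StorageClass": storage_class}],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }
```

`boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-offline-store-bucket", LifecycleConfiguration=offline_store_lifecycle("customer-features-v1/"))` would apply it. Note that this call replaces any lifecycle configuration already on the bucket. Standard-IA keeps the data queryable by Athena. Glacier is cheaper, but Athena cannot read those objects without restoring them first, which breaks point-in-time training queries over old data.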

Takeaways

  • A feature store solves skew, reuse, and point-in-time correctness; if you don't feel those three pains, skip it.
  • Batch-only scoring needs nothing more than S3 + Athena; reach for an online store only for real-time inference.
  • DIY with DynamoDB (online) + S3 (offline) is cheaper and more transparent for one team; SageMaker Feature Store earns its keep across multiple teams.
  • Always stamp an event_time so offline training sets stay point-in-time correct, and set S3 lifecycle rules before the offline store balloons.