March 1, 2026 Essay

Machine Learning Starts Before the Model

In production ML systems, the hardest part is often not the algorithm. It is making data, labels, and evaluation trustworthy enough for the model to matter.

  • Machine Learning
  • Data Engineering
  • Systems Thinking

When people talk about machine learning, the conversation usually starts with models. In real systems, the work starts much earlier.

The difficult part is rarely selecting an estimator or tuning a hyperparameter. It is understanding the data contract, the meaning of the labels, the quality of the pipeline, and whether the evaluation setup reflects the decision the system is actually supposed to support.
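To make "data contract" concrete, here is a minimal sketch of what checking one might look like before any model sees the data. Everything in it is an assumption for illustration: the `FieldRule` and `check_contract` helpers are hypothetical, as are the `user_id` and `label` fields and the allowed label values. A real pipeline would likely use a dedicated validation library and a richer schema.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldRule:
    """One hypothetical contract rule: a field's name, type, and constraints."""
    name: str
    dtype: type
    nullable: bool = False
    allowed: Optional[frozenset] = None  # permitted values, if restricted


def check_contract(rows: list[dict[str, Any]], rules: list[FieldRule]) -> list[str]:
    """Return human-readable violations; an empty list means the rows conform."""
    violations = []
    for i, row in enumerate(rows):
        for rule in rules:
            if rule.name not in row:
                violations.append(f"row {i}: missing field '{rule.name}'")
                continue
            value = row[rule.name]
            if value is None:
                if not rule.nullable:
                    violations.append(f"row {i}: '{rule.name}' is null")
                continue
            if not isinstance(value, rule.dtype):
                violations.append(
                    f"row {i}: '{rule.name}' has type {type(value).__name__}, "
                    f"expected {rule.dtype.__name__}"
                )
                continue
            if rule.allowed is not None and value not in rule.allowed:
                violations.append(f"row {i}: '{rule.name}' has unexpected value {value!r}")
    return violations


# Illustrative contract and data (both made up for this sketch).
rules = [
    FieldRule("user_id", int),
    FieldRule("label", str, allowed=frozenset({"churned", "retained"})),
]
rows = [
    {"user_id": 1, "label": "churned"},
    {"user_id": "2", "label": "maybe"},  # wrong type and an unknown label
    {"label": None},                     # missing id and a null label
]
for problem in check_contract(rows, rules):
    print(problem)
```

The point of the sketch is less the checks themselves than where they sit: a contract violation surfaces as an explicit failure upstream, not as a quietly degraded metric downstream.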

Why this matters

If those foundations are weak, reported model performance becomes misleading. A technically sophisticated system can still be operationally fragile because the data contracts are unstable, the extraction logic is inconsistent, or the evaluation setup does not reflect reality.
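One common way an evaluation setup can drift from reality is a random train/test split applied to data that the system will, in production, only ever see in time order. The sketch below shows a time-based split as one possible alternative. It assumes each record carries a timestamp; the `temporal_split` helper and the `event_time` field name are hypothetical, and whether a temporal split is the right design depends on the decision the system supports.

```python
from datetime import datetime
from typing import Any


def temporal_split(
    records: list[dict[str, Any]],
    cutoff: datetime,
    time_key: str = "event_time",  # hypothetical field name
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records so training only contains events strictly before the cutoff.

    This mimics deployment, where the model is fit on the past and
    judged on data that arrives afterwards.
    """
    train = [r for r in records if r[time_key] < cutoff]
    test = [r for r in records if r[time_key] >= cutoff]
    return train, test


# Illustrative records (made up for this sketch).
records = [
    {"event_time": datetime(2025, 11, 3), "label": 0},
    {"event_time": datetime(2025, 12, 15), "label": 1},
    {"event_time": datetime(2026, 1, 20), "label": 0},
    {"event_time": datetime(2026, 2, 9), "label": 1},
]
train, test = temporal_split(records, cutoff=datetime(2026, 1, 1))
print(len(train), "training records;", len(test), "test records")
```

A score computed on the later slice is usually a more honest estimate of how the system will behave after deployment than one computed on a shuffled sample of the same period.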

That is why I think machine learning should be understood as a systems discipline. Pipelines, validation design, observability, and interpretation are not support work around the model. They are part of the model’s credibility.

A more useful mindset

The better question is often not “Which model should we use?” but “What needs to be true for any model here to be trustworthy?”

That shift leads to better technical decisions, better collaboration, and fewer systems that look impressive in development but fail under real conditions.