Past and Present of Data Science


🌟 I became a data scientist in 2017, and the field has changed dramatically since.

I’m grateful that many people have read my 2019 post, Types of Data Scientists. Interviewers have brought it up, and students and job seekers often tell me it helped them. This post is a continuation, and I hope readers find it just as helpful.


🔙 Looking back: from statistics to “data science”

Before the rise of “big data,” most people doing what we now call data science were statisticians or analysts. Data volumes were growing but still relatively modest. The focus was on:

  • Understanding relationships in data
  • Confirming statistical properties (e.g., hypothesis testing; a small example follows this list)
  • Statistical modeling and prediction
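
To make that era concrete, here is a minimal sketch of a bread-and-butter analysis from that time: a two-sample t-test checking whether two groups differ in their means. The groups and numbers below are synthetic, invented purely for illustration.

```python
# A/B-style comparison: did a new checkout flow change average order value?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=50.0, scale=12.0, size=500)    # hypothetical baseline orders
treatment = rng.normal(loc=52.0, scale=12.0, size=500)  # hypothetical variant orders

# Welch's two-sample t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"mean diff = {treatment.mean() - control.mean():.2f}, p-value = {p_value:.4f}")
```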


🌊 The big‑data wave (late 2000s–early 2010s)

A surge in cheap storage and distributed computing (MapReduce/Hadoop) plus the explosion of web and then mobile apps created unprecedented data volumes. Cloud platforms (e.g., AWS) made this scale accessible. Organizations raced to turn data exhaust into insight and advantage.
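
For readers who never touched Hadoop, the programming model behind MapReduce is worth a quick sketch: a map step emits key-value pairs and a reduce step aggregates them per key, which is what lets the work spread across many machines. The toy, single-machine Python below only illustrates the idea; it is not Hadoop, and the documents are made up.

```python
# Toy illustration of the MapReduce idea: count words across "documents".
# In a real cluster, map and reduce tasks run in parallel on many machines.
from collections import defaultdict
from itertools import chain

documents = ["big data big compute", "cheap storage big data"]

def map_phase(doc):
    # Emit (word, 1) pairs for every word in a document.
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    # Sum the counts per key (word) across all mapped pairs.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
print(reduce_phase(mapped))  # {'big': 3, 'data': 2, 'compute': 1, 'cheap': 1, 'storage': 1}
```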


🧑‍🔬 The modeling boom and academic roots

Many machine learning algorithms predate the 2000s—neural nets and backpropagation go back decades—but limited compute held them back. By the early 2010s, GPUs and cloud compute made training at scale practical. Models were complex and required careful hyperparameter tuning, so companies hired highly technical practitioners—often PhDs—to push modeling performance and produce publishable R&D.
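
As a (hypothetical) taste of that tuning-heavy workflow, here is the kind of cross-validated grid search over hyperparameters that consumed many practitioner hours, sketched with scikit-learn on synthetic data. In practice teams layered random search, Bayesian optimization, and early stopping on top, but the shape of the loop was the same.

```python
# Hypothetical tuning loop: grid search a gradient-boosted classifier's hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.03, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=5,                # 5-fold cross-validation for each combination
    scoring="roc_auc",
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```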


🚀 From prototypes to production: the rise of MLOps (late‑2010s)

Success created a new problem: research‑grade code was being pushed to production without engineering or operational rigor. Practices for model deployment, monitoring, and lifecycle management emerged in the mid‑2010s; the “MLOps” term and dedicated tooling gained traction around 2018–2019. As tooling matured and models were packaged behind higher‑level APIs, the center of gravity shifted toward software engineering and operational skills.
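
To give a flavor of what “packaged behind higher‑level APIs” can look like, here is a minimal sketch of exposing a trained model as a web endpoint. FastAPI is just one common choice, and the model file, feature names, and file layout are assumptions made for the example. The harder parts (monitoring drift, versioning models, rolling back bad deployments) are exactly what MLOps tooling grew up to handle.

```python
# Minimal model-serving sketch (hypothetical model file and feature names).
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # assume a trained scikit-learn classifier on disk

class Features(BaseModel):
    tenure_months: float
    monthly_spend: float

@app.post("/predict")
def predict(features: Features):
    x = [[features.tenure_months, features.monthly_spend]]
    return {"churn_probability": float(model.predict_proba(x)[0][1])}

# Run locally with e.g.: uvicorn app:app --reload  (assuming this file is app.py)
```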

This arc—from statistics to big data to MLOps—set the stage for today’s landscape.
Next, I’ll discuss where data science is now and where it’s headed.

🧩 From generalist to specialist (late‑2010s → present)

As tooling matured and production expectations rose, roles split. There are fewer true generalists who can “do it all,” and many more specialists who go deep on a slice of the stack:

  • Product analytics: product‑sense data analysts who define metrics, instrument events, run experiments, build dashboards, and partner closely with PM/design to influence decisions.
  • Machine learning engineers: software engineers with ML fundamentals who focus on shipping and operating models—deployment, observability, evaluation, reliability, and MLOps/LLMOps.
  • Data/analytics engineers: owners of trustworthy, timely data—ETL/ELT pipelines, data models in the warehouse, semantic layers, data quality and lineage.
  • ML platform/infrastructure: builders of shared foundations—feature/embedding stores, training/serving/orchestration, retrieval, evaluation harnesses, cost/latency controls.
  • Applied ML scientists and researchers: practitioners who push modeling performance, design experiments, and advance algorithmic approaches where it matters.

Specialization raises the bar for quality and speed, but it reduces the number of one‑person “unicorn” roles.

The practical advice: pick a lane you enjoy, then go deep while staying T‑shaped, with strong fundamentals in statistics, experimentation, software practices, and clear communication so you can collaborate across lanes.


