Demajh, Inc.

AI Assistants for Data Science: Lessons From the METR Developer Study

A recent randomized controlled trial by METR found that experienced engineers took 19% longer to land pull requests when AI tools were allowed, even though they expected a 24% speed-up. The disconnect between perceived and actual productivity is a wake-up call for every team betting on full AI automation.

Code Bases vs. Datasets

On sprawling repositories, AI stumbles because it lacks the tacit context developers build over years. In data science the analogue is proprietary data. Sensitive tables, medical images, or clickstreams can’t be shipped to a commercial LLM without violating policy or contract. Just as coders must develop intuition for large codebases themselves, data scientists must develop a feel for structure, distribution shifts, and hidden artifacts by eyeballing rows and plotting slices—work no external model can safely absorb.
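That hands-on inspection stays entirely local. As a minimal sketch of what "eyeballing for distribution shift" can mean in practice, the helper below compares a training slice against recent data; the function name `summarize_shift`, the latency column, and the synthetic numbers are all illustrative assumptions, not part of any real pipeline:

```python
import numpy as np
import pandas as pd

def summarize_shift(train: pd.Series, recent: pd.Series) -> dict:
    """Cheap, local distribution-shift summary: how far the recent mean
    has drifted (in train standard deviations) plus null rates."""
    std = train.std() or 1.0  # guard against a zero-variance column
    return {
        "mean_drift_sd": abs(recent.mean() - train.mean()) / std,
        "train_null_rate": train.isna().mean(),
        "recent_null_rate": recent.isna().mean(),
    }

# Hypothetical in-house metric that drifted upward between cohorts.
rng = np.random.default_rng(0)
train = pd.Series(rng.normal(100, 10, 1000))
recent = pd.Series(rng.normal(130, 10, 1000))
report = summarize_shift(train, recent)  # mean_drift_sd is roughly 3
```

Nothing here leaves the analyst's machine, which is the point: the model never needs to see the rows to benefit from the judgment they produce.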

A Place for AI in Data Science

Unlike software engineers who hack on large codebases, analysts often juggle dozens of one-off notebooks and utility scripts. Here, AI excels at drafting boilerplate: feature extractors, tidy-data pipelines, or quick EDA visuals. But the research question—why this slice, why that metric—remains a human craft. Use assistants for pandas incantations; guard the hypothesis generation and study design.
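The kind of boilerplate worth delegating looks roughly like this one-function EDA summary; `quick_eda` and the toy frame are illustrative assumptions, standing in for the throwaway helpers an assistant can draft in seconds:

```python
import pandas as pd

def quick_eda(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, null rate, cardinality, example value.
    Mechanical scaffolding -- no hypothesis or study design involved."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_rate": df.isna().mean(),
        "n_unique": df.nunique(),
        "example": df.apply(
            lambda s: s.dropna().iloc[0] if s.notna().any() else None
        ),
    })

df = pd.DataFrame({
    "user_id": [1, 2, 3, 3],
    "region": ["SE", "NW", None, "SE"],
})
summary = quick_eda(df)
```

Deciding which columns matter, and why, is exactly the part that stays human.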

Model Architecture Iteration Is Also Still Manual

For production models, training data is almost always quarantined on-prem or in a VPC. An LLM can riff on abstract summaries of error plots—“precision drops on Southeast users”—and suggest a wider receptive field or a focal-loss tweak. But reviewing failure cases, aligning with business constraints, and validating improvements have to be done by a human staring straight at the mis-predictions. Think of the assistant as a fast literature-reviewer, not an end-to-end AutoML system.
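The slice-level review described above can be scripted locally before any summary is shared with an assistant. A minimal sketch, assuming binary labels and a column layout invented for illustration (`precision_by_slice`, `pred`, `label`, and the toy rows are all hypothetical):

```python
import pandas as pd

def precision_by_slice(df: pd.DataFrame, slice_col: str) -> pd.Series:
    """Precision of positive predictions within each slice, so a human
    can see where the model degrades (e.g. one region)."""
    positives = df[df["pred"] == 1]
    # With binary labels, mean of `label` among predicted positives
    # is exactly the precision for that slice.
    return positives.groupby(slice_col)["label"].mean()

preds = pd.DataFrame({
    "region": ["SE", "SE", "SE", "NW", "NW", "NW"],
    "pred":   [1, 1, 1, 1, 1, 1],
    "label":  [1, 0, 0, 1, 1, 1],
})
per_region = precision_by_slice(preds, "region")
```

Only the resulting aggregate ("precision drops on SE") would ever reach the LLM; the mis-predicted rows themselves stay behind the wall.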

A Practical Playbook

Where We Go From Here

The METR study reminds us that "feels faster" is not the same as "is faster." For data teams, the lesson is clear: treat LLMs like enthusiastic interns, great at spitting out first drafts but not yet trusted with the deep work. Pair them with rigorous review loops, clear data-governance walls, and telemetry that measures real cycle time. With that discipline, AI can amplify human judgment instead of papering over it.
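Measuring real cycle time need not be elaborate. As a sketch of the telemetry the paragraph calls for, the snippet below computes median open-to-merge hours from PR timestamps; `median_cycle_hours` and the log format are illustrative assumptions, not a reference to any particular tool:

```python
from datetime import datetime

def median_cycle_hours(prs):
    """Median open-to-merge time in hours -- measured, not perceived."""
    deltas = sorted(
        (merged - opened).total_seconds() / 3600 for opened, merged in prs
    )
    n = len(deltas)
    mid = n // 2
    return deltas[mid] if n % 2 else (deltas[mid - 1] + deltas[mid]) / 2

# Hypothetical PR log: (opened_at, merged_at) pairs.
prs = [
    (datetime(2025, 7, 1, 9), datetime(2025, 7, 1, 17)),  # 8 h
    (datetime(2025, 7, 2, 9), datetime(2025, 7, 3, 9)),   # 24 h
    (datetime(2025, 7, 3, 9), datetime(2025, 7, 3, 13)),  # 4 h
]
median = median_cycle_hours(prs)  # 8.0
```

Tracked before and after an AI rollout, a number like this is what separates a genuine speed-up from the perception gap METR documented.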

← Back to all posts