An opinion piece examines whether AI coding tools like Claude Code and Copilot truly enhance developer skills or merely accelerate flawed decision-making, highlighting the need for new metrics to evaluate human-AI collaboration in engineering.
I've been thinking about this a lot lately, maybe way too much.

AI coding tools are getting powerful **really** fast. Claude Code, Codex, Cursor, Copilot, all of this, and now Anthropic just launched Opus 4.8 with Dynamic Workflows, where the direction is not just *"autocomplete my function"* anymore. It is more like codebase-level migrations, subagents, big workflows, from kickoff to merge.

At the same time, Microsoft is reportedly pushing engineers away from Claude Code toward Copilot after budget issues, and Fortune reported **Uber burned through its 2026 AI coding tools budget in four months**. Uber's COO also basically questioned whether more AI tool usage is clearly turning into more useful product output.

That part really stuck with me. Because maybe the issue is not only *"are the tools good enough?"* Maybe the harder question is: **are we good enough at using them?**

Imagine having a junior dev who writes code insanely fast, never gets tired, and always sounds confident. But if your instructions are fuzzy, or you can't properly review what they built, they ship stuff that **looks good but breaks in production**.

That's AI right now. It doesn't replace your judgment. It multiplies whatever judgment you bring, and then it *feels* productive.

I keep coming back to this line from a senior dev I respect:

>**AI coding is a mirror, not a ladder.**

It reflects your thinking. Your context. Your taste. Your ability to review. Your ability to say *"no, this is wrong"* when the output looks clean but the design is bad.

Prompting alone is not intelligence. Vibe coding is not engineering. And maybe reviewing AI output is becoming as important as writing the first version manually. Maybe even more important in some cases.

**Unlimited tokens can honestly make you lazy** if you stop thinking and just keep asking the model to fix whatever the previous model broke.

And I'm not saying this like I'm above it. I do this too.
Sometimes when tokens are limited, I suddenly become smarter because I'm forced to explain better. When tokens feel unlimited, I can get sloppy.

# We had proof systems before. We don't have one for this.

Before the AI era, we had some rough but functional proof systems:

* **GitHub** showed you could build real things
* **LeetCode** showed DSA ability
* **Kaggle** showed ML chops
* **Open source contributions** showed consistency and collaboration
* **Designers have portfolios**

But what will prove **AI-native engineering judgment**?

Not this:

* *"I used 7 agents on my laptop"* screenshots on Twitter
* A random 24-hour hackathon demo
* Just shipping a lot of code
* *"I built an entire app in a weekend using AI"*

If hiring shifts toward *"show me how you work with AI,"* then **what do we actually show?**

A PR? Prompt logs? A decision memo? A walkthrough of where the AI was wrong and how you corrected it? Tests proving the output was validated properly?

Honestly, I don't know. And it feels like **nobody really knows how to measure good human + AI collaboration yet.**

That feels weird, because the tools are improving faster than most people are adapting.

It may be that a very small group of good engineers are becoming **AI orchestrators**: not just people who write code manually, but people who *guide systems*, give context, review output, catch hallucinations, and make judgment calls.

That's a very different skill than just typing fast. And I'm not sure we're teaching it, measuring it, or honestly even talking about it enough.

# I'm actually curious what you think

Are you learning how to work with AI *properly*, or mostly just using it when stuck?

And if tomorrow an interviewer asked you, *"Show me proof that you can use AI well as an engineer,"* what would you even show?

**Are we actually ready for this?**

*PS: Please comment whatever thoughts you have on this. It would honestly help me stop overthinking it. And thank you for reading :)*
This article critiques common flawed methods for evaluating AI-assisted coding tools, such as counting lines of code, timing artificial tasks, and relying on developer self-reports, arguing for more rigorous research methods.
Developers increasingly refuse to work without AI coding tools, but studies and reports suggest this reliance may not boost productivity and could increase maintenance costs, raising concerns about long-term impact.
The author argues that heavily relying on AI coding agents causes human developers to lose critical technical intuition and code review skills over time, proposing measures like mandatory hands-on coding days to maintain supervisory competence.
Anthropic's Code with Claude event in London showcased how AI coding tools like Claude Code are automating software development, with developers increasingly handing off code writing and debugging to AI. New features like 'dreaming' allow Claude agents to share notes and learn from past errors.