What’s Going On with Claude Code?
Why it's becoming dumber, and what to do about it
Yan Gao | Founder of alphaguru.ai
If you’ve been using Claude Code in the past two weeks and felt like something was seriously wrong, you’re not imagining it.
The tool that users described as “collaborating with a Senior Expert” has been degrading into what feels like supervising a less competent worker who lies about their work. This isn’t a vague feeling. It’s documented across GitHub issues, Reddit threads, Hacker News, and Anthropic’s own status page.
Here’s everything I’ve found out.
The Symptoms
Users are reporting a cluster of failures that compound on each other:
Claiming work is done when it isn’t. Claude tells you edits are complete when it hasn’t actually done the work, or has done it partially and incorrectly. One GitHub user described Claude as having “started to lie about the changes it made to code.”
Ignoring explicit instructions. You say push to a feature branch; it pushes to main. You say modify font color only; it rebuilds the entire component. Instructions from earlier in the session simply evaporate.
Trial-and-error instead of reasoning. Instead of analyzing a problem once and delivering a working solution, Claude makes 20+ broken attempts - repeating commands after being told they fail, missing obvious environment constraints.
Sloppy, destructive edits. Bulk replacements that break unintended parts of the work. Editing files it hasn’t read. Overwriting content instead of modifying it surgically.
The Causes
There isn’t one root cause. There are at least four overlapping problems.
1. March 2026 Has Been an Infrastructure Disaster
Anthropic’s status page for March reads like a war log: major outages on March 2–3, elevated errors on March 11, Sonnet 4.6 spikes on March 12, clustered incidents on March 16–18, auth failures on March 19, hanging responses on March 20, and Opus/Sonnet errors again on March 21. That’s roughly one major incident every 2–3 days.
The failure patterns were inconsistent: sometimes hitting free users, sometimes API users, sometimes everyone. This points to model-level instability combined with imperfect isolation between serving pools. When you get a smart Claude on one message and a confused Claude on the next within the same session, your requests are likely being routed to different servers with different quality characteristics.
2. The Thinking Mode Got Quietly Nerfed
When Claude Code v2.0.x shipped, it made thinking mode “enabled by default” and deprecated explicit triggers — think, think hard, ultrathink. An Anthropic engineer confirmed on Twitter that these triggers are now cosmetic. Ultrathink still shows rainbow colors. It doesn’t increase thinking depth.
The result: instead of reading your plan, reasoning through the problem, then acting, Claude skips deep analysis and jumps straight to action. That’s why you see the trial-and-error loops. The model isn’t dumber: it’s being given less time to think.
3. Context Rot Is Real and Documented
Anthropic advertises a 1M-token context window. What they don’t prominently disclose is that quality degrades significantly as that window fills. One detailed GitHub issue laid out the pattern: reliable performance while the context is 0–20% full, progressive degradation beyond that, and roughly 1 in 4 failed retrievals at the full 1M tokens. The effective reliable range is roughly 200–256K tokens.
This directly explains the “Claude ignores my instructions” problem. Your instructions from earlier in a long session aren’t being ignored - they’re being forgotten. The model can’t reliably retrieve them from deep in its context.
4. User Growth Is Stress-Testing Everything
Since late February, a “Quit ChatGPT” campaign drove a big increase in free users and doubled Anthropic’s paid subscriber base. More users means more load balancing changes, more routing decisions, and more opportunities for exactly the kind of infrastructure bugs that caused the August 2025 crisis.
This Isn’t New
The frustrating part is we’ve been here before. Repeatedly.
August–September 2025: Three infrastructure bugs degraded responses for weeks. About 30% of Claude Code users had at least one request misrouted. Anthropic published a detailed postmortem and promised better monitoring.
December 2025: Five documented incidents in a single month.
Late January 2026: Massive quality regression confirmed by Anthropic — a harness issue introduced on January 26, rolled back on January 28.
March 2026: Here we are again.
Each time, the same pattern: users report degradation, Anthropic stays silent for days or weeks, external pressure forces a response, and then a mea culpa attributing everything to infrastructure bugs while emphasizing they “never intentionally degrade model quality.”
What Anthropic Is Actually Working On
The Claude Code team shipped versions 2.1.70 through 2.1.83 in March — fourteen releases in three weeks. But look at what they’re building: voice mode with push-to-talk, /loop command, plugin marketplace, session naming, Claude for PowerPoint, Claude for Excel, --bare flag for scripted calls, --channels permission relay, self-serve Enterprise plans.
These are growth features and enterprise sales features.
They did fix some real bugs: a temperature override was being silently ignored on all streaming requests, and they improved system prompts to guide the model toward dedicated tools instead of bash equivalents. Those matter.
But what’s NOT in the changelog: no fix for context degradation, no restoration of thinking depth control, no infrastructure stability announcements, no transparency about routing or serving pool differences.
The one genuinely relevant product they shipped is Code Review (March 9), which uses multiple agents to catch logical errors in pull requests. It costs $15–25 per review. It’s essentially an admission that Claude Code’s own output can’t be trusted - so here’s another product to check its work.
The priorities are clearly tilted toward growth over reliability. From a business perspective, what they’re doing is “working.” The people complaining on GitHub are a vocal minority compared to the flood of new users.
The Paradox
Claude Code is still, by the numbers, the best AI coding tool and agentic system available.
And yet the community consensus is brutal: “Claude Code is higher quality but unusable. Codex is slightly lower quality but actually usable.”
The quality is there when it works. The problem is “when it works” is doing increasingly heavy lifting.
What You Can Do Right Now
Keep sessions short. Context degradation is real. Fresh session for each meaningful task.
One task per session, verified before moving on. “Read this file and tell me what you’d change” → verify → “Make change X only” → verify → “Commit and push to branch Y.”
Hard guardrails that don’t depend on Claude. Pre-push hooks rejecting main. Branch protection. These work because they operate outside the model’s context window.
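As a concrete sketch of that kind of guardrail, here is a minimal git pre-push hook that refuses any push targeting main. This is a generic git pattern, not anything Claude-specific; the protected branch name ("main") is an assumption you’d adjust for your repo.

```shell
#!/bin/sh
# Sketch of a pre-push hook: save as .git/hooks/pre-push and chmod +x.
# Assumption: "main" is the branch you want to protect; adjust as needed.

# Succeeds when the remote ref being pushed to is the protected branch.
is_protected() {
  [ "$1" = "refs/heads/main" ]
}

# git feeds the hook one line per ref being pushed:
#   <local ref> <local sha> <remote ref> <remote sha>
while read -r local_ref local_sha remote_ref remote_sha; do
  if is_protected "$remote_ref"; then
    echo "pre-push: direct pushes to main are blocked; use a feature branch" >&2
    exit 1
  fi
done
```

Because git itself runs the hook, it holds even when an instruction from earlier in the session has evaporated from the model’s context.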
Run claude update. Several harness bugs were patched recently.
Use /plan before complex tasks. Not a guarantee, but it forces a reasoning step before code gets written.
Use CLAUDE.md for project context. It loads at session start without consuming interactive tokens.
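A minimal sketch of what that file might contain is below; the section names and rules are illustrative, not a prescribed schema:

```markdown
# CLAUDE.md (illustrative sketch; adapt to your project)

## Workflow rules
- Work on feature branches only; never push to main.
- Make only the change requested; do not refactor surrounding code.
- Read a file before editing it.

## Project context
- Build with `npm run build`; test with `npm test`.
- Source lives in src/; generated files in dist/ are never edited by hand.
```

Keep the rules short and mutually consistent: contradictory instructions in the same file work against you.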
Consider the hybrid approach. Claude Code for architecture and complex reasoning. Codex or other tools for high-volume, lower-stakes work where reliability matters more than peak quality.
Bottom Line
The model behind Claude Code is genuinely best-in-class. The infrastructure between you and that model is unreliable, the thinking depth has been quietly reduced, context degrades over long sessions, and the team’s energy is going toward growth instead of fixing the foundation.
This is fixable. Anthropic has shown they can diagnose and resolve these issues when they prioritize it. The question is whether they’ll do it proactively, or wait for another wave of cancellations.
For now: use Claude Code like a brilliant but unreliable colleague. Verify everything. Keep tasks small. Build guardrails that work even when Claude doesn’t.