Oct 22, 2025
11 min read

How I Used AI Agents to Upgrade LangGraph 1.0 (And What Actually Happened)

A real-world experiment in human-AI collaboration: upgrading a production application to LangGraph 1.0 using Claude Code agents. The wins, the fails, and what I learned about working with AI agents on complex technical tasks.

Last week, I decided to upgrade our production application to LangGraph 1.0. What started as a simple npm install turned into a fascinating experiment in human-AI collaboration—and a reminder that even AI agents need adult supervision.

Let me show you what actually happened, warts and all.

The Setup: Not Your Typical Upgrade

I’ve been using Claude Code (Anthropic’s official CLI) for development work, and I wanted to see how well AI agents could handle a real-world dependency upgrade. Not just the code changes, but the entire process: testing, QA, documentation, and deployment.

My stack:

  • Always Cool AI platform (Next.js 15, TypeScript, Prisma)
  • 5 LangGraph implementations running in production
  • 32 compliance tests that needed to keep passing
  • Requesty router integration (custom OpenAI router for 150+ LLMs)

The upgrade path looked straightforward on paper:

# Simple, right?
npm install @langchain/[email protected]

Spoiler: It wasn’t.

Step 1: Discovering We Were Already Half-Upgraded

I started by asking Claude Code: “I’m thinking about upgrading my LangGraph agents to 1.0. What do we have to change?”

The first thing it did was actually check my code. Not just give generic advice—it literally ran Grep to find all my LangGraph implementations and analyzed the patterns I was using.

# What it found
grep -r "Annotation.Root" src/libs/

The agent came back with good news: “Your code is already 100% compatible with LangGraph 1.0 patterns!”

Turns out, I’d been using the modern Annotation.Root pattern all along without realizing it was the 1.0 way (there’s a quick sketch of it after the list). All 5 of my graphs were ready to go:

  • crvComplianceGraphV2.ts
  • kosherCatcherGraph.ts
  • fdaComplianceGraph.ts
  • crvComplianceGraph.ts
  • complianceOrchestrator.ts
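
If you haven’t seen the pattern, here’s a minimal sketch of what it looks like. The state shape and node names are illustrative, not my actual compliance graphs:

import { Annotation, StateGraph, START, END } from "@langchain/langgraph";

// Illustrative state: my real graphs track compliance findings, not this toy field
const State = Annotation.Root({
  findings: Annotation<string[]>({
    reducer: (left, right) => left.concat(right), // merge updates into the running list
    default: () => [],
  }),
});

const graph = new StateGraph(State)
  .addNode("check", async () => ({ findings: ["checked"] })) // returns a partial state update
  .addEdge(START, "check")
  .addEdge("check", END)
  .compile();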

Lesson 1: The agent didn’t just tell me to upgrade—it verified my actual codebase first. This saved hours of potential refactoring.

Step 2: The Package Upgrade (AKA Dependency Hell)

Installing LangGraph 1.0 created an immediate cascade:

npm install @langchain/[email protected]
# Error: peer dependency conflict with @langchain/core

Here’s where Claude Code got interesting. Instead of just Googling, it:

  1. Checked peer dependencies for every package
  2. Found the compatibility matrix (LangGraph 1.0 needs Core 1.0, OpenAI 1.0, Zod 4.x)
  3. Made a plan and created a todo list

I watched it update package.json in stages:

  • First: @langchain/langgraph and @langchain/core
  • Hit error → Updated @langchain/openai
  • Hit error → Updated zod to v4
  • Hit error → Updated openai SDK to v6

Each error taught the agent something. By the end, we had:

{
  "@langchain/core": "^1.0.1",
  "@langchain/langgraph": "1.0.0",
  "@langchain/openai": "^1.0.0",
  "zod": "^4.1.12",
  "openai": "^6.6.0"
}

Lesson 2: The agent worked iteratively. It didn’t pretend to know everything up front—it debugged in real-time, just like I would.

Step 3: The Zod v4 Migration (21 Files Changed)

This is where things got painful. Zod v4 dropped the old .errors alias in favor of .issues across its API.

The agent found the issue during build:

// ❌ Broke everywhere
validation.error.errors[0].message

// ✅ Zod v4 way
validation.error.issues[0].message
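
In context, the change shows up anywhere a safeParse result gets unpacked. A minimal sketch (the schema here is made up; the access pattern is the thing that broke):

import { z } from "zod";

// Hypothetical schema; the real ones validate compliance payloads
const ProductSchema = z.object({ name: z.string(), upc: z.string() });

const validation = ProductSchema.safeParse({ name: "Cold Brew", upc: 12345 });
if (!validation.success) {
  // Zod v3 also exposed .errors as an alias; v4 only has .issues
  console.error(validation.error.issues[0].message);
}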

It found 21 files with this pattern. Rather than manually fixing each one, the agent wrote and ran a bulk fix:

find src -name "*.tsx" -o -name "*.ts" | \
  xargs sed -i '' 's/\.error\.errors\[/\.error\.issues\[/g'

But here’s the interesting part: it missed one.

The build failed again with a different error in route.ts. The agent had to search again, find the missed instance, and fix it manually.

Lesson 3: Even AI agents make mistakes. The key is they can recover quickly and learn from each error.

Step 4: When Two AI Agents Disagreed

This is where it got wild. I asked the agent: “What subagents can we use to test this code?”

It spun up two specialized testing agents in parallel:

Agent 1: EvidenceQA (The Skeptic)

This agent’s personality is “screenshot-obsessed, fantasy-allergic.” Its job: prove everything with evidence.

It ran actual commands:

npm list @langchain/langgraph  # Verify version
npm test                        # Run 32 tests
npm run build                   # Production build
node -e "test zod v4 runtime"   # Verify .issues works

EvidenceQA’s verdict: ✅ PASS - Production ready. Found 4 minor issues (legacy scripts, warnings).

Agent 2: code-reviewer (The Analyzer)

This agent analyzes code quality, security, and best practices.

It reviewed:

  • All 27 changed files
  • Dependency safety
  • Zod migration completeness
  • Type safety

code-reviewer’s verdict: ❌ CRITICAL ISSUES - Revert to Zod v3.

The Problem: They Contradicted Each Other

EvidenceQA said “Zod v4 works perfectly, I tested it.”

code-reviewer said “OpenAI doesn’t support Zod v4, you must revert.”

So I made them prove it. I had the agent check the actual peer dependencies:

npm view [email protected] peerDependencies  # Old: requires zod ^3.23.8
npm view [email protected] peerDependencies   # New: supports zod ^3.25 || ^4.0

The truth: We had TWO versions of OpenAI installed. The old one (4.86.1) didn’t support Zod v4. The new one (6.6.0) did.

EvidenceQA was right. The runtime tests proved it. The upgrade worked because we were actually using the new OpenAI version.

Lesson 4: When AI agents disagree, make them show their work. The one with evidence wins.

Step 5: The Requesty Router Mystery (403 Errors)

When testing the upgraded app locally, I started getting 403 errors from Requesty (my custom OpenAI router). The agent had to debug why the router integration stopped working.

The issue: OpenAI SDK v6 changed how custom fetch wrappers work.

Before (OpenAI v4):

configuration: {
  fetch: async (url: string, init: any) => {
    return fetch(url, init);
  }
}

After (OpenAI v6):

configuration: {
  fetch: async (url: RequestInfo | URL, init?: RequestInit) => {
    const headers = new Headers(init?.headers);
    return fetch(url, { ...init, headers });
  }
}

The agent also caught that @langchain/openai v1.0 changed the config parameter:

  • Old: openAIApiKey
  • New: apiKey
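
Put together, the fixed client construction looked roughly like this. A sketch, assuming a Requesty-style OpenAI-compatible router; the model name, baseURL, env var, and extra header are placeholders, not my production config:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",                          // placeholder model name
  apiKey: process.env.REQUESTY_API_KEY,          // v1.0: apiKey (was openAIApiKey)
  configuration: {
    baseURL: "https://router.example.com/v1",    // OpenAI-compatible router endpoint
    fetch: async (url: RequestInfo | URL, init?: RequestInit) => {
      const headers = new Headers(init?.headers);        // normalize whatever the SDK passes
      headers.set("X-Router-Tag", "langgraph-upgrade");  // illustrative router header
      return fetch(url, { ...init, headers });
    },
  },
});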

Lesson 5: Custom integrations always break during major upgrades. The agent caught this before it became a production incident.

Step 6: Quality Cleanup (The Optional Items)

After the core upgrade worked, I told the agent: “Let’s address the optional items.”

It went through and cleaned up:

  1. Deleted test-kosher-catcher.js - Legacy CommonJS test file that couldn’t run anymore
  2. Fixed .npmrc - Commented out deprecated public-hoist-pattern config
  3. Fixed React Hook warning - Wrapped handleFiles in useCallback with proper deps
  4. Created integration test - Added test-langgraph-integration.js for graph streaming
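
For context, here’s roughly the shape of that streaming check. This is a TypeScript sketch, not the actual file; the import path and state shape are placeholders:

import { graph } from "./src/libs/someComplianceGraph"; // placeholder import path and export name

async function main() {
  // streamMode "updates" yields one chunk per node as the graph executes
  const stream = await graph.stream({ findings: [] }, { streamMode: "updates" });
  for await (const chunk of stream) {
    console.log("node update:", Object.keys(chunk));
  }
}

main().catch((err) => {
  console.error("LangGraph streaming check failed:", err);
  process.exit(1);
});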

Each fix was methodical:

  • Read the file
  • Understand the issue
  • Apply the fix
  • Verify with rebuild
  • Move to next item

Lesson 6: AI agents are great at systematic cleanup work. They don’t get bored or skip steps.

The Tools That Made This Work

Let me break down the actual tools Claude Code used:

Core Tools:

  • Read: Read any file in the codebase (used 20+ times)
  • Edit: Make surgical changes to files (used 15+ times)
  • Bash: Run commands, tests, builds (used 40+ times)
  • Grep: Search code for patterns (used 10+ times)
  • Glob: Find files by pattern (used 5+ times)

Meta Tools:

  • Task: Launch specialized subagents (used 2 times)
  • TodoWrite: Track progress across complex tasks (used constantly)
  • Write: Create new files (blog posts, tests)

The Workflow:

  1. Search first - Grep/Glob to understand current state
  2. Read actual code - Don’t assume, verify
  3. Make changes incrementally - Edit one thing at a time
  4. Test after each change - Bash to run tests/builds
  5. Track everything - TodoWrite for complex multi-step work

What I Learned About AI Agent Collaboration

1. Agents Work Best With Tight Feedback Loops

Every command the agent ran gave it information to decide the next step. When something failed, it immediately saw the error and adjusted.

This isn’t possible with traditional AI chat. The agent needed to:

  • Run a command
  • See the error
  • Try a fix
  • Run again

All without waiting for me.

2. Multiple Agents > Single Agent

Having two QA agents disagree was actually valuable. It forced verification with evidence rather than assumptions.

The pattern:

  • EvidenceQA: Tests runtime behavior, provides proof
  • code-reviewer: Analyzes code structure, finds issues
  • Me: Final arbitrator when they disagree

3. Agents Are Tools, Not Wizards

The agents made mistakes:

  • Missed one Zod error
  • One agent gave incorrect advice (code-reviewer)
  • Needed guidance on which approach to take

But their mistakes were recoverable because:

  • They could test their own fixes
  • They showed their work
  • They didn’t pretend to be certain

4. Todo Lists Keep Everything Sane

For a 3-hour upgrade touching 28 files, tracking state was critical. The agent constantly updated its todo list:

✅ Update package.json
✅ Install dependencies
✅ Fix Zod v4 breaking changes
⏳ Rebuild and verify
⏸️ Create git commit

This kept both of us aligned on progress.

The Results: Was It Worth It?

Time invested: 3 hours (with extensive testing and documentation)

Lines changed:

  • 28 files modified
  • 1,378 lines added
  • 128 lines removed

Test results:

  • ✅ 32/32 compliance tests passing (100%)
  • ✅ Zero TypeScript errors
  • ✅ Clean production build
  • ✅ No deprecation warnings

Production status: Deployed successfully, no rollbacks needed.

Documentation created:

  • 477-line technical upgrade guide
  • This blog post
  • Integration test suite

Would I do this again? Absolutely.

But with caveats:

What Worked:

  • ✅ Dependency resolution and package upgrades
  • ✅ Bulk code changes (Zod migration)
  • ✅ Testing and validation
  • ✅ Systematic cleanup work
  • ✅ Documentation generation

What Needed Human Oversight:

  • ⚠️ Deciding between conflicting agent advice
  • ⚠️ Verifying custom integration fixes (Requesty)
  • ⚠️ Understanding business impact of changes
  • ⚠️ Final deployment decisions

Key Takeaways for Using AI Agents

If you’re considering using AI agents for complex dev work:

1. Start with well-tested code. Having 32 passing tests meant every change could be validated immediately.

2. Let agents iterate. Don’t expect them to get it right on the first try. Let them debug and learn.

3. Use multiple agents. Different agents have different strengths. Use them to cross-check each other.

4. Verify with evidence. When an agent makes a claim, make it prove it with actual test output.

5. Keep tight feedback loops. Agents work best when they can see results immediately and adjust.

6. Stay engaged. This wasn’t “ask AI and walk away.” I was involved the whole time, guiding decisions.

The Future of AI-Assisted Development

This upgrade showed me that we’re hitting a sweet spot with AI agents:

Not powerful enough to:

  • Make architectural decisions alone
  • Understand business context
  • Handle truly novel problems

But powerful enough to:

  • Execute multi-step technical tasks
  • Debug and self-correct
  • Handle systematic refactoring
  • Generate comprehensive documentation
  • Validate their own work

The key insight: AI agents are incredibly productive junior engineers. They can execute complex plans, but they need senior oversight.

What’s Next?

I’m now using this agent-assisted workflow for:

  • Database migrations
  • API endpoint creation
  • Component development
  • Infrastructure updates

Each agent specializes in a different domain, and they all have access to the same base tools (Read, Edit, Bash, etc.).

The pattern is consistent:

  1. I provide the requirements
  2. Agent creates a plan and todo list
  3. Agent executes with my guidance
  4. Multiple agents cross-validate
  5. I make final decisions

Try It Yourself

If you want to upgrade to LangGraph 1.0 using this approach:

  1. Start with a well-tested codebase
  2. Use Claude Code or similar agentic tools
  3. Let agents validate your current state first
  4. Allow them to handle bulk refactoring
  5. Use multiple agents for cross-validation
  6. Verify everything with actual test runs

The most important lesson: AI agents work best as collaborators, not replacements.


Final Thoughts

The LangGraph 1.0 upgrade was the perfect test case for AI agents. Complex enough to be interesting, risky enough to require validation, but structured enough that agents could make progress.

The most surprising part? The agents disagreed with each other. And that disagreement made the final solution better because it forced evidence-based verification.

Would I trust an AI agent to do this completely autonomously? No.

Would I want to do this upgrade without AI agents? Also no.

The sweet spot is collaboration: AI agents handle the systematic work while humans make the judgment calls.

That’s the future of development. Not AI replacing developers. AI making developers more effective at the systematic parts so they can focus on the interesting problems.
