I Chained my Agents to Audit a Production Codebase

Using Gemini CLI and Claude Code to Audit a 36K-Line Codebase

A step-by-step workflow for using AI tools together to find and fix security vulnerabilities in a production Next.js app.


The App

Mystic Cards is a tarot platform I'm building to bring users and artists together via a Stripe Connect-powered marketplace. It's a full-stack Next.js 14 app with a React Native mobile client, paid subscriptions, OpenAI integrations, and 56 API routes: roughly 36,000 lines of TypeScript backed by 1,000+ test cases.

I wanted to do a security review, but manually auditing 56 API routes, auth flows, file uploads, and webhook handlers is the kind of task you put off indefinitely. So I instructed Claude Code to orchestrate the entire workflow, using Gemini CLI as a shell tool for the initial audit and then handling implementation itself.

By adding Gemini CLI to my CLAUDE.md instructions, Claude could invoke it directly, parse the results, and immediately act on them — no copy-pasting between tools. The entire process took a single working session.


Step 1: Run the Audit with Gemini CLI

The Gemini CLI is a terminal interface for Google's Gemini models. Its 1M+ token context window can ingest the full project in a single pass (unlike Claude's smaller context window). When Claude invokes Gemini, make sure it passes the prompt with -p. That flag runs Gemini non-interactively, so it prints its response and exits instead of waiting at an interactive prompt.

Bash
gemini -p "perform a security audit of this codebase, focusing on authentication, input validation, file uploads, CORS, and webhook handling"

The audit came back with 6 vulnerabilities, ranked by severity:

  1. Mobile JWT tokens trusted for 30 days without server-side re-validation
    • A cancelled subscriber could keep using premium features for up to a month
  2. Zero runtime input validation across all 14 API route handlers
    • Nothing checked whether incoming requests matched the expected format
  3. No MIME type validation on file uploads
    • Executable scripts could be uploaded disguised as images
  4. CORS middleware returning * for all origins
    • Any website could make authenticated requests on behalf of a logged-in user
  5. Unsafe JSON.parse() on external and DB-sourced data
    • A malformed JSON string from Stripe or the database could crash the server
  6. Dev webhook bypass with unguarded error handling
    • A bad payload could crash the webhook handler if the environment variable was misconfigured

None of these were bugs. The app worked correctly. But types don't exist at runtime, and security lives at runtime.
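
To make that last point concrete, here is a minimal sketch of a type assertion that compiles cleanly and still checks nothing when a request arrives. The InterpretRequest shape and the countCards function are hypothetical and not taken from the app.

```typescript
// Hypothetical request shape, for illustration only.
interface InterpretRequest {
	deckId: string
	cards: string[]
}

// The `as` assertion satisfies the compiler but performs no runtime check.
function countCards(rawBody: string): number {
	const body = JSON.parse(rawBody) as InterpretRequest
	return body.cards.length // throws if `cards` is missing
}

// A well-formed body works as expected.
const wellFormedCount = countCards('{"deckId":"abc","cards":["The Fool","The Tower"]}')

// A body with the wrong shape type-checks just the same, then fails at runtime.
let runtimeFailure: unknown = null
try {
	countCards('{"deckId":42}')
} catch (err) {
	runtimeFailure = err
}
```

The second call is exactly what the compiler cannot see: the annotation describes what the code hopes to receive, not what a client actually sent.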


Step 2: Plan the Fixes

Claude Code transitioned directly into plan mode — a structured workflow where it explores the codebase, designs an approach, and presents it for approval before writing code. The plan grouped the 6 vulnerabilities into 4 phases, ordered by dependency:

Phase | Focus | Files Modified | New Tests
1 | JWT token security + DB re-validation | 7 | 7
2 | Zod schema validation for all routes | 16 | 44
3 | File upload MIME validation + CORS | 3 | 7
4 | Safe JSON parsing + webhook hardening | 7 | 9

The ordering mattered: Phase 1 modified the Prisma schema, and Phase 2 created utilities that Phase 3 depended on.


Step 3: Implement with Claude Code

With the plan approved, Claude implemented each phase sequentially. I also authorized it to create GitHub issues via gh issue create so every phase would be tracked. If you're more interested in the workflow than the code, feel free to skip to the workflow summary.

Phase 1: JWT Token Security

The fix introduced a tokenVersion field and a tiered validation strategy:

TypeScript
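const TOKEN_FRESHNESS_SECONDS = 5 * 60 // 5 minutes
// Assumed here: `now` is the current time in Unix seconds, to match the JWT `iat` claim
const now = Math.floor(Date.now() / 1000)
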
if (payload.iat && (now - payload.iat) > TOKEN_FRESHNESS_SECONDS) {
	const dbUser = await prisma.user.findUnique({
		where: { id: payload.id },
		select: { isPro: true, isArtist: true, tokenVersion: true },
	})

	if (!dbUser || dbUser.tokenVersion !== payload.tokenVersion) {
		return null // Token revoked or user deleted
	}
}

Tokens younger than five minutes skip the database; older tokens trigger one lightweight lookup. When a Stripe webhook changes subscription status, tokenVersion increments, so every outstanding token for that user fails the check as soon as it is more than five minutes old.
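
The two tiers and the version bump can be exercised end to end with a small sketch. Everything here is an assumption made for illustration: the TokenPayload shape, the isTokenCurrent and bumpTokenVersion names, and the in-memory map that stands in for the Prisma lookup.

```typescript
// Hypothetical shapes; the real payload and user model may differ.
interface TokenPayload {
	id: string
	iat: number // issued-at, Unix seconds
	tokenVersion: number
}

type VersionLookup = (userId: string) => Promise<number | null>

const FRESHNESS_WINDOW_SECONDS = 5 * 60

// Returns true when the token may be trusted at time `now` (Unix seconds).
async function isTokenCurrent(
	payload: TokenPayload,
	now: number,
	lookupVersion: VersionLookup
): Promise<boolean> {
	// Tier 1: tokens issued within the freshness window skip the lookup.
	if (now - payload.iat <= FRESHNESS_WINDOW_SECONDS) return true

	// Tier 2: older tokens must match the version currently stored for the user.
	const currentVersion = await lookupVersion(payload.id)
	return currentVersion !== null && currentVersion === payload.tokenVersion
}

// In-memory stand-in for the database, used only to exercise the sketch.
const storedVersions = new Map<string, number>([['user-1', 1]])
const lookupStoredVersion: VersionLookup = async id => storedVersions.get(id) ?? null

// Simulates a webhook handler bumping the version after a subscription change.
function bumpTokenVersion(userId: string): void {
	storedVersions.set(userId, (storedVersions.get(userId) ?? 0) + 1)
}
```

One consequence of this design is worth stating: after a bump, a token issued less than five minutes earlier is still accepted until it ages out of the freshness window. That is the trade against database load.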

Phase 2: Runtime Input Validation with Zod

The codebase already had Zod installed — it just wasn't being used. Claude created a parseBody() helper and centralized schemas:

TypeScript
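import type { NextRequest } from 'next/server'
import type { ZodSchema } from 'zod'
// apiError and ParseResult are project helpers not shown in this excerpt

export async function parseBody<T>(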
	request: NextRequest,
	schema: ZodSchema<T>
): Promise<ParseResult<T>> {
	let body: unknown
	try {
		body = await request.json()
	} catch {
		return { success: false, response: apiError('Invalid JSON body', 400) }
	}

	const result = schema.safeParse(body)
	if (!result.success) {
		// `issues` is the canonical list on ZodError (`errors` is an alias for it in Zod 3)
		const message = result.error.issues
			.map(e => `${e.path.join('.')}: ${e.message}`)
			.join(', ')
		return { success: false, response: apiError(message, 400) }
	}

	return { success: true, data: result.data }
}

Route-level changes were minimal — replace raw destructuring with schema validation:

TypeScript
// Before
const { deckId, spreadType, cards } = await request.json()

// After
const parsed = await parseBody(request, interpretSchema)
if (!parsed.success) return parsed.response
const { deckId, spreadType, cards } = parsed.data
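
The early return works because the result is a discriminated union: once the failure branch has returned, TypeScript knows data is present. The article does not show ParseResult, so the sketch below guesses at its shape. It also swaps the Zod schema for a hand-written check purely to stay dependency-free. ErrorResponse, InterpretBody, parseInterpretBody, and handleInterpret are all hypothetical names.

```typescript
// Assumed shape: a discriminated union keyed on `success`.
interface ErrorResponse {
	status: number
	message: string
}

type ParseResult<T> =
	| { success: true; data: T }
	| { success: false; response: ErrorResponse }

interface InterpretBody {
	deckId: string
	cards: string[]
}

const badRequest = (message: string): ParseResult<InterpretBody> => ({
	success: false,
	response: { status: 400, message },
})

// Hand-written stand-in for a Zod schema, to keep the sketch dependency-free.
function parseInterpretBody(raw: string): ParseResult<InterpretBody> {
	let body: unknown
	try {
		body = JSON.parse(raw)
	} catch {
		return badRequest('Invalid JSON body')
	}
	if (typeof body !== 'object' || body === null) {
		return badRequest('Body must be an object')
	}
	const { deckId, cards } = body as Record<string, unknown>
	if (typeof deckId !== 'string') {
		return badRequest('deckId: expected string')
	}
	if (!Array.isArray(cards) || !cards.every((c): c is string => typeof c === 'string')) {
		return badRequest('cards: expected string[]')
	}
	return { success: true, data: { deckId, cards } }
}

// With `strict` enabled, the early return narrows `parsed` to the success branch.
function handleInterpret(raw: string): ErrorResponse | { cardCount: number } {
	const parsed = parseInterpretBody(raw)
	if (!parsed.success) return parsed.response
	return { cardCount: parsed.data.cards.length }
}
```

Keeping the ready-made error response inside the failure branch is what lets each route handler stay at three lines.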

Phase 3: Upload Security and CORS

Upload hardening added MIME type and size validation:

TypeScript
const ALLOWED_MIME_TYPES = [
	'image/jpeg', 'image/png', 'image/webp', 'image/gif'
]
const MAX_FILE_SIZE = 10 * 1024 * 1024 // 10MB

if (!ALLOWED_MIME_TYPES.includes(file.type)) {
	return apiError(`File type '${file.type}' not allowed`, 400)
}
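
One caveat: file.type reflects the content type the client declared for the upload, so a renamed script can still claim to be image/png. A stricter variant also inspects the leading bytes of the file. The sketch below is one possible way to do that, not the app's implementation. The function names are hypothetical, and the byte signatures are the standard ones for the four allowed formats.

```typescript
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024 // 10MB, matching the limit above

// Returns the MIME type implied by the leading bytes, or null if unrecognized.
function sniffImageType(bytes: Uint8Array): string | null {
	const startsWith = (signature: number[], offset = 0): boolean =>
		signature.every((byte, i) => bytes[offset + i] === byte)

	if (startsWith([0xff, 0xd8, 0xff])) return 'image/jpeg'
	if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png'
	if (startsWith([0x47, 0x49, 0x46, 0x38])) return 'image/gif' // "GIF8"
	// WebP: "RIFF" at offset 0 and "WEBP" at offset 8
	if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
		return 'image/webp'
	}
	return null
}

// Accepts an upload only if it fits the size limit and its bytes agree with the declared type.
function isAcceptableUpload(declaredType: string, bytes: Uint8Array): boolean {
	if (bytes.length > MAX_UPLOAD_BYTES) return false
	return sniffImageType(bytes) === declaredType
}
```

In a route handler the bytes could come from new Uint8Array(await file.arrayBuffer()). Signature checks raise the bar but do not prove a file is a safe image.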

CORS now distinguishes between web and mobile clients: web origins are checked against an allowlist, mobile requests get the wildcard, and unknown origins get no CORS headers.
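
Those three outcomes fit in a small pure function. This is a sketch under stated assumptions: the allowlisted origins are placeholders, and isMobileClient stands in for however the app actually recognizes its mobile client, which the article does not describe.

```typescript
// Placeholder allowlist; the real origins are not given in the article.
const ALLOWED_WEB_ORIGINS = new Set(['https://example.com', 'https://www.example.com'])

type CorsHeaders = Record<string, string>

// `isMobileClient` is a hypothetical flag computed elsewhere by the caller.
function resolveCorsHeaders(origin: string | null, isMobileClient: boolean): CorsHeaders {
	if (isMobileClient) {
		return { 'Access-Control-Allow-Origin': '*' }
	}
	if (origin !== null && ALLOWED_WEB_ORIGINS.has(origin)) {
		// Echo the specific origin and tell caches the response varies by it.
		return { 'Access-Control-Allow-Origin': origin, Vary: 'Origin' }
	}
	// Unknown origin: no CORS headers, so browsers refuse to expose the response.
	return {}
}
```

Echoing a specific allowlisted origin, rather than the wildcard, is also what a browser requires before it will expose a credentialed response.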

Phase 4: JSON.parse Safety

A safeJsonParse helper replaced 8 unsafe JSON.parse() calls, and the Stripe webhook dev bypass got proper error handling:

TypeScript
// Before — crashes on malformed body
event = JSON.parse(body)

// After — graceful fallback
try {
	event = JSON.parse(body) as Stripe.Event
} catch {
	return NextResponse.json({ error: 'Invalid JSON body' }, { status: 400 })
}
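
The article names safeJsonParse without showing it, so the signature below is an assumption. In this sketch the helper returns unknown rather than a caller-chosen type, so the caller still has to validate the shape before using the value.

```typescript
// Assumed signature: returns the parsed value, or `fallback` when the text is not valid JSON.
function safeJsonParse(text: string, fallback: unknown = null): unknown {
	try {
		return JSON.parse(text)
	} catch {
		return fallback
	}
}

// Example: metadata stored as a JSON string that may be corrupt.
const parsedMetadata = safeJsonParse('{"theme":"dark"}')
const corruptMetadata = safeJsonParse('{theme: dark', {})
```

Returning a fallback suits data where a default is acceptable. For a webhook body, rejecting the request with a 400 as shown above is the safer response.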

Step 4: Verify Everything Still Works

After each phase, Claude ran the full test suite. 1,098 tests passed across 93 files. TypeScript type-checking passed with zero errors.


The Workflow, Summarized

Claude Code
	├─ calls Gemini CLI (audit)
	├─ enters plan mode (design phases)
	├─ creates GitHub Issues (tracking)
	├─ implements + tests (4 phases)
	├─ runs test suite + type-check (verification)
	└─ pushes to GitHub (branch deployment)

Total files modified: ~20 | New test cases: 40+ | Existing tests updated: ~30


Takeaways

Chain AI tools through a single orchestrator. Claude Code can call any CLI tool on your machine. Gemini's large context window handles whole-codebase analysis; Claude Code's plan mode and tool use handle multi-file implementation. The entire chain stays in one conversation with zero manual handoff.

TypeScript types are not runtime validation. If your API routes destructure request.json() with only TypeScript annotations, you have zero runtime protection. Zod closes this gap.

JWT security is about more than expiry times. Without a revocation mechanism, there's always a window where stale permissions are honored. The tiered approach — trust fresh tokens, verify stale ones — balances security with database load.

Security hardening doesn't have to be a separate project. With structured planning and phased execution, an audit-to-implementation cycle can happen in a single session without disrupting feature work.

Consider continuous checking too. This workflow is a point-in-time snapshot. For ongoing security, pair it with CodeQL or Snyk in CI. The AI audit excels at finding architectural gaps that static analysis misses.