Aug 01, 2024
3 min read

Agentic AI for Regulated Industries

Building multi-agent systems with LLM-as-judge governance for FDA, nuclear, and financial services — where every AI decision must be auditable

The Problem

Regulated industries want AI but can’t afford black boxes. When FDA, NRC, or FINRA auditors ask “why did your system make this decision?” — “the model thought so” isn’t an answer. I’m building the agentic AI systems and governance infrastructure that make AI deployable where compliance isn’t optional.

What I’m Building

Through Colin McNamara LLC, I design and deploy multi-agent systems for organizations where every AI action must be traceable, auditable, and defensible.

Multi-Agent Architecture

  • Architecting production systems using Claude Code, LangGraph, and a skills layer built on MCP, Claude Agent SDK, and DeepAgents — from design through deployment and ongoing observability
  • Full lifecycle ownership: agent design, tool integration, orchestration patterns, production monitoring, and iterative improvement
  • Building systems where agents coordinate, check each other’s work, and maintain audit trails automatically (a minimal sketch of this pattern follows this list)
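
As a flavor of the audit-trail pattern, here is a minimal sketch using LangGraph’s StateGraph API. The node bodies are placeholders for real model calls; the point is that every agent action lands in a shared, ordered audit log.

  # Minimal LangGraph sketch: a worker agent whose output is checked by a
  # reviewer agent, with every step appended to an audit trail in state.
  # The LLM calls are stubbed; in production each node invokes a model.
  from typing import TypedDict
  from langgraph.graph import StateGraph, START, END

  class AgentState(TypedDict):
      task: str
      draft: str
      approved: bool
      audit_log: list[dict]   # one entry per agent action, for later review

  def worker(state: AgentState) -> dict:
      # Placeholder for an LLM call that produces a draft answer.
      draft = f"draft answer for: {state['task']}"
      entry = {"agent": "worker", "action": "draft", "output": draft}
      return {"draft": draft, "audit_log": state["audit_log"] + [entry]}

  def reviewer(state: AgentState) -> dict:
      # Placeholder for an LLM-as-judge call that checks the worker's draft.
      approved = len(state["draft"]) > 0
      entry = {"agent": "reviewer", "action": "check", "approved": approved}
      return {"approved": approved, "audit_log": state["audit_log"] + [entry]}

  graph = StateGraph(AgentState)
  graph.add_node("worker", worker)
  graph.add_node("reviewer", reviewer)
  graph.add_edge(START, "worker")
  graph.add_edge("worker", "reviewer")
  graph.add_edge("reviewer", END)
  app = graph.compile()

  result = app.invoke({"task": "summarize filing", "draft": "",
                       "approved": False, "audit_log": []})
  print(result["audit_log"])  # full trace of who did what, in order

In production the reviewer runs on a separate model from the worker, which is what makes the check meaningful under audit.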

LLM-as-Judge & Evals Pipelines

  • Continuous auditing infrastructure where AI systems evaluate their own outputs against compliance criteria (a minimal judge sketch follows this list)
  • Governance frameworks tailored per industry — FDA submission validation, NRC safety checks, FINRA auditability requirements
  • Both threat defense (prompt injection detection, guardrails, MoE layer integrity) and innovation capture (surfacing good patterns into self-reinforcing loops)
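
Below is a minimal LLM-as-judge sketch. The rubric entries are hypothetical stand-ins; real criteria come from the relevant regulator, and the judge call is stubbed where a second model would run.

  # Minimal LLM-as-judge sketch: score an output against a compliance rubric
  # and record a verdict per criterion, rationale included for the audit trail.
  from dataclasses import dataclass

  @dataclass
  class Verdict:
      criterion: str
      passed: bool
      rationale: str   # kept for the audit trail, not just the boolean

  # Hypothetical rubric; real criteria come from the relevant regulator.
  RUBRIC = [
      "No health claims beyond the approved label text",
      "Every factual statement cites a source document",
  ]

  def judge(output: str, criterion: str) -> Verdict:
      # Placeholder for the judge-model call, e.g. a prompt like:
      # "Does the output satisfy: <criterion>? Answer pass/fail and why."
      passed = "unsupported" not in output.lower()
      return Verdict(criterion, passed, "stub rationale")

  def evaluate(output: str) -> list[Verdict]:
      return [judge(output, c) for c in RUBRIC]

  for v in evaluate("Contains 5g protein per serving (source: label v3)."):
      print(v.criterion, "->", "pass" if v.passed else "fail")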

FrawdBot — Insider Threat Detection

  • Purpose-built for Google Workspace: behavioral pattern analysis across email, Drive, and admin logs
  • Graph neural networks that map relationship patterns and flag anomalies
  • LLM-driven forensic investigation — turning raw signals into actionable intelligence (an illustrative signal check follows this list)
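
To make the signal-to-investigation idea concrete, here is an illustrative baseline-deviation check. This is not FrawdBot’s actual model; the GNN relationship mapping and forensic layers sit on top of simple signals like this one.

  # Illustrative behavioral-baseline check: flag users whose daily
  # Drive-download count deviates sharply from their own history.
  from statistics import mean, stdev

  def anomaly_score(history: list[int], today: int) -> float:
      # z-score of today's activity against the user's own baseline
      mu, sigma = mean(history), stdev(history)
      return (today - mu) / sigma if sigma > 0 else 0.0

  baseline = [3, 5, 4, 6, 4, 5, 3]        # downloads per day, past week
  score = anomaly_score(baseline, 48)     # sudden bulk download
  if score > 3.0:
      # This is where the LLM-driven forensic step takes over: pull the
      # related email/Drive/admin-log context and draft a case file.
      print(f"escalate for investigation (z = {score:.1f})")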

Cross-Industry Governance

The insight driving this practice: regulated industries face the same fundamental AI challenge — proving that automated decisions meet the standard their regulators require. The tooling differs, but the discipline is the same. A compact sketch of that shared interface follows the list below.

  • FDA — Automated labeling validation, nutritional analysis pipelines, ingredient safety workflows
  • NRC (Nuclear) — Decision traceability for safety-critical systems where “probably right” isn’t acceptable
  • FINRA — Requirements taking effect in December 2025 mandate the same oversight and auditability disciplines that FDA and NRC have required for years
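
One way to express that shared discipline in code: a single governance interface with a check suite per regulator. The check functions below are hypothetical toys; real ones encode the actual rulebook.

  # Sketch of the cross-industry point: one audit entry point, a check
  # suite per regulator. Check functions are hypothetical placeholders.
  from typing import Callable

  CheckFn = Callable[[str], bool]

  def no_unapproved_claims(output: str) -> bool:
      return "cures" not in output.lower()          # toy label rule

  def decision_has_trace(output: str) -> bool:
      return "trace_id=" in output                  # toy traceability rule

  def reviewer_signoff_present(output: str) -> bool:
      return "signoff=" in output                   # toy oversight rule

  GOVERNANCE: dict[str, list[CheckFn]] = {
      "FDA":   [no_unapproved_claims],
      "NRC":   [decision_has_trace],
      "FINRA": [decision_has_trace, reviewer_signoff_present],
  }

  def audit(regulator: str, output: str) -> bool:
      return all(check(output) for check in GOVERNANCE[regulator])

  print(audit("FINRA", "buy recommendation trace_id=abc signoff=jdoe"))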

Observability Stack

Production AI without observability is just a demo. Every system I build includes:

  • LangSmith — LLM trace analysis and debugging (LangSmith Ambassador); a tracing sketch follows this list
  • LangFuse — Production LLM monitoring and cost tracking
  • ClickHouse — High-performance analytics on agent behavior data
  • Ollama — Local model deployment for air-gapped and sensitive environments
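
As an example of the first layer, here is a minimal sketch using LangSmith’s traceable decorator. With the LangSmith environment variables set, every call to the wrapped function is captured as a trace with inputs, outputs, and latency.

  # Minimal tracing sketch: @traceable records each call to the wrapped
  # function in LangSmith once the API key and tracing env vars are set.
  from langsmith import traceable

  @traceable(name="label-validation")
  def validate_label(text: str) -> dict:
      # Placeholder for the real agent call; the trace records both sides.
      return {"input_chars": len(text), "verdict": "pass"}

  print(validate_label("Contains peanuts. 200 kcal per serving."))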

Why This Matters

I spent four years at Oracle learning what happens when you run critical infrastructure without proper observability — and what happens when you fix that. The same transformation is happening now with AI systems. Organizations are deploying agents into production without the governance infrastructure to support them.

This practice exists to close that gap: building AI systems that regulators can trust, that teams can debug, and that organizations can defend under audit.

Let’s Build AI That Works

Interested in building similar solutions?