For research labs & autonomous experimentation

The orchestration layer for the self-driving lab.

A self-driving lab is really two loops: the science loop, which runs closed-loop experiment design over a materials search space, and the orchestration loop, which keeps a fleet of agents and instruments safe, observable, auditable, and cost-bounded around the clock. I've built a production-grade orchestration loop, and I'd bring it to your science loop.

The flagship

Finnick / Hermes — a self-improving multi-agent OS.

A production agent platform on dedicated infrastructure: roughly 30 engines, 5 concurrent learning loops, a curated model-routing gateway, a self-healing escalation bridge, and a nightly "Dream Cycle" that reviews the day and proposes its own improvements. It's designed so a human works about 10 hours a week while the agents carry the load.

E19 · SPC

Drift & regression detection

CUSUM and a windowed z-test from industrial process control, with auto-rollback — the right tool for instruments and models that drift over long campaigns.
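
As a rough illustration of the drift check described above, here is a minimal sketch of a two-sided tabular CUSUM next to a windowed z-test. The function names, the slack `k`, the decision threshold `h`, and the window size are assumptions chosen for illustration, not the values or code that E19 runs.

```python
import math
from statistics import mean

def cusum_alarm(samples, target, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM. Returns the index of the first alarm, or None.

    k (slack) and h (decision threshold) are in units of sigma; both are
    illustrative defaults, not tuned values.
    """
    hi = lo = 0.0
    for i, x in enumerate(samples):
        z = (x - target) / sigma
        hi = max(0.0, hi + z - k)  # accumulates sustained upward drift
        lo = max(0.0, lo - z - k)  # accumulates sustained downward drift
        if hi > h or lo > h:
            return i
    return None

def windowed_z(samples, target, sigma, window=20):
    """z-statistic of the mean of the most recent `window` samples vs. target."""
    recent = samples[-window:]
    return (mean(recent) - target) / (sigma / math.sqrt(len(recent)))

# A one-sigma shift that starts at index 30 of an otherwise flat series.
readings = [10.0] * 30 + [11.0] * 20
first_alarm = cusum_alarm(readings, target=10.0, sigma=1.0)
```

In a setup like this, a non-`None` alarm index is the point where a rollback hook (hypothetical here) would be triggered. CUSUM reacts to small sustained shifts that a single-sample threshold would miss, while the windowed z-test gives a quick check on the most recent batch.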

ESCALATION

Self-healing recovery

Failures become teach-back lessons, so the same failure class stops recurring. Unattended recovery, not just alerting.

E26

Stability & hallucination

12-dimension behavioral stability index; content-type-specific hallucination thresholds.

CONDUCTOR / E20

Bandit model routing

Thompson sampling with a Wilson lower-bound confidence gate before any model can dominate.
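
One way to read the routing rule above is sketched below: Thompson sampling over Beta posteriors picks a candidate, and the Wilson lower bound decides whether that candidate has earned enough evidence to win outright. The gate value, the exploration probability, the `stats` layout, and the function names are assumptions for illustration, not the gateway's actual policy.

```python
import math
import random

def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom

def choose_model(stats, rng, gate=0.6, explore=0.3):
    """Pick a model by Thompson sampling, gated by the Wilson lower bound.

    stats maps model name -> (successes, failures).
    If the sampled winner has not yet reached a Wilson lower bound >= gate,
    it is not allowed to dominate: with probability `explore` the request
    goes to a uniformly random model instead.
    """
    draws = {m: rng.betavariate(s + 1, f + 1) for m, (s, f) in stats.items()}
    winner = max(draws, key=draws.get)
    s, f = stats[winner]
    if wilson_lower_bound(s, s + f) < gate and rng.random() < explore:
        return rng.choice(sorted(stats))
    return winner

rng = random.Random(0)
stats = {"model-a": (900, 100), "model-b": (100, 900)}  # hypothetical names
picks = [choose_model(stats, rng) for _ in range(200)]
```

The gate matters most early on: a model with a single success out of one trial has a perfect observed rate but a Wilson lower bound of only about 0.21, so it cannot take over routing on the strength of one lucky result.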

MEMORY

Tiered memory + provenance

TTL-promoted tiers, write-time summarization, archive-not-delete, and a no-silent-drop guarantee on close.

DREAM CYCLE

Nightly self-improvement

Eight scheduled phases that consolidate, recombine, score, and propose improvements, each running as a failure-isolated unit.

E28

Cron-time orchestrator

Spreads fire times to prevent top-of-hour contention across the agent fleet.
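
A minimal sketch of one way to spread fire times, assuming hourly jobs identified by a string id: hash the id to a stable minute offset so jobs stop piling up at minute zero. The function names and job ids are hypothetical, and E28's actual assignment scheme may differ.

```python
import hashlib

def spread_minute(job_id, window_minutes=60):
    """Deterministic minute offset in [0, window_minutes) derived from the job id."""
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_minutes

def hourly_cron(job_id):
    """Hourly cron expression that fires at the job's own minute instead of :00."""
    return f"{spread_minute(job_id)} * * * *"

# Hypothetical job ids, each mapped to its own hourly schedule.
schedule = {job: hourly_cron(job) for job in ("memory-compaction", "drift-audit", "roi-rollup")}
```

Hashing keeps each job's slot stable as the fleet grows, at the cost of occasional collisions. Assigning evenly spaced slots from a sorted job list would avoid collisions, but slots would shift whenever a job is added.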

ROI

Cost accountability

Every task carries its AI cost vs. a role-matched human baseline, Bayesian-calibrated per job type.
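
To make the idea concrete, the sketch below assumes the human baseline is an estimate of hours per job type, calibrated with a normal-normal conjugate update, and that ROI is the ratio of baseline cost to AI cost. The model choice, function names, and numbers are all illustrative assumptions, not the platform's actual accounting.

```python
def calibrate_hours(prior_mean, prior_var, observed_hours, obs_var):
    """Normal-normal conjugate update of expected human hours for a job type.

    Returns (posterior_mean, posterior_variance). obs_var is the assumed
    known variance of a single observation.
    """
    n = len(observed_hours)
    if n == 0:
        return prior_mean, prior_var
    sample_mean = sum(observed_hours) / n
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / obs_var)
    return post_mean, post_var

def task_roi(ai_cost_usd, human_hours, hourly_rate_usd):
    """Ratio of the human baseline cost to the AI cost for one task."""
    return (human_hours * hourly_rate_usd) / ai_cost_usd

# Prior belief: 2 hours. One observed job of this type took a human 4 hours.
hours, hours_var = calibrate_hours(2.0, 1.0, [4.0], obs_var=1.0)
roi = task_roi(ai_cost_usd=1.5, human_hours=hours, hourly_rate_usd=50.0)
```

With equal prior and observation variances, a single observation pulls the estimate halfway from the prior toward the data, so the baseline moves gradually rather than jumping on one unusual task.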

Mapped to the solid-state chemistry lab

The orchestration layer your science loop will need.

Keep the robotics running, schedule around blocked steps, catch drift over long campaigns, and keep every autonomous action auditable and cost-bounded. That layer is what this OS already runs in production — the scaffolding that wraps an experiment-design module, not a replacement for it.

| Autonomous-lab need | Pattern already in production |
| --- | --- |
| Silent drift in instruments or models over long campaigns | CUSUM / SPC regression detection + 12-dimension stability index + auto-rollback |
| Robotics uptime and recovering from failures unattended | Self-healing teach-back + failure-class circuit breaker |
| Experiments blocked on reagents, furnace time, prior results | Deferred lane with automated revisit, built for exactly this |
| Provenance and auditability for a regulated environment | Operating contracts + machine-readable drift audit trail + archive-not-delete memory |
| Cost accountability for autonomous compute and instrument time | Per-task ROI economics, Bayesian-calibrated per job type |
| Collecting and routing data across instruments and agents, 24/7 | Multi-agent orchestration + three-surface event routing + slim per-job toolsets |
| Trusting autonomous analysis, with no fabricated conclusions | Confidence-clamping (proven in OSINT) + type-specific hallucination thresholds |
| Scaffolding around experiment design: scheduling, gating, provenance | Dream Cycle + learning loops wrap the design module; they don't replace it |
| Keeping a self-modifying loop safe and human-supervisable | Compliance gates + tamper-evident self-modification + "one more expert before human" |

Straight talk

What I own, and what I'd partner on

I own the orchestration and reliability layer: multi-agent scheduling, drift/SPC monitoring, unattended recovery, provenance and audit trails, cost governance, and the safety-gating of software actions. I'd partner on the science: closed-loop Bayesian / active-learning experiment design (e.g. BoTorch / Ax, DFT-informed priors), instrument integration (SiLA 2, OPC-UA, lab drivers), and physical safety interlocks — thermal, atmospheric, collision — which are a hardware-authority domain I'd integrate with, never replace. Knowing that edge precisely is the point.

The pattern register

33 production patterns, benchmarked against the literature.

Each pattern below runs in production, and each is set against the closest published work with the specific distinction noted. Search it, filter it, open any card to see the prior art and where it lives.

How I gate production AI builders

The agents are disciplined because the system makes them.

No agent touches production on vibes. Every unit of work passes through a contract, a quality panel, and a hard gate — enforced at the data layer, not by reminder.

01 · CONTRACT

Operating spec

Role / constraints / tools / workflow / hard-stops, re-read verbatim before every run.

02 · SPEC GATE

Definition-of-Done

No task is claimable until done, acceptance, and validation are populated. Enforced by the query.
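
"Enforced by the query" can be pictured with a small sketch like the one below, which assumes a relational task table and uses an in-memory SQLite database. The schema, column names, and sample rows are hypothetical. The point is only that an under-specified task never appears in the result set an agent claims from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        done_definition TEXT,
        acceptance TEXT,
        validation TEXT,
        claimed_by TEXT
    )
""")
conn.executemany(
    "INSERT INTO tasks (title, done_definition, acceptance, validation) VALUES (?, ?, ?, ?)",
    [
        ("fix log parser", "parser handles v2 logs", "all fixtures pass", "run parser tests"),
        ("tidy dashboards", None, "", None),  # spec incomplete: never claimable
    ],
)

# The gate lives in the WHERE clause, so no agent-side reminder is needed.
CLAIMABLE = """
    SELECT id, title FROM tasks
    WHERE claimed_by IS NULL
      AND TRIM(COALESCE(done_definition, '')) <> ''
      AND TRIM(COALESCE(acceptance, '')) <> ''
      AND TRIM(COALESCE(validation, '')) <> ''
    ORDER BY id
"""

def claimable_tasks(connection):
    """Return only unclaimed tasks whose three spec fields are all populated."""
    return connection.execute(CLAIMABLE).fetchall()
```

Here `claimable_tasks(conn)` returns only the first task. The second stays invisible to agents until its done, acceptance, and validation fields are filled in.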

03 · REVIEW

Expert brief → verdict

A structured brief, then independent expert-lens verdicts before any production write.

04 · QA

Review panel

A fresh expert panel re-verifies work before it is ever surfaced to a human.

05 · SAFETY

Kill-switch + rollback

Software-write gate, kill-switch manifest, and a literal rollback command on every change.

Behavioral discipline as doctrine

15 named traits — disciplined, precise, honest, follow-through — inherited by every agent and checked in QA. Consistency without fine-tuning.

Reliability is the product

Escalation count is tracked as a number that must trend down; every resolved failure teaches the system so the class stops recurring.

Let's pair my orchestration layer with your science.

I'd welcome the chance to compare notes — where the orchestration, reliability, and observability layers are headed, and where these patterns could accelerate the build.