A self-driving lab is really two loops: the science loop, which runs closed-loop experiment design over a materials search space, and the orchestration loop, which keeps a fleet of agents and instruments safe, observable, auditable, and cost-bounded around the clock. I've built a production-grade orchestration loop, and I'd bring it to your science loop.
A production agent platform on dedicated infrastructure: roughly 30 engines, 5 concurrent learning loops, a curated model-routing gateway, a self-healing escalation bridge, and a nightly "Dream Cycle" that reviews the day and proposes its own improvements. It's designed so a human works about 10 hours a week while the agents carry the load.
CUSUM and a windowed z-test from industrial process control, with auto-rollback — the right tool for instruments and models that drift over long campaigns.
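Here is a minimal sketch of the drift check. It assumes a single scalar metric with a known in-control mean (`target`) and standard deviation (`sigma`). It pairs a two-sided tabular CUSUM, which catches small persistent shifts, with a z-test on the mean of the last `window` readings, which catches larger recent jumps. The names `DriftMonitor` and `run_with_rollback`, the `rollback` callback, and the defaults (`k=0.5`, `h=5`, `window=20`, `z_crit=3`) are illustrative assumptions, not production values.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class DriftMonitor:
    """Two-sided tabular CUSUM plus a windowed z-test on one metric stream.

    Hypothetical sketch: `target` and `sigma` come from an in-control baseline;
    the slack `k` and decision interval `h` are in units of sigma.
    """
    target: float
    sigma: float
    k: float = 0.5
    h: float = 5.0
    window: int = 20
    z_crit: float = 3.0
    s_hi: float = 0.0
    s_lo: float = 0.0
    recent: deque = field(default_factory=deque)

    def update(self, x: float) -> list:
        alarms = []
        z = (x - self.target) / self.sigma
        # CUSUM: accumulate deviations beyond the slack k, floored at zero.
        self.s_hi = max(0.0, self.s_hi + z - self.k)
        self.s_lo = max(0.0, self.s_lo - z - self.k)
        if self.s_hi > self.h or self.s_lo > self.h:
            alarms.append("cusum")
        # Windowed z-test: is the mean of the last `window` readings off target?
        self.recent.append(x)
        if len(self.recent) > self.window:
            self.recent.popleft()
        if len(self.recent) == self.window:
            mean = sum(self.recent) / self.window
            z_win = (mean - self.target) / (self.sigma / math.sqrt(self.window))
            if abs(z_win) > self.z_crit:
                alarms.append("window_z")
        return alarms


def run_with_rollback(
    readings: Iterable[float],
    monitor: DriftMonitor,
    rollback: Callable[[int], None],
) -> Optional[int]:
    """Feed readings in order; on the first alarm, call the rollback hook and stop."""
    for i, x in enumerate(readings):
        if monitor.update(x):
            rollback(i)
            return i
    return None
```

Readings centred on the target never alarm. A sustained +2σ shift trips the CUSUM on its fourth shifted reading. The 20-point window test would not fire until the seventh.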
Failures become teach-back lessons, so the same failure class stops firing. Unattended recovery, not just alerting.
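One way to sketch the teach-back loop assumes that failures can be grouped into classes by normalizing their error messages. The first occurrence of a class escalates to a human, and the resolved fix is recorded as a remedy. Later occurrences of the same class are then recovered without escalation. The names `failure_signature` and `LessonBook`, and the digit/hex normalization rule, are hypothetical.

```python
import hashlib
import re
from typing import Callable, Dict


def failure_signature(error_type: str, message: str) -> str:
    """Collapse volatile details (numbers, hex ids) so repeats share one class."""
    normalized = re.sub(r"0x[0-9a-f]+|\d+", "N", message.lower())
    return hashlib.sha256(f"{error_type}:{normalized}".encode()).hexdigest()[:12]


class LessonBook:
    """Hypothetical registry mapping a failure class to a learned remedy."""

    def __init__(self) -> None:
        self.remedies: Dict[str, Callable[[], bool]] = {}
        self.escalations = 0

    def teach(self, error_type: str, message: str, remedy: Callable[[], bool]) -> None:
        # Called once a human (or the escalation bridge) has resolved the failure.
        self.remedies[failure_signature(error_type, message)] = remedy

    def handle(self, error_type: str, message: str) -> str:
        remedy = self.remedies.get(failure_signature(error_type, message))
        if remedy is not None and remedy():
            return "recovered"  # known class and the remedy worked: no page
        self.escalations += 1
        return "escalated"  # unknown class, or the remedy itself failed
```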
12-dimension behavioral stability index; content-type-specific hallucination thresholds.
Thompson sampling with a Wilson lower-bound confidence gate before any model can dominate.
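Here is a minimal sketch of the routing rule, assuming each request's outcome is a binary pass/fail. Thompson sampling draws from each model's Beta(1 + successes, 1 + failures) posterior to pick who serves the next request. A separate Wilson-score lower bound (95%, `z = 1.96`) decides whether any model has earned default status. The `Router` class, the `floor` of 0.7, and the pass/fail reward are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Optional


def wilson_lower(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


class Router:
    def __init__(self, models: List[str], floor: float = 0.7,
                 rng: Optional[random.Random] = None) -> None:
        self.stats: Dict[str, List[int]] = {m: [0, 0] for m in models}  # [successes, trials]
        self.floor = floor
        self.rng = rng or random.Random()

    def choose(self) -> str:
        # Thompson sampling: one draw per model from Beta(1 + s, 1 + f).
        draws = {m: self.rng.betavariate(1 + s, 1 + n - s)
                 for m, (s, n) in self.stats.items()}
        return max(draws, key=draws.get)

    def record(self, model: str, success: bool) -> None:
        self.stats[model][0] += int(success)
        self.stats[model][1] += 1

    def dominant(self) -> Optional[str]:
        # Confidence gate: crown a default only once its Wilson bound clears the floor.
        best = max(self.stats, key=lambda m: wilson_lower(*self.stats[m]))
        return best if wilson_lower(*self.stats[best]) >= self.floor else None
```

Five straight successes give a Wilson lower bound of only about 0.57, so no model is crowned yet. Fifty give about 0.93. The gate stops an early lucky streak from locking in a default, while Thompson sampling keeps exploring.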
TTL-promoted tiers, write-time summarization, archive-not-delete, and a no-silent-drop guarantee on close.
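Here is a compact sketch of the memory lifecycle under stated assumptions:

- three tiers (`hot`, `warm`, `archive`);
- a per-tier TTL, after which an entry moves one tier down;
- a summary computed once at write time;
- a `close()` that archives every live entry and fails loudly if the archive does not match the set of keys ever written.

The class, the tier names, and the truncating stand-in summarizer are hypothetical. A real system would persist the tiers and call a model to summarize.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Entry:
    text: str
    summary: str
    tier_since: float  # when the entry entered its current tier


class TieredMemory:
    def __init__(self, ttls: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic,
                 summarize: Optional[Callable[[str], str]] = None) -> None:
        self.ttls = ttls  # seconds an entry may stay in "hot" / "warm"
        self.clock = clock
        self.summarize = summarize or (lambda text: text[:80])  # stand-in summarizer
        self.tiers: Dict[str, Dict[str, Entry]] = {"hot": {}, "warm": {}, "archive": {}}
        self.keys: Set[str] = set()

    def write(self, key: str, text: str) -> None:
        # Summarize at write time so reads never pay for it later.
        self.tiers["hot"][key] = Entry(text, self.summarize(text), self.clock())
        self.keys.add(key)

    def tick(self) -> None:
        now = self.clock()
        # Age warm first so nothing skips two tiers in a single tick.
        for src, dst in (("warm", "archive"), ("hot", "warm")):
            expired = [k for k, e in self.tiers[src].items()
                       if now - e.tier_since > self.ttls[src]]
            for k in expired:
                entry = self.tiers[src].pop(k)
                entry.tier_since = now
                self.tiers[dst][k] = entry  # archive, never delete

    def close(self) -> int:
        # No silent drop: archive everything still live, newest (hot) copy last so it wins.
        for src in ("warm", "hot"):
            self.tiers["archive"].update(self.tiers[src])
            self.tiers[src].clear()
        if set(self.tiers["archive"]) != self.keys:
            raise RuntimeError("memory entries lost on close")
        return len(self.tiers["archive"])
```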
Eight scheduled phases that consolidate, recombine, score, and propose improvements — failure-isolated units.
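Here is a minimal sketch of failure isolation between phases, assuming the phases share a dictionary of state. Each phase runs against a deep copy, and its changes are committed only if it finishes. A crash in one phase therefore neither aborts the cycle nor leaves half-written state for the next. The `run_cycle` function and the phase names used in the test are hypothetical. The source names the cycle's activities (consolidate, recombine, score, propose) but not its eight phases.

```python
import copy
import logging
from typing import Any, Callable, Dict, List, Tuple

log = logging.getLogger("dream_cycle")
Phase = Tuple[str, Callable[[Dict[str, Any]], None]]


def run_cycle(phases: List[Phase]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Run phases in order; a failing phase is logged and rolled back, never fatal."""
    state: Dict[str, Any] = {}
    report: Dict[str, Any] = {"ok": [], "failed": {}}
    for name, phase in phases:
        scratch = copy.deepcopy(state)  # the phase works on a private copy
        try:
            phase(scratch)
        except Exception as exc:
            log.warning("phase %s failed: %r", name, exc)
            report["failed"][name] = repr(exc)
            continue  # discard scratch; committed state is untouched
        state = scratch  # commit only on success
        report["ok"].append(name)
    return state, report
```

Suppose a recombination phase raises after mutating its copy. The scoring phase that follows still sees the consolidated state from before the failure.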
Spreads fire times to prevent top-of-hour contention across the agent fleet.
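One common way to spread fire times is sketched below. It is an assumption, not the platform's actual scheduler. Each job's offset within the period is derived from a hash of its id. The slot is therefore stable across restarts and hosts, while different jobs land at different moments instead of all firing at :00. The names `fire_offset` and `next_fire` are hypothetical, and times are epoch seconds.

```python
import hashlib


def fire_offset(job_id: str, period_s: int = 3600) -> int:
    """Stable per-job offset: same job, same slot, across restarts and hosts."""
    digest = hashlib.sha256(job_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % period_s


def next_fire(job_id: str, now: int, period_s: int = 3600) -> int:
    """Next epoch second strictly after `now` at this job's offset in the period."""
    t = now - now % period_s + fire_offset(job_id, period_s)
    return t if t > now else t + period_s
```

A hash is used instead of random jitter so that a restart does not reshuffle the fleet's schedule.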
Every task carries its AI cost vs. a role-matched human baseline, Bayesian-calibrated per job type.
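Here is a minimal sketch of one way to Bayesian-calibrate the human baseline. It assumes the quantity being calibrated is the ratio of actual to estimated human hours for a job type. That ratio is modeled on the log scale with a normal prior and a known observation variance, which gives a conjugate normal-normal update. The names `JobTypeCalibration` and `net_savings`, the prior values, and the hourly-rate parameter are hypothetical. The source does not specify its model.

```python
import math
from dataclasses import dataclass


@dataclass
class JobTypeCalibration:
    """Posterior over log(actual human hours / estimated hours) for one job type."""
    mean: float = 0.0     # prior: estimates are unbiased on average (assumption)
    var: float = 0.25     # prior variance of the log-ratio (assumption)
    obs_var: float = 0.1  # noise of a single observed ratio (assumption)

    def update(self, estimated_h: float, actual_h: float) -> None:
        # Conjugate normal-normal update: precisions add, means are precision-weighted.
        y = math.log(actual_h / estimated_h)
        precision = 1 / self.var + 1 / self.obs_var
        self.mean = (self.mean / self.var + y / self.obs_var) / precision
        self.var = 1 / precision

    def calibrated_hours(self, estimated_h: float) -> float:
        # Scale the raw estimate by exp(posterior mean), i.e. the median ratio.
        return estimated_h * math.exp(self.mean)


def net_savings(ai_cost_usd: float, estimated_h: float,
                hourly_rate_usd: float, cal: JobTypeCalibration) -> float:
    """Role-matched human cost minus the task's AI cost; positive means AI was cheaper."""
    return cal.calibrated_hours(estimated_h) * hourly_rate_usd - ai_cost_usd
```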
Keep the robotics running, schedule around blocked steps, catch drift over long campaigns, and keep every autonomous action auditable and cost-bounded. That layer is what this OS already runs in production — the scaffolding that wraps an experiment-design module, not a replacement for it.
I own the orchestration and reliability layer: multi-agent scheduling, drift/SPC monitoring, unattended recovery, provenance and audit trails, cost governance, and the safety-gating of software actions. I'd partner on the science: closed-loop Bayesian / active-learning experiment design (e.g. BoTorch / Ax, DFT-informed priors), instrument integration (SiLA 2, OPC-UA, lab drivers), and physical safety interlocks — thermal, atmospheric, collision — which are a hardware-authority domain I'd integrate with, never replace. Knowing that edge precisely is the point.
Each pattern below runs in production, and each is set against the closest published work with the specific distinction noted.
No agent touches production on vibes. Every unit of work passes through a contract, a quality panel, and a hard gate — enforced at the data layer, not by reminder.
Role / constraints / tools / workflow / hard-stops, re-read verbatim before every run.
No task is claimable until done, acceptance, and validation are populated. Enforced by the query.
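Here is a minimal sketch of enforcing readiness in the query itself, using SQLite for illustration. The claim statement only matches tasks whose `done`, `acceptance`, and `validation` fields are non-blank. An underspecified task is therefore invisible to agents rather than merely discouraged. The table layout and `claim_next` are hypothetical. The source does not name its database or schema.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    done        TEXT,   -- definition of done
    acceptance  TEXT,   -- acceptance criteria
    validation  TEXT,   -- how the result will be validated
    claimed_by  TEXT
);
"""

# Readiness lives in the WHERE clause: blank or NULL fields never match.
NEXT_CLAIMABLE = """
SELECT id FROM tasks
WHERE claimed_by IS NULL
  AND COALESCE(TRIM(done), '') <> ''
  AND COALESCE(TRIM(acceptance), '') <> ''
  AND COALESCE(TRIM(validation), '') <> ''
ORDER BY id
LIMIT 1
"""


def claim_next(conn: sqlite3.Connection, agent: str) -> Optional[int]:
    with conn:  # commits on success, rolls back on error
        row = conn.execute(NEXT_CLAIMABLE).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE tasks SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
            (agent, row[0]),
        )
        # The guard means a concurrent claimer wins cleanly instead of double-claiming.
        return row[0] if cur.rowcount == 1 else None
```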
A structured brief, then independent expert-lens verdicts before any production write.
A fresh expert panel re-verifies work before it is ever surfaced to a human.
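Here is a minimal sketch of the gate in front of a production write. It assumes each reviewer returns a structured verdict tagged with its lens. The write is allowed only if every required lens has weighed in and no verdict rejects. The names `Verdict` and `panel_gate`, and the lens names, are hypothetical. The source does not list its lenses or say how verdicts are collected.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Verdict:
    lens: str      # reviewer perspective, e.g. "correctness" (hypothetical)
    approve: bool
    reason: str


def panel_gate(verdicts: Iterable[Verdict],
               required_lenses: Iterable[str]) -> Tuple[bool, List[str], List[str]]:
    """Return (allowed, missing lenses, rejection reasons). Fails closed."""
    verdicts = list(verdicts)
    seen = {v.lens for v in verdicts}
    missing = [lens for lens in required_lenses if lens not in seen]
    rejections = [f"{v.lens}: {v.reason}" for v in verdicts if not v.approve]
    return (not missing and not rejections), missing, rejections
```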
Software-write gate, kill-switch manifest, and a literal rollback command on every change.
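Here is a minimal sketch of the write gate. It assumes the kill-switch manifest is a simple mapping, and that each change record must carry a rollback command that at least parses as a shell command line. The names `Change` and `approve_write`, and the `writes_enabled` key, are hypothetical. The gate fails closed if the manifest omits the switch.

```python
import shlex
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Change:
    change_id: str
    description: str
    rollback_cmd: str  # literal command that undoes this change


def approve_write(change: Change, manifest: Mapping[str, bool]) -> bool:
    # Kill switch first: a missing key counts as "off" (fail closed).
    if not manifest.get("writes_enabled", False):
        raise PermissionError("kill switch engaged: software writes are disabled")
    # Every change must ship with its own undo.
    if not change.rollback_cmd.strip():
        raise ValueError(f"change {change.change_id} has no rollback command")
    shlex.split(change.rollback_cmd)  # raises ValueError on malformed quoting
    return True
```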
15 named traits — disciplined, precise, honest, follow-through — inherited by every agent and checked in QA. Consistency without fine-tuning.
Escalation count is tracked as a number that must trend down; every resolved failure teaches the system so the class stops recurring.
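One way to make "must trend down" checkable assumes escalations are counted per period (say, per week). The check fits an ordinary least-squares slope to the most recent periods and flags anything that is not strictly negative. The names `ols_slope` and `trending_down`, and the eight-period window, are hypothetical choices.

```python
from typing import Sequence


def ols_slope(ys: Sequence[float]) -> float:
    """Least-squares slope of ys against 0, 1, ..., n-1 (needs n >= 2)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trending_down(counts: Sequence[int], window: int = 8) -> bool:
    recent = list(counts)[-window:]
    if len(recent) < 3:
        return False  # too little history to claim a trend
    return ols_slope(recent) < 0
```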
I'd welcome the chance to compare notes — where the orchestration, reliability, and observability layers are headed, and where these patterns could accelerate the build.