OpenAI and Ginkgo showed GPT‑5 can design and run thousands of lab experiments — who will police the robots?

OpenAI's GPT‑5 closed the loop with Ginkgo's robotic cloud lab to run 36,000 experiments. The capability is real — the governance, DNA screening and industry rules to match it are not.

Robots, cloud labs and a very busy GPT‑5

In February 2026, OpenAI and Ginkgo Bioworks announced that GPT‑5 had autonomously designed and run 36,000 biological experiments through a robotic cloud laboratory — a demonstration that AI can design and run thousands of lab experiments without human hands. The headline number hides a simple engineering story: a large language model proposes designs and protocols; a cloud lab turns those designs into physical reagents and runs automated assays; measured results flow back and the model proposes the next batch. Humans set the research objective and the safety boundaries, but the iterative design–build–test loop can now operate at engineering pace, testing thousands of variants in days rather than months.

The result is programmable biology: computation stitches together protein language models, lab-control APIs and DNA synthesis (or cell‑free production) to tighten the cycle between a design on a screen and a readout from a plate reader. The Ginkgo–OpenAI work reported a roughly 40% reduction in the cost to produce a target protein — a commercial win that also illustrates how quickly capability scales once the automation and software are married.

How the loop actually works: models, robots and cloud APIs

At its core the closed loop is straightforward engineering. A generative model proposes sequence variants or experimental conditions; that proposal is converted into a protocol that a cloud lab scheduler understands; robots pipette and incubate samples; automated instruments capture data, which is returned via API to the model; the model retrains or re‑scores and proposes the next set. That chain — model → protocol → robotic execution → measured feedback — is what people mean when they say AI can design and run thousands of experiments at scale. Two technical advances made this practical: protein language models that generalise from millions of natural sequences, and the commoditisation of lab automation through cloud labs that expose standardised machine interfaces.
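To make that chain concrete, here is a minimal, runnable sketch of a design–build–test loop in Python. Everything in it is a toy stand‑in: `propose_variants` mimics a protein language model with random point mutations, and `run_batch` simulates a cloud‑lab API rather than calling any real Ginkgo or OpenAI service.

```python
import random
from dataclasses import dataclass

@dataclass
class Result:
    sequence: str
    score: float  # e.g. measured protein yield from a plate reader

def propose_variants(history: list[Result], n: int) -> list[str]:
    """Toy stand-in for a protein language model: point-mutate the best
    sequence seen so far. A real system samples from a learned model."""
    best = max(history, key=lambda r: r.score).sequence if history else "MKT" * 10
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    variants = []
    for _ in range(n):
        pos = random.randrange(len(best))
        variants.append(best[:pos] + random.choice(alphabet) + best[pos + 1:])
    return variants

def run_batch(sequences: list[str]) -> list[Result]:
    """Toy stand-in for the cloud-lab API: a real call would compile each
    design to a robot protocol, run the assay and return measurements."""
    return [Result(s, s.count("K") + s.count("R") + random.random())
            for s in sequences]

def closed_loop(rounds: int, batch: int) -> Result:
    history: list[Result] = []
    for _ in range(rounds):
        designs = propose_variants(history, batch)  # model -> design
        history += run_batch(designs)               # robots -> measurement
    return max(history, key=lambda r: r.score)      # feedback closes the loop

print(closed_loop(rounds=5, batch=20))
```

The point is the shape of the loop, not the toy scoring: swap in a learned model and a real lab API and the control flow is unchanged.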

The details matter. A model does not physically manipulate tubes; it outputs a plan. The cloud lab translates that plan into low‑level machine instructions and executes it. Those labs differ: some are closed commercial services with explicit user authentication, sample tracking and physical containment; others are bespoke academic rigs. The faster and cheaper cloud‑lab primitives become, the lower the barrier for an AI loop to go from hypothesis to hands‑off experiment.
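What "translating a plan" looks like in practice is a compilation step. The sketch below invents a tiny declarative protocol format and compiles it into instrument commands; both the step names and the command strings are hypothetical, since real cloud labs each use vendor‑specific schemas.

```python
# Hypothetical plan emitted by a model: declarative steps, no hardware detail.
PLAN = [
    {"op": "transfer", "src": "reagent_A1", "dst": "plate1_B2", "ul": 50},
    {"op": "incubate", "target": "plate1", "temp_c": 37, "minutes": 60},
    {"op": "read_absorbance", "target": "plate1", "wavelength_nm": 600},
]

def compile_step(step: dict) -> list[str]:
    """Translate one declarative step into low-level instrument commands
    (command syntax invented here for illustration)."""
    if step["op"] == "transfer":
        return [f"PIPETTE ASPIRATE {step['src']} {step['ul']}uL",
                f"PIPETTE DISPENSE {step['dst']} {step['ul']}uL"]
    if step["op"] == "incubate":
        return [f"INCUBATOR LOAD {step['target']}",
                f"INCUBATOR SET {step['temp_c']}C {step['minutes']}min"]
    if step["op"] == "read_absorbance":
        return [f"READER MEASURE {step['target']} {step['wavelength_nm']}nm"]
    raise ValueError(f"unknown op: {step['op']}")

for step in PLAN:
    for command in compile_step(step):
        print(command)
```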

The governance gap: why this change matters for biosecurity

Technical novelty is only half the story. The other half is dual use: the same search‑and‑optimise logic that finds a better antibody or cheaper enzyme can, in theory, optimise viral traits or protocol steps that increase harm. Researchers have shown that models integrated with lab automation can optimise viral growth parameters, and that large language models can walk users through complex virology workflows. Two recent, contrasting studies raised the alarm: a SecureBio/Scale AI experiment found that novices using large language models improved their accuracy on virology tasks, while Active Site's work saw more mixed effects but still faster progress on some wet‑lab steps when AI assistance was available. Both point to a common conclusion — automation plus accessible models lowers the human skill barrier that used to be the strongest bottleneck in translating a harmful design into a real biological agent.

Regulation has not caught up. Screening of synthetic DNA at synthesis houses is still mostly voluntary in many jurisdictions; the Biological Weapons Convention, in force since 1975, has no explicit AI provisions; corporate model‑release frameworks are opaque and inconsistent. Analysts at RAND and the Nuclear Threat Initiative have called for managed‑access approaches that gate who can use which model and how it can interact with real‑world biology, while other groups urge better DNA synthesis screening and pre‑release biorisk evaluations of models. The core policy problem is translating a capability assessment (this model can suggest protocol steps) into an operational risk estimate (this model plus accessible cloud labs increases the plausible misuse rate). Those conversions remain contested and uncertain.

Practical safeguards that could reduce risk

There are technical and policy mitigations that are relatively low‑regret. At the technical end: model testing specifically against biological dual‑use scenarios, thorough red‑teaming, and built‑in telemetry that logs how a model behaves when prompted about lab steps. On the lab side: mandatory managed access for instruments that will execute externally submitted protocols, cryptographic signing and attestation of who submits work, and robust user‑identity checks tied to institutional review. At the supply‑chain level, widespread mandatory screening of DNA orders against curated hazardous sequences would shrink an entire attack surface. The Nuclear Threat Initiative's managed‑access concept — matching a tool's risk to the user and usage environment — is an attempt to combine those pieces.
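Cryptographic signing of submissions is the most mechanical of these ideas, and it requires surprisingly little machinery. The Python sketch below uses stdlib HMAC to bind each protocol to a credentialed identity before any robot moves; the key store, user ID and payload format are all invented for illustration, and a production system would use asymmetric keys and proper key management.

```python
import hashlib, hmac, json, time

# Hypothetical key store: one secret per credentialed user, issued after
# institutional review. Real systems would use asymmetric keys and HSMs.
USER_KEYS = {"lab-user-42": b"secret-issued-after-review"}

def sign_submission(user_id: str, protocol: dict) -> dict:
    """Bind a protocol to an identity and a timestamp before submission."""
    payload = json.dumps(
        {"user": user_id, "protocol": protocol, "ts": int(time.time())},
        sort_keys=True,
    )
    sig = hmac.new(USER_KEYS[user_id], payload.encode(), hashlib.sha256)
    return {"payload": payload, "signature": sig.hexdigest()}

def verify_submission(sub: dict) -> bool:
    """The lab verifies identity before any robot executes the plan."""
    user = json.loads(sub["payload"]).get("user")
    key = USER_KEYS.get(user)
    if key is None:
        return False  # unknown identity: reject before anything runs
    expected = hmac.new(key, sub["payload"].encode(), hashlib.sha256)
    return hmac.compare_digest(expected.hexdigest(), sub["signature"])

sub = sign_submission("lab-user-42", {"op": "transfer", "ul": 50})
print(verify_submission(sub))  # True
```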

Those proposals are not panaceas. Model evaluations are not the same as real‑world misuse trials; DNA screening is imperfect (models can output sequences that avoid naive pattern matching); and a managed‑access regime requires international coordination to avoid jurisdictional leakage. Still, a layered approach — model gating, lab access controls, stronger DNA synthesis governance and better provenance for experimental plans — would materially reduce the speed at which a curious but malicious actor could move from idea to execution.

European and German angle: capability, commerce and rules

Europe is in a familiar position: strong industrial talent and world‑class biotech labs, but fragmented regulation and slower harmonisation. The EU’s AI Act creates a toolbox for classifying risky AI systems, but it was not written with lab automation in mind and will have to be interpreted or extended. Germany has strengths in hardware and automation that would make it an attractive place to host cloud labs — the machines are here — but procurement, export controls and the EU’s patchwork rules could frustrate consistent safeguards. In short, Europe can build the capability but needs coordinated policy and export‑control thinking to ensure it is not also the easiest place to misuse it.

Industrial policy choices matter too. If governments subsidise programmable‑biology infrastructure without binding safety conditions, they will accelerate both beneficial research and the attendant risks. The EU and member states should consider tying funding for automation and chip‑level subsidies to strict access controls and international biorisk norms; otherwise the region will have the machinery and not the governance to match it.

How close are we to "fully autonomous discovery"?

We are closer than many expected, but the dream of a lab with no human oversight is still qualified. Closed‑loop systems already discover protein variants, optimise expression and speed vaccine antigen screening. Autonomy shines at repetitive optimisation problems: create many variants, extract a metric, train a model, repeat. Where autonomy breaks down is in judgement and values: choosing meaningful goals, interpreting subtle biosafety signals, and making ethical trade‑offs. Those are not engineering problems alone; they are governance and values problems.

Practically, the labs of 2026 can run thousands of experiments with minimal hands‑on involvement. With manual pipetting largely automated away, the bottlenecks are now DNA synthesis cost, instrument scheduling and oversight. As DNA synthesis gets cheaper and cloud‑lab access broadens, wholly autonomous loops become more accessible. That does not mean imminent catastrophe, but it does compress timelines for both innovation and misuse. Policymakers and funders should treat capability‑scale demonstrations — like GPT‑5's 36,000‑experiment campaign — as wake‑up calls, not marketing milestones.

What regulators, companies and labs can do right now

Three pragmatic steps matter more than grand declarations. First, require DNA screening at the point of synthesis with a standard, auditable rule set and mandatory reporting. Second, require credentialed, audited access to cloud labs for any externally‑submitted protocols — an operational managed‑access regime. Third, insist that AI developers publish rigorous, reproducible capability evaluations for bio‑relevant tasks before releasing models, and require third‑party audits for high‑risk capabilities. Companies have started self‑policing — Anthropic raised its internal safety tier, OpenAI updated its preparedness framework — but voluntary measures alone will not be enough when the technology is widely available.
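The first of those steps, a standard and auditable screening rule set, is easy to caricature and hard to get right. The toy Python check below flags any order containing a window that matches a curated hazard list; the hazard entry is invented, and real screening relies on homology search rather than exact matching, precisely because trivial sequence edits defeat substring checks (the weakness flagged earlier).

```python
# Toy point-of-synthesis screen. The hazard sequence is invented; real
# screening uses homology search (BLAST-style), since exact matching is
# trivially evaded by silent edits to the ordered sequence.
HAZARD_DB = {
    "example-toxin-fragment": "ATGGCCAAAGGCTTTCGCGAT",  # invented sequence
}
WINDOW = 21  # screen the order in fixed-size windows

def screen_order(order_seq: str) -> list[tuple[int, str]]:
    """Return (position, hazard_name) hits; an empty list means pass."""
    hits = []
    for i in range(len(order_seq) - WINDOW + 1):
        window = order_seq[i:i + WINDOW]
        for name, fragment in HAZARD_DB.items():
            if window in fragment or fragment in window:
                hits.append((i, name))
    return hits

order = "CCCC" + HAZARD_DB["example-toxin-fragment"] + "GGGG"
print(screen_order(order))  # [(4, 'example-toxin-fragment')]
```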

Ethics, inequality and the new access divide

Sources

  • OpenAI & Ginkgo Bioworks (OpenAI/Ginkgo study demonstrating autonomous experiments, DOI and technical report)
  • SecureBio / Scale AI (study on novices using LLMs for virology tasks, preprint)
  • Active Site (research on AI-assisted workflows in synthetic biology, preprint)
  • RAND Center on AI, Security and Technology (biosecurity and AI reports)
  • Nuclear Threat Initiative (managed access framework for biological AI tools)
  • U.S. National Security Commission on Emerging Biotechnology (policy reports)
  • Biological Weapons Convention (international treaty and UN Office for Disarmament Affairs documents)
Mattias Risberg

Cologne-based science & technology reporter tracking semiconductors, space policy and data-driven investigations.

University of Cologne (Universität zu Köln) • Cologne, Germany
