Published 2026-06-24

Cold Open — GPT-5 helps crack a 3-year immunology mystery

OpenAI says GPT-5 Pro helped immunologist Derya Unutmaz crack a 3-year-old T-cell mystery — today's dominant story, and a real test of AI as a research collaborator. Plus OpenAI's push for shared AI standards, the 40%-of-OpEx energy bill behind every AI factory, dev-tool trends, the agent-skills wave, and one fun fact about the first chatbot.

Wednesday, June 24, 2026. We scanned more than 2,600 items off the wire overnight. Three made the front page, a handful more made the radar, and the lead is the kind of story that used to take a lab a year to whisper about — an AI being credited with cracking a mystery a human had been stuck on for three.

🎧 This is the print twin of today's Cold Open episode.

The lead · GPT-5 helps an immunologist crack a 3-year-old mystery

[Image: A glowing T-cell rendered as luminous teal filaments under a microscope, a single bright connection lighting up in a field of dimmer cells, the moment a pattern resolves.]

OpenAI published an account of how GPT-5 Pro helped immunologist Derya Unutmaz make progress on a question about T-cell behavior that had stumped him for roughly three years. In OpenAI's telling, the model didn't just summarize the literature — it proposed a mechanistic explanation that fit the data, the kind of leap the scientist credits with breaking the logjam. OpenAI frames the result as potentially relevant to cancer and autoimmune research, the two fields where understanding how T-cells switch on and off matters most.

The reason this lands as a lead and not a footnote: it is a concrete, named, single-domain example of the thing the whole industry keeps gesturing at — a frontier model used as a reasoning collaborator on an unsolved research problem, not a writing assistant. The scientist had the data and the expertise. The model contributed a hypothesis worth testing.

Why it matters

For builders, the useful signal is in how the win happened, not the headline. This was not a one-shot prompt. It was an expert in the loop, feeding the model real domain context and pressure-testing what came back — exactly the pattern that separates a useful research tool from a confident hallucination machine. The model's job was to generate a candidate explanation; the human's job was to know whether it was plausible and worth the lab time to check.

That is the template worth copying into your own stack. The frontier models are now good enough that the bottleneck has moved from "can it produce an idea" to "can you tell which of its ideas is right." If you are wiring an LLM into any expert workflow (legal, medical, engineering, finance), the leverage is in the verification layer you build around it, not in the prompt. The expert who can quickly separate a good hypothesis from a wrong one gets a force multiplier. Everyone else gets fluent nonsense.
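
One way to picture that verification layer is the minimal Python sketch below: the model only proposes candidates, and nothing earns lab time until an expert check accepts it. Every name here (`Hypothesis`, `triage`, the stand-in reviewer) is hypothetical and chosen for illustration; it is not a description of how OpenAI or the lab in this story actually worked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Hypothesis:
    """A candidate explanation proposed by a model (hypothetical type)."""
    claim: str
    cited_evidence: tuple[str, ...]  # what the model says supports the claim


def triage(
    candidates: Iterable[Hypothesis],
    expert_review: Callable[[Hypothesis], bool],
) -> tuple[list[Hypothesis], list[Hypothesis]]:
    """Split model output into (worth testing, rejected).

    The model only generates. `expert_review` stands in for the human
    (or human-built) check that decides what is worth testing.
    """
    accepted: list[Hypothesis] = []
    rejected: list[Hypothesis] = []
    for hypothesis in candidates:
        # A claim that cites no evidence is rejected before the expert sees it.
        if hypothesis.cited_evidence and expert_review(hypothesis):
            accepted.append(hypothesis)
        else:
            rejected.append(hypothesis)
    return accepted, rejected


if __name__ == "__main__":
    # Stand-in data and a stand-in reviewer, purely for illustration.
    ideas = [
        Hypothesis("Pathway A drives the effect", ("dataset-1",)),
        Hypothesis("It is probably noise", ()),
    ]
    keep, drop = triage(ideas, expert_review=lambda h: "Pathway" in h.claim)
    print(f"{len(keep)} worth testing, {len(drop)} rejected")
```

The design choice worth noticing is that the reviewer is a parameter, not an afterthought: swapping in a stricter or domain-specific check changes what survives without touching the generation side.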

The fine print

Three caveats before anyone declares AI a co-author on the next Nature paper. First, this is OpenAI's own account of a customer's success: a vendor case study, not an independent write-up or a peer-reviewed result, and the marketing incentive points one direction. Second, "crack" is doing heavy lifting: a hypothesis that fits the data still has to survive the wet lab, replication, and review before it is a finding. The model proposed; it did not prove. Third, this is a single anecdote from one highly capable scientist who knew exactly how to interrogate the model; it is evidence the pattern can work, not that it works on average. The right read is optimism with a clipboard: genuinely exciting, still provisional.

Sources: openai.com/index/gpt-5-immunology-mystery

02 · OpenAI pushes for shared standards on advanced AI

[Image: A set of translucent measuring instruments and rulers floating around a glowing model core, calibration marks lighting up as they align, on a dark teal field.]

OpenAI published a piece on helping build shared standards for advanced AI — backing common evaluation frameworks, safety practices, and international cooperation, channeled in part through a body it calls the Appia Foundation. The pitch is that as models get more capable, the industry needs agreed-upon ways to measure and compare what they can and cannot safely do, rather than every lab grading its own homework.

Why it matters. For anyone shipping AI products, "standards" sounds like governance theater until you remember that your compliance story rides on them. Shared evaluation frameworks are how "is this agent safe to deploy" stops being a vibe and becomes a checklist a customer's risk team will actually accept. Keep the healthy skepticism, though: a frontier lab helping write the rules it will be graded against is a conflict worth watching, and a foundation is only as credible as the independence of who sits on it.

Sources: openai.com/index/helping-build-shared-standards-for-advanced-ai

03 · The 40% energy bill hiding inside every "AI factory"

[Image: A dark data hall of GPU racks glowing teal, a luminous power meter overlay showing a large slice consumed, heat shimmer rising between the rows.]

NVIDIA published an engineering piece on maximizing AI-factory energy efficiency with a number worth sitting with: power can account for up to 40% of the operating expense of running an AI data center. Every watt goes to overhead, data ingestion, training, or inference — and NVIDIA's argument is that full-stack optimization across inference and training is now a first-class lever, not an afterthought, for getting more useful work out of the same electricity bill.

Why it matters. This is the unglamorous layer under everything else on this page. The agentic stack everyone is building — long sessions, multi-step reasoning, models that "just go" — runs on inference, and inference runs on power. When the cost of a watt becomes the cost of a token, efficiency per watt quietly decides whether your long-running agent is affordable to leave on. The frontier you read about is reasoning; the frontier that pays for it is energy.
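
To make "the cost of a watt becomes the cost of a token" concrete, here is a back-of-the-envelope sketch in Python. Every number in it (power draw, throughput, electricity price) is a made-up placeholder, not a figure from NVIDIA's piece, and the function name is hypothetical. It also counts electricity only, ignoring hardware, cooling overhead, and idle time.

```python
def energy_cost_per_million_tokens(
    power_kw: float,           # average electrical draw of the serving node, in kW
    tokens_per_second: float,  # sustained output throughput of that node
    price_per_kwh: float,      # electricity price, in dollars per kWh
) -> float:
    """Electricity cost, in dollars, of generating one million tokens."""
    seconds_needed = 1_000_000 / tokens_per_second
    kwh_used = power_kw * seconds_needed / 3600  # kW x hours = kWh
    return kwh_used * price_per_kwh


if __name__ == "__main__":
    # Placeholder inputs, chosen only to show the shape of the arithmetic.
    baseline = energy_cost_per_million_tokens(10.0, 5_000.0, 0.10)
    optimized = energy_cost_per_million_tokens(10.0, 10_000.0, 0.10)
    print(f"baseline:              ${baseline:.4f} per 1M tokens")
    print(f"2x tokens, same power: ${optimized:.4f} per 1M tokens")
```

Under this simple model, doubling throughput at the same power draw halves the energy cost per token, which is the reason efficiency per watt shows up directly in what a long-running agent costs to leave on.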

Sources: developer.nvidia.com

Also on the radar

Infra — NVIDIA and AWS at production scale: the two are wiring NVIDIA AI infrastructure across Amazon OpenSearch and EC2 — faster vector search and better GPU price-performance — aimed at the gap between a working demo and a system that survives real traffic.
Dev tools — Datasette 1.0a35: Simon Willison calls it a big release, headlined by a new in-browser "Create table" interface backed by a JSON API — the open-data tool inching toward its 1.0.
Browser ML — Cross-Origin Storage in Transformers.js: Hugging Face is experimenting with a proposed browser API so sites can share cached models across origins instead of every page re-downloading the same weights.
Theory — Critique of Agent Model: a paper asking, amid the "coding agent" and "AI co-scientist" marketing, what an agent and agency actually mean — a useful cold shower for anyone over-anthropomorphizing the tools.

Trends in dev tools

What moved in the research and tooling engineers actually ship with.

Coding agents are learning when not to call the expensive check. Bayesian control for coding agents frames a real bottleneck: modern coding agents pair an LLM generator with cheap diagnostics and expensive verifiers, and deciding when to spend the costly verification is its own control problem. The efficiency frontier is moving into the orchestration, not just the model. (arxiv.org)
The command line keeps winning as the agent's interface. "GUI vs. CLI: Execution Bottlenecks in Computer-Use Agents" tries to separate how many of a computer-use agent's failures come from the interface versus the reasoning, comparing screen-only control against programmatic command interfaces. It reads as quiet evidence for the CLI-first instinct. (arxiv.org)
Agentic security is getting its own benchmark. RIFT-Bench proposes dynamic red-teaming for agentic AI systems, on the premise that autonomous agents expose attack vectors beyond classic LLM vulnerabilities and need a unified way to be compared. Evals are racing to catch up to agents. (arxiv.org)
Long-running agents need fault attribution, not just logs. SAFARI tackles a problem that only shows up at scale: as an agent's trajectory grows to hundreds of steps, figuring out which step caused the failure becomes its own investigation. Debugging the agent is becoming a discipline. (arxiv.org)
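
The "when to spend the costly verification" question from the first item above can be sketched as a small expected-cost rule. This is an illustrative toy, not the method from the paper: it keeps a Beta-Bernoulli belief over cheap-check outcomes and treats that pass rate as a proxy for the chance the patch is correct, which is a simplifying assumption. The names `PatchBelief` and `should_run_verifier` and all the costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatchBelief:
    """Beta(alpha, beta) belief, updated from cheap diagnostic results."""
    alpha: float = 1.0  # prior pseudo-count plus observed passing checks
    beta: float = 1.0   # prior pseudo-count plus observed failing checks

    def update(self, cheap_check_passed: bool) -> None:
        """Standard Beta-Bernoulli update for one cheap check."""
        if cheap_check_passed:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def p_correct(self) -> float:
        """Posterior mean, used here as a proxy for 'the patch is correct'."""
        return self.alpha / (self.alpha + self.beta)


def should_run_verifier(
    belief: PatchBelief,
    verifier_cost: float,      # cost of one expensive verification run
    cost_of_bad_patch: float,  # cost of accepting a wrong patch unverified
) -> bool:
    """Verify only when skipping is expected to cost more than verifying."""
    expected_loss_if_skipped = (1.0 - belief.p_correct) * cost_of_bad_patch
    return expected_loss_if_skipped > verifier_cost


if __name__ == "__main__":
    belief = PatchBelief()
    print("before any checks:", should_run_verifier(belief, 1.0, 10.0))
    for _ in range(18):  # eighteen cheap checks pass in a row
        belief.update(True)
    print("after 18 passes:  ", should_run_verifier(belief, 1.0, 10.0))
```

The point of the sketch is that the decision lives in the orchestration layer: the same generator and the same verifier get cheaper to run simply because the controller stops paying for checks it no longer expects to need.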

Popular skills

This week the agent-skills signal came mostly from research rather than product launches — academia formalizing the portable-skill pattern practitioners already use.

The skill library, as an open research problem. "Evolving Programmatic Skill Networks" studies continual skill acquisition, where an agent has to construct, refine, and reuse an expanding library of programmatic skills: the "folder of skills" instinct, written up as a hard problem instead of a convention. (arxiv.org)
Giving an agent skills changes where it breaks. The same computer-use study measures "skill-mediated" agents — ones that call packaged skills — against screen-only ones, early evidence that handing an agent skills doesn't just add ability, it moves the bottleneck. (arxiv.org)
Skills are crossing into physical AI. InSight has vision-language-action models acquire new manipulation skills beyond their training set — the same idea (reusable, composable skills an agent picks up) jumping from coding agents into robotics. (arxiv.org)

AI fun fact

The first chatbot already fooled the people who built it. In 1966, MIT's Joseph Weizenbaum wrote ELIZA, whose "DOCTOR" script imitated a therapist by mostly rephrasing your own words back at you as questions. It understood nothing. Yet Weizenbaum was unsettled to find that people — including his own secretary, who knew it was a program — became emotionally attached, and she reportedly asked him to leave the room so she could "talk" to it in private. Sixty summers before an AI gets credit for cracking an immunology mystery, the first one taught us our oldest bug: we are wired to read a mind into the machine. (the original ELIZA paper)
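
To see how little machinery "rephrasing your own words back at you as questions" needs, here is a toy sketch in Python. It is not Weizenbaum's DOCTOR script, which used ranked keywords with decomposition and reassembly rules; the word table and the two response templates below are invented for illustration.

```python
import re

# Swap first-person words for second-person ones so the echo reads as a reply.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}


def reflect(text: str) -> str:
    """Lowercase the text, drop punctuation, and flip the pronouns."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(statement: str) -> str:
    """Turn a statement into a question built from the speaker's own words."""
    match = re.match(r"\s*i feel (.+)", statement, re.IGNORECASE)
    if match:
        return f"Why do you feel {reflect(match.group(1))}?"
    # Fallback template (an assumption of this sketch): echo everything.
    return f"Why do you say {reflect(statement)}?"


if __name__ == "__main__":
    print(respond("I feel sad about my job."))
    print(respond("My boss ignores me"))
```

Nothing in it models meaning; it is string substitution, which is the whole point of the anecdote.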

Sources: openai.com/index/gpt-5-immunology-mystery · openai.com/index/helping-build-shared-standards-for-advanced-ai · developer.nvidia.com · blogs.nvidia.com · arxiv.org/abs/2606.24551 · dl.acm.org (ELIZA, 1966)