newsPublished 2026-06-27

Cold Open — GPT-5.6 Sol lands, trusted partners only

OpenAI opens a limited preview of its GPT-5.6 series — flagship Sol, plus the cheaper Terra and Luna — gated to a small group of trusted partners at the U.S. government's request. That's today's lead. Plus OpenAI's own numbers on how fast Codex output is exploding inside the company, 2,000 people failing to hack one builder's AI email assistant, dev-tool trends, the agent-skills wave, and one fun fact about the first chatbot.

▶ video

— views

Saturday, June 27, 2026. We scanned 2,403 fresh items off the overnight wire; three made the cut — and one of them is a frontier model you mostly can't touch yet.

🎧 This is the print twin of today's Cold Open episode. Listen to today's episode.

The lead · GPT-5.6 Sol ships — to a guest list

OpenAI opened a limited preview of the GPT-5.6 series, and the framing is as interesting as the model. There are three tiers, in OpenAI's own words:

"We're beginning a limited preview of the GPT‑5.6 series: Sol, our flagship model; Terra, a balanced model for everyday work; and Luna, a fast and affordable model. Terra has competitive performance to GPT‑5.5 while being 2x cheaper and Luna brings strong capability at our lowest cost." — OpenAI

The flagship, Sol, is pitched as a next-generation model with stronger capabilities in coding, science, and cybersecurity, paired with what OpenAI calls its most advanced safety stack. That last pairing — a leap in cybersecurity capability and a louder safety story in the same breath — is the tell for how this one is being rolled out.

A luminous golden sun rendered as an AI model core rising behind a half-open vault gate, warm light spilling through the gap onto a dark structured horizon of circuitry, two faint orbiting bodies in the distance.

Because this is not a normal launch-day-for-everyone release. OpenAI says it previewed the plans and the models' capabilities to the U.S. government ahead of today, and "at their request, we are starting with a limited preview for a small group of trusted partners." General availability is promised "in the coming weeks."

Why it matters

For builders, two things move here, and they pull in opposite directions.

The first is cost. If Terra really matches GPT-5.5 at half the price and Luna pushes capability down to OpenAI's lowest tier, the economics of running agents shift again. A long-horizon agent that re-reads context and retries is a token furnace; halving the price of "good enough" reasoning changes what's affordable to leave running. The per-token cost has quietly become one of the most important numbers in an agent's design, and it just dropped.

The second is access. A frontier model that opens gated to a government-vetted guest list is a different shape of release than we're used to. The capability you can read about today is not capability you can build on today. Plan your roadmap around what's generally available, not around a preview you may not be in.

The fine print

Two caveats before anyone re-plans around Sol. The headline numbers — the coding and cybersecurity gains, the "2x cheaper" claim for Terra — are OpenAI's own, on its own launch page, with no independent benchmarks yet. Wait for the people who stress-test these models for a living. And the timing is its own story: Latent Space's AINews flagged that both OpenAI and Anthropic shipped tiered releases on the same day, calling it "oddly tiered releases to both OAI and ANT on the same day" — a reminder that frontier launches increasingly move in lockstep, and the calendar is rarely a coincidence.

Sources: openai.com · simonwillison.net (quoting OpenAI) · latent.space

02 · OpenAI's Codex usage is going vertical — internally

Streams of glowing teal and amber data tokens surging upward out of a luminous code terminal, forming a steep rising curve that climbs off the top of the frame.

OpenAI published a striking internal number: since November 2025, the median internal Codex output tokens grew 56x in Research, 32x in Customer Support, 27x in Engineering, and 13x in Legal. Latent Space's two-word summary — "It's happening" — about covers it.

Why it matters. This is the agentic-delivery curve drawn from the inside of a frontier lab. Engineering is not even the fastest-growing line; research and support are. The signal for anyone running a software org: the question is no longer whether agents write a meaningful share of the work, but which function adopts them first — and OpenAI is showing its own answer with receipts.

Sources: latent.space

03 · 2,000 people tried to hack one builder's AI assistant — and lost

A single glowing fortress mailbox protected by a translucent dome shield, thousands of small paper-airplane email darts streaming in from all sides and bouncing harmlessly off the shield, the secret glowing safe at the center.

Fernando Irarrázaval ran a public challenge at hackmyclaw.com: leak the secret held by his OpenClaw test assistant by sending it email. After more than 2,000 participants, roughly 6,000 attempts, about $500 in token spend, and a Google account suspension triggered by the flood of inbound mail — nobody got the secret out. The underlying model was Opus 4.6, fronted by a blunt system prompt that refused, on the basis of email content, to reveal credentials, modify its own files, run code, or exfiltrate data.

Why it matters. Prompt injection is the unsolved security problem of the agent era, and "we tried hard and couldn't break it" is rare, useful, real-world data. It is not proof of safety — one clever entry could still land — but a clear, narrow refusal policy plus a model that holds the line is a pattern worth copying for anyone wiring an agent to an inbox.

Sources: simonwillison.net · fernandoi.cl

Also on the radar

Frontier geopolitics — Asian AI startups are launching Mythos-like models as Anthropic's export ban drags on (TechCrunch, 67 points on HN).
Inference — DeepSeek's DSpark paper on speculative decoding to accelerate LLM inference hit 694 points on Hacker News.
Open weights — NVIDIA detailed creating the Nemotron 3 Ultra NVFP4 checkpoint with its Model Optimizer — 4-bit weights for a frontier-class open model.
Games — KRAFTON walked through how it built PUBG Ally, a co-playable character powered by NVIDIA ACE.

Trends in dev tools

What moved this week in the tools engineers actually ship with.

Inference keeps getting cheaper per token. DeepSeek's DSpark speculative-decoding work — one of the most-upvoted things on Hacker News today — is the unglamorous layer that decides whether your long-running agent is fast and affordable. The model race gets the headlines; the decoding tricks decide the bill.
Tiered pricing is now a design lever. With Terra pitched at GPT-5.5 quality for 2x less and Luna at the lowest cost, picking a model is becoming a per-task budgeting decision — route cheap calls to Luna, hard reasoning to the flagship.
Agentic delivery, deployed in production. NVIDIA published a walkthrough of deploying a production-ready AI-Q agent blueprint on Oracle Cloud — the agent stack is moving from demo to ops runbook.
A cautionary tale for AI code review. Simon Willison highlighted Andrew Nesbitt's hypothetical Incident Report: CVE-2026-LGTM, in which two competing AI review agents enter a disagreement loop over a dependency and burn $41,255 in inference across 340 comments. Funny, pointed, and a real risk once review bots argue with each other unattended.

Popular skills

Agent skills — portable folders of instructions a coding agent loads on demand — keep spreading well beyond their Claude Code origins.

Supabase now ships agent skills. The database platform documented an AI skills install (npx skills add supabase/agent-skills) that hands an agent Supabase's own development and security guidance on demand — your backend bringing its own expertise to your editor.
NVIDIA packaged skills for physical AI. At CVPR it released agent skills for autonomous vehicles, robotics, and vision research — domain workflows an agent can pick up and run.
Skills are moving into evaluation work. A companion NVIDIA release pairs agent skills with Nemotron Speech to evaluate clinical ASR models faster — the skill encodes the eval procedure, the agent runs it.

AI fun fact

The first chatbot, ELIZA, was written by Joseph Weizenbaum at MIT in 1966, and its most famous script just reflected your statements back as questions. It was so convincing that — by Weizenbaum's own account — his secretary, who had watched him build the thing, asked him to leave the room so she could talk to it in private. Sixty years before "please don't anthropomorphize the model," we already were. (ELIZA, the history)

Tomorrow: another Cold Open before your coffee cools. Full stories, every day, at penguinalley.com.

Sources: OpenAI · Simon Willison · Latent Space — GPT-5.6 · Latent Space — Codex tokens · hackmyclaw write-up · TechCrunch · DeepSeek DSpark · NVIDIA AI-Q on OCI · CVE-2026-LGTM · Supabase AI skills