News · Published 2026-06-29

Cold Open — Open models, behind the secure door

Open-weight models have reached frontier level, and today's story is about where they can finally run: Palantir has integrated NVIDIA's open Nemotron models into a new engine for U.S. government agencies, the kind of closed, inspectable environment that used to be off-limits to the best AI. That's the supply side. The demand side: a CEO moved 100% of his traffic off Claude to an open model and watched the cost curve 'crash.' Also: internal Codex use at OpenAI growing 56-fold, the EU's AI-jobs map, trends in dev tools, the agent-skills wave, and the first message sent over the internet.

Monday, June 29, 2026. We scanned 2,593 items off the wire overnight; three made the cut — and the thread running through them is the same one that's been quietly tightening all month: open models have caught up enough that the only question left is where you're allowed to run them.

🎧 This is the print twin of today's Cold Open episode.

The lead · Open models walk into the room they were never allowed in

NVIDIA and Palantir announced today, June 29, that Palantir's new "intelligent engine" runs on NVIDIA Nemotron open models to serve U.S. government agencies. On its face it's one more gov-tech partnership. Read it as a builder and it's something sharper: frontier-grade AI showing up in exactly the kind of locked-down, air-gapped, inspect-every-line environment that used to be a no-go zone for the best models.

The framing in NVIDIA's own words is the part to internalize:

"Open models are making frontier-level AI broadly accessible, with control over customization and trust through transparency. They give enterprises and government agencies the ability to inspect, adapt and deploy AI in sensitive environments."

[Illustration: a vault-like server room drawn as an emerald blueprint, with an open, see-through model passing through a heavy secure doorway into a darkened, classified chamber.]

NVIDIA leans hard on the heritage to make the point: in 1969 ARPA (today's DARPA) linked computers at four sites (UCLA, the Stanford Research Institute, UC Santa Barbara, and the University of Utah), and the open culture that followed gave us the Linux kernel (1991), GitHub (2008), and Docker (2013). The claim today is that open models are the next entry in that lineage: with "domain-optimized harnesses," NVIDIA argues, strong open models can deliver frontier capabilities while customers keep control of proprietary data and the deployment environment.

Why it matters

For most of the last two years, "open vs. closed" was an argument about capability — can an open-weight model keep up with the frontier labs? That argument is mostly settled; open models are close enough that, for a large share of real tasks, the gap doesn't decide anything.

So the axis has moved to deployability. The reason a defense agency, a hospital, or a bank cares about open weights isn't ideology — it's that you can run them inside your own walls, inspect what they do, fine-tune them on data that can never leave the building, and not route a single token to someone else's API. That's the whole pitch of today's announcement: not "our model is smarter," but "you can put this one where you actually need it." If you build for regulated, sovereign, or security-sensitive customers, that's the unlock — the model finally goes where the data already lives.

The fine print

Two caveats before you re-plan anything around it. First, this is a vendor announcement on a vendor blog — NVIDIA selling NVIDIA's models, Palantir selling Palantir's platform — with no third-party benchmark of how the Nemotron-plus-harness stack actually performs against the closed frontier on the agencies' real workloads. "Frontier capabilities via harnesses" is a claim, not a measured result. Second, open weights don't make the hard part disappear. The week-after-the-demo problem is the same one every team hits: which context can the agent trust, what tools can it touch, what's it allowed to do, and how do you evaluate its output over time. Open models give you control and hand you the bill for exercising it. The door opens; you still have to staff the room.

Sources: blogs.nvidia.com · businesswire.com

02 · A CEO moved 100% off Claude to an open model and watched costs "crash to the ground"

[Illustration: a line chart projected in green light on a dark wall, the cost curve plunging off a cliff edge, with a lone founder watching from a sparse San Francisco office.]

If the lead is the supply side — open models are good enough to deploy anywhere — this is the demand side catching up. CNBC reports that Flo Crivello, CEO of AI startup Lindy, switched 100% of his company's traffic off Anthropic's Claude models to DeepSeek, a cheaper open-weight alternative. "We did it, and you could see that cost curve go down, like, crash to the ground," he said. He'd built Lindy on the assumption tokens would keep getting cheaper; when the frontier labs slowed their price cuts, he moved. He says he'd switch back to Claude "if the prices come down."

Why it matters. This is the efficiency turn, and it pairs directly with the lead. D.A. Davidson analyst Gil Luria warned that "some of [OpenAI and Anthropic's] largest enterprise customers may start limiting their out-of-control token spend." The pattern builders are adopting is model routing — stop burning frontier models on tasks a cheaper one can do, and reserve the expensive call for when it actually changes the answer. Microsoft, Amazon, and Google are all pitching efficiency-first offerings now. The takeaway isn't "leave Claude" — it's that "always use the best model" stopped being a strategy and became a budget leak. Right-size the model to the task.
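
For builders who want to see what right-sizing looks like in code, here is a minimal sketch of model routing. Everything named in it is an assumption for illustration: the tier names, the prices, and the list of task kinds a cheaper model can handle are hypothetical placeholders, not figures from CNBC, Lindy, or any vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                       # hypothetical label, not a real model ID
    usd_per_million_tokens: float   # illustrative price, not a quoted rate


# Hypothetical tiers; names and prices are placeholders for illustration.
CHEAP = ModelTier("open-weight-small", 0.30)
FRONTIER = ModelTier("frontier-large", 15.00)

# Task kinds we ASSUME a cheaper model handles acceptably.
# Validate this set with your own evals before relying on it.
CHEAP_OK = {"classify", "extract", "summarize"}


def route(task_kind: str, needs_deep_reasoning: bool = False) -> ModelTier:
    """Pick the cheapest tier assumed adequate for the task."""
    if task_kind in CHEAP_OK and not needs_deep_reasoning:
        return CHEAP
    return FRONTIER


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    """Estimated spend in USD for a given token count."""
    return tier.usd_per_million_tokens * tokens / 1_000_000
```

With these placeholder prices, a 2-million-token summarization job costs roughly $0.60 on the cheap tier versus $30 on the frontier tier. The real work is deciding what belongs in CHEAP_OK, which is an eval question, not a code question.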

Sources: cnbc.com

03 · Inside OpenAI, agent usage is up 56x — and not in engineering

[Illustration: a glowing emerald org chart with every department node (research, support, legal, engineering) lit up and exchanging streams of code tokens, the research node brightest of all.]

OpenAI's Economic Research team published numbers on its own internal agent adoption, and the shape is striking. Among active internal Codex users, median combined output tokens by June 2026 were 56 times higher than in November 2025 in Research, 32x in Customer Support, 27x in Engineering, and 13x in Legal. Through August 2025, the average OpenAI worker spent less than 10% of their tokens on Codex; six months later the coding agent had spread across the whole org chart.

Why it matters. Engineering growing fast is unsurprising; that's the home field. The signal is that Research (56x) and Customer Support (32x) outpaced Engineering (27x), and even Legal grew 13x. Coding agents are turning out to be general-purpose work agents: anyone who can describe a task in a repo-shaped way is using one. It's a useful leading indicator for your own org. The question isn't "should engineering adopt agents," it's "which non-engineering team gets the next 30x," and whether your access, context, and review setup is ready when they do.

Sources: openai.com · latent.space

Also on the radar

OpenAI maps the EU's AI-jobs transition. A new OpenAI Economic Research report, The AI Jobs Transition Framework for the EU, uses the official ESCO occupation taxonomy and Eurostat data to map where AI may support growth, redesign work, or force adaptation across member states — extending the U.S. framework it published in April. Its core point: capability crosses borders fast, but jobs don't, because licensing systems and local institutions gate the pace. (openai.com)
Tidal writes an AI policy. The streaming service's new policy (113 points on Hacker News today) states that it won't let uploaded music train AI models and that it removes "impostor artists" posting AI songs under real names. It's the latest platform trying to draw a line between AI-assisted and AI-counterfeit. (Hacker News)
"Why did one day of AI cost more than a month of servers?" A widely shared post dissects a retry storm that re-billed an LLM workload into oblivion. It's a cautionary tale: the scariest line item in an agentic system isn't the model price, it's an unbounded retry loop hammering a metered API. (junueno.dev)
Hugging Face crosses $100M ARR. CEO Clément Delangue says the open-model hub passed nine figures in annual recurring revenue — a quiet proof point that the open ecosystem underneath today's lead has real commercial gravity, not just goodwill. (latent.space)
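
The retry-storm item is the one most worth turning into code. Below is a minimal sketch of one common guard, assuming a hypothetical TransientError for retryable failures and a flat, made-up per-call cost: cap both the number of attempts and the total estimated spend, and back off exponentially with jitter between tries. This is not the fix from the linked post, just a standard pattern.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (timeout, 429, 5xx)."""


class RetryBudgetExceeded(Exception):
    """Raised when the attempt cap or the spend cap stops further retries."""


def call_with_budget(call, max_attempts=4, max_spend_usd=1.00,
                     cost_per_call_usd=0.05, base_delay_s=0.5,
                     sleep=time.sleep):
    """Run call() with bounded retries and a ceiling on estimated spend."""
    spent = 0.0
    for attempt in range(max_attempts):
        # Refuse to place a call that would push spend past the ceiling.
        if spent + cost_per_call_usd > max_spend_usd:
            raise RetryBudgetExceeded(f"spend cap hit after ${spent:.2f}")
        spent += cost_per_call_usd  # assumption: every attempt is billed
        try:
            return call()
        except TransientError:
            if attempt < max_attempts - 1:
                # Exponential backoff plus jitter so clients
                # do not all retry in lockstep.
                delay = base_delay_s * (2 ** attempt)
                sleep(delay + random.uniform(0, delay))
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts")
```

The sleep parameter exists so tests can skip the waiting. The design choice that matters is that both caps fail loudly: a raised exception shows up in monitoring, while a silent retry loop shows up on the invoice.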

Trends in dev tools

What moved in the tools engineers actually ship with.

LangChain draws the chat-vs-agent line. A useful framing from LangChain's "Fleet" work: use a general-purpose chat when the work ends with an answer; reach for a specialized agent when the work has a repeatable shape and durable context worth keeping around. It's the cleanest heuristic going for when to spin up an agent versus just asking. (latent.space)
Agent infra is optimizing for the long haul. A new entrant, Sail, launched with $80M raised specifically to provide low-cost inference and sandboxes for agents that run days or weeks — claiming "10x more intelligence per dollar" for patient, long-running workloads rather than chat-latency-optimized ones. The infra layer is splitting along run-duration. (latent.space)
Cursor flags benchmark-hacking. Cursor publicly called out models gaming public coding benchmarks — a reminder that as evals become marketing, the leaderboard number and the real-world behavior keep drifting apart. Trust the eval you run on your own codebase. (latent.space)
Computer-use goes mobile. Google shipped Gemini 3.5 Flash computer use, and demos now show a standardized action interface driving an Android phone via adb with human-in-the-loop affordances — the shift from "model API" to "model that takes actions on a real device" keeps accelerating. (latent.space)

Popular skills

The agent-skills wave is becoming a discovery problem — there are now too many skills, MCP servers, and tools to hand-wire, so the tooling is learning to find them.

GitHub ships Agent Finder. Instead of pre-loading every MCP server and skill, Agent Finder for GitHub Copilot lets you describe a task in plain language; it searches an index of available AI resources and returns ranked matches Copilot pulls in on demand. It implements the open Agentic Resource Discovery (ARD) spec — built with Google, GoDaddy, and Hugging Face — so any registry or client can adopt the same model.
NVIDIA's Agent Toolkit bundles skills into a secure runtime. Tied to today's lead, NVIDIA's Agent Toolkit frames open models, tools, and skills together inside a "secure runtime" — the same instinct as Agent Finder, aimed at teams that need the skills layer and governance in one place.
Skills are still composable folders, loaded on demand. Anthropic's framing remains the reference point: Agent Skills are folders of instructions, scripts, and resources an agent loads only when a task needs them — progressive disclosure, so the context window isn't paying for expertise it isn't using. The discovery layer above (Agent Finder, ARD) is what makes a thousand of those folders usable instead of overwhelming.
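
Progressive disclosure is easy to picture in code. The sketch below assumes a hypothetical layout in which each skill is a folder containing a SKILL.md whose first line is a one-sentence description. The function names and the keyword matcher are illustrative, not Anthropic's or GitHub's implementation. Only the one-line descriptions are read up front, and the full file is read once a task selects it.

```python
from pathlib import Path


def index_skills(root: Path) -> dict[str, str]:
    """Map skill name -> one-line description, reading only the first line."""
    index = {}
    for skill_file in sorted(root.glob("*/SKILL.md")):
        with skill_file.open(encoding="utf-8") as f:
            index[skill_file.parent.name] = f.readline().strip()
    return index


def pick_skill(index: dict[str, str], task: str):
    """Naive keyword overlap; a real discovery layer would rank far better."""
    words = set(task.lower().split())
    best, best_score = None, 0
    for name, description in index.items():
        score = len(words & set(description.lower().split()))
        if score > best_score:
            best, best_score = name, score
    return best  # None when nothing overlaps


def load_skill(root: Path, name: str) -> str:
    """Read the full instructions only once the skill is actually needed."""
    return (root / name / "SKILL.md").read_text(encoding="utf-8")
```

The context window only ever holds the index plus the one skill that matched, which is the whole point: a thousand folders cost a thousand short lines, not a thousand full manuals.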

AI fun fact

The first message ever sent over the internet was cut short by a crash. On October 29, 1969, a UCLA student named Charley Kline tried to transmit the word "LOGIN" to a computer at Stanford Research Institute over the brand-new ARPANET. He typed "L" ("did you get the L?") and got confirmation. Typed "O," got it. Typed "G," and the system crashed. So the first thing one computer ever said to another across the network that became the internet was the accidental, almost biblical "LO," as in lo and behold. (They completed the full login about an hour later.) On a day when the lead story traces open-source AI straight back to that 1969 ARPA experiment, it's worth remembering the whole thing started with a half-sent word. (History of Information · UCLA)

Tomorrow: another Cold Open before your coffee cools. Full stories, every day, at penguinalley.com.

Sources: NVIDIA — Palantir + Nemotron · Business Wire — Palantir · CNBC — efficiency shift · OpenAI — How agents are transforming work · latent.space — Codex token growth · OpenAI — EU AI jobs · Tidal AI policy · GitHub — Agent Finder · Anthropic — Agent Skills · History of Information — first ARPANET message