2026-06-13

Uncle Sam Pulls Anthropic's Plug

The day's headline act is a regulator yanking Anthropic's most capable models offline worldwide, a first for frontier deployment. Underneath the drama, the real developer story is economics: an open coding model undercutting the majors by up to 12x per token, and even Meta and Microsoft preaching token discipline.

US government forces Anthropic to disable Claude Fable 5 and Mythos 5 globally

The US government ordered Anthropic to cut worldwide access to Claude Fable 5 and Mythos 5, citing alleged jailbreak risks. Anthropic is complying but objecting publicly, calling the vulnerability a narrow potential jailbreak that also exists in competitors like GPT-5.5, and warning the move could set a precedent that halts frontier deployments. The irony is hard to miss: Anthropic spent months hyping the cybersecurity dangers of its own Mythos-class models. The same Fable 5 had just posted 88 percent on FrontierMath's hardest tier, well ahead of GPT-5.5's ~75 percent.

Why it matters: If a single regulatory finding can recall a model serving hundreds of millions of users overnight, every team building on a frontier API now has deployment continuity as a real risk, not a hypothetical.
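
One common way to hedge that continuity risk is an ordered fallback across providers. The sketch below is a minimal illustration, not any vendor's API: `complete_with_fallback`, `recalled_model`, and `backup_model` are hypothetical names, and the two callables are stand-ins for real SDK wrappers.

```python
from typing import Callable, List, Sequence, Tuple

# (provider name, function that takes a prompt and returns a completion)
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider errors out."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (name, completion) from the first that works."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins; real code would wrap each vendor's SDK here.
def recalled_model(prompt: str) -> str:
    raise ConnectionError("model unavailable")


def backup_model(prompt: str) -> str:
    return f"backup answer to: {prompt}"


used, text = complete_with_fallback(
    "ping", [("primary", recalled_model), ("backup", backup_model)]
)
```

The fallback only helps if prompts and evaluations are kept portable across models, which is the harder part in practice.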

US government forces Anthropic to disable Claude Fable 5 and Mythos 5 for all customers worldwide (The Decoder)
Anthropic's safety warnings may have just backfired — the government has pulled the plug on its most powerful AI (TechCrunch AI)
Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems (The Decoder)
[AINews] Fable and Mythos officially too dangerous to release (Latent Space (swyx))
Shepherd's Dog: A Game by the Most Dangerous AI Model (Hacker News)

Moonshot's Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x per token

Moonshot AI released Kimi K2.7 Code, an open-weights one-trillion-parameter model aimed at programming. It still trails GPT-5.5 and Claude Opus 4.8 on coding benchmarks but costs up to 12x less per token. The pitch is throughput economics: more runs per dollar against a modest quality gap.

Why it matters: For agentic coding workflows that retry and iterate heavily, raw quality matters less than cost per successful task, and a price gap of up to 12x buys a lot of retries.
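
The cost-per-successful-task arithmetic can be sketched in a few lines. This is a simplified model under stated assumptions: attempts succeed independently, failures are detectable (for example by a test suite), and each attempt uses a similar number of tokens. The success rates and per-attempt costs are illustrative numbers, not benchmark results.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one success when failed attempts are retried.

    With independent attempts, the number of attempts until the first
    success is geometric with mean 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(price_ratio: float, premium_success_rate: float) -> float:
    """Success rate at which the cheaper model matches the premium model's cost per success."""
    return premium_success_rate / price_ratio


# Illustrative assumptions: the premium model costs 12 units per attempt
# and succeeds 75% of the time; the budget model costs 1 unit and
# succeeds 50% of the time.
premium = cost_per_success(cost_per_attempt=12.0, success_rate=0.75)
budget = cost_per_success(cost_per_attempt=1.0, success_rate=0.5)
threshold = break_even_success_rate(price_ratio=12.0, premium_success_rate=0.75)
```

Under these assumptions the budget model wins on cost whenever its success rate exceeds `threshold`. The model ignores latency and the cost of verifying each attempt, both of which favor the stronger model.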

Open model Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x on price per token (The Decoder)

The token bill comes due: Meta and Nadella both preach restraint

An internal Meta memo to 6,000 employees revealed billions in projected internal AI costs, prompting a 2027 shift to budgets, allocations, and a central AI Gateway dashboard; CTO Andrew Bosworth said token usage alone is not a measure of impact. Microsoft CEO Satya Nadella echoed the warning against token-maxing, arguing frontier models shouldn't be wasted on everyday tasks, before admitting he's an addict too. The shared message: match marginal productivity gains to token cost.

Why it matters: When the companies selling and consuming the most tokens both start rationing them, model routing and cost-aware orchestration stop being optional optimizations.
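
A minimal sketch of what cost-aware routing can look like: pick the cheapest tier that is trusted with the task and fits the budget. The tier names, prices, and difficulty scale here are hypothetical placeholders, not figures from the Meta memo or any vendor's price list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float
    max_difficulty: int  # hardest task class this tier is trusted with


# Hypothetical tiers and prices, for illustration only.
TIERS = [
    ModelTier("small-open-model", 0.5, 1),
    ModelTier("mid-tier-model", 3.0, 2),
    ModelTier("frontier-model", 15.0, 3),
]


def route(difficulty: int, est_tokens: int, budget_usd: float) -> Optional[ModelTier]:
    """Return the cheapest tier able to handle the task within budget, else None."""
    for tier in sorted(TIERS, key=lambda t: t.usd_per_million_tokens):
        cost = tier.usd_per_million_tokens * est_tokens / 1_000_000
        if tier.max_difficulty >= difficulty and cost <= budget_usd:
            return tier
    return None
```

For example, `route(1, 10_000, 1.0)` selects the small tier, while a difficulty-3 task with 100,000 estimated tokens and a 1.00 USD budget returns `None`, which is the signal to escalate, trim context, or raise the allocation.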

Meta shifts from "tokenmaxxing" to token managing as internal AI costs reportedly hit billions (The Decoder)
Microsoft CEO Satya Nadella admits he's a token-maxer, too: "It's addictive" (The Decoder)

Microsoft's SkillOpt tunes a Markdown file to add 23 points to GPT-5.5

Microsoft and three Chinese universities introduced SkillOpt, which optimizes agent instruction documents using principles borrowed from model training. The result is a plain Markdown file that reportedly boosts GPT-5.5 by about 23 points on procedural tasks, and the same file transfers across models and agent environments like Codex and Claude Code.

Why it matters: If a transferable Markdown skill file delivers double-digit gains without touching model weights, prompt and skill engineering may beat fine-tuning for many agent tasks, with no retraining cost.

Microsoft's SkillOpt boosts GPT-5.5 by using nothing but a trained Markdown file (The Decoder)

Google's Gemini-SQL2 hits 80.04% on BIRD text-to-SQL

Google Research's Gemini-SQL2, built on Gemini 3.1 Pro, tops the BIRD text-to-SQL benchmark at 80.04 percent accuracy, ahead of OpenAI and Anthropic systems. Google says the approach could feed natural-language features across its data services.

Why it matters: Text-to-SQL is one of the most directly shippable enterprise AI features, and a benchmark lead on BIRD is a concrete signal for teams choosing a query-generation backend.

Google Research's Gemini-SQL2 tops text-to-SQL benchmarks by a wide margin (The Decoder)

'Count Anything' halves error rates on prompt-based object counting

Count Anything is pitched as the first model to count objects in any image type, from crowds to microscope cell samples, driven purely by a text prompt. In comparisons it cuts error rates roughly in half versus prior systems, though it still struggles with extremely dense scenes and ambiguous terms.

Why it matters: Generalized, promptable counting is a long-standing pain point in vision pipelines for science and logistics, and cutting error rates roughly in half is a meaningful step toward a drop-in tool.

New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds (The Decoder)

Also worth a look

Open source AI must win (Hacker News)
AI OSS tool repo goes archived over night after raising $7.3M Seed (Hacker News)
OpenAI faces investigation from state attorneys general (TechCrunch AI)
Show HN: Paca – Lightweight Jira alternative for human-AI collaboration (Hacker News)
Andrew Yang thinks the next big startup opportunity is lowering the cost of living (TechCrunch AI)