Uncle Sam Pulls Anthropic's Plug
The day's headline act is a regulator yanking Anthropic's most capable models offline worldwide, a first for frontier deployment. Underneath the drama, the real developer story is economics: open coding models undercutting the majors by up to 12x per token, and even Meta and Microsoft preaching token discipline.
US government forces Anthropic to disable Claude Fable 5 and Mythos 5 globally
The US government ordered Anthropic to cut worldwide access to Claude Fable 5 and Mythos 5, citing alleged jailbreak risks. Anthropic is complying but objecting publicly: it calls the vulnerability a narrow potential jailbreak that also exists in competing models such as GPT-5.5, and warns that the order could set a precedent for halting frontier deployments. The irony is hard to miss, since Anthropic spent months hyping the cybersecurity dangers of its own Mythos-class models. Fable 5 had also just posted 88 percent on FrontierMath's hardest tier, 13 points ahead of GPT-5.5's roughly 75 percent.
Why it matters: If a single regulatory finding can recall a model serving hundreds of millions of users overnight, every team building on a frontier API must now treat deployment continuity as a real risk, not a hypothetical one.
- US government forces Anthropic to disable Claude Fable 5 and Mythos 5 for all customers worldwide (The Decoder)
- Anthropic's safety warnings may have just backfired — the government has pulled the plug on its most powerful AI (TechCrunch AI)
- Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems (The Decoder)
- [AINews] Fable and Mythos officially too dangerous to release (Latent Space (swyx))
- Shepherd's Dog: A Game by the Most Dangerous AI Model (Hacker News)
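One common way to hedge against a model disappearing overnight is a provider fallback chain. The sketch below is a minimal illustration under stated assumptions: `complete_with_fallback`, `AllProvidersFailed`, and the `primary`/`secondary` functions are hypothetical names, and in practice each callable would wrap a real vendor SDK call rather than these stand-ins.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (label, callable) pair; the callable takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain errors out."""


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (label, output) from the first that succeeds."""
    errors = []
    for label, call in providers:
        try:
            return label, call(prompt)
        except Exception as exc:  # broad on purpose in this sketch (withdrawn model, outage, etc.)
            errors.append(f"{label}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real SDK calls.
def primary(prompt: str) -> str:
    raise RuntimeError("model no longer available in this region")


def secondary(prompt: str) -> str:
    return f"[secondary] {prompt.upper()}"


print(complete_with_fallback("hello", [("primary", primary), ("secondary", secondary)]))
# prints ('secondary', '[secondary] HELLO')
```

Catching broad `Exception` keeps the sketch short; a production version would more likely catch only the vendor's availability and not-found errors, log them, and alert when traffic shifts to a fallback model whose behavior may differ.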
Moonshot's Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x per token
Moonshot AI released Kimi K2.7 Code, an open-weights one-trillion-parameter model aimed at programming. It still trails GPT-5.5 and Claude Opus 4.8 on coding benchmarks but costs a fraction as much per token. The pitch is throughput economics: more runs per dollar in exchange for a modest quality gap.
Why it matters: For agentic coding workflows that retry and iterate heavily, raw quality matters less than cost per successful task, and a price gap of up to 12x buys a lot of retries.
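The cost-per-successful-task argument can be made concrete with a little arithmetic. This is a sketch under a simplifying assumption: attempts are independent, so the expected number of attempts until success is 1 / success rate. All prices and success rates are hypothetical placeholders, not measured figures for Kimi K2.7 Code, GPT-5.5, or Claude.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    cost_per_attempt: float  # dollars per attempt (tokens x price); hypothetical
    success_rate: float      # chance a single attempt passes; hypothetical


def expected_cost_per_success(m: ModelProfile) -> float:
    """Expected spend to get one success when retrying until it passes.

    Assumes independent attempts, so attempts follow a geometric
    distribution with mean 1 / success_rate.
    """
    if not 0 < m.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return m.cost_per_attempt / m.success_rate


def success_within(m: ModelProfile, max_attempts: int) -> float:
    """Probability of at least one success within max_attempts tries."""
    return 1 - (1 - m.success_rate) ** max_attempts


# Hypothetical profiles: the open model is priced at 1/12 of the frontier model per attempt.
frontier = ModelProfile("frontier (hypothetical)", cost_per_attempt=1.20, success_rate=0.60)
budget = ModelProfile("open model at 1/12 price (hypothetical)", cost_per_attempt=0.10, success_rate=0.30)

for m in (frontier, budget):
    print(f"{m.name}: ${expected_cost_per_success(m):.2f} per success, "
          f"P(success within 5 tries) = {success_within(m, 5):.3f}")
```

With these made-up inputs the cheaper model costs about $0.33 per success versus $2.00, even though it succeeds half as often per attempt. The independence assumption is optimistic: if a model fails a hard task for a systematic reason, its retries are correlated and the cheap model's advantage shrinks.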
The token bill comes due: Meta and Nadella both preach restraint
An internal Meta memo to 6,000 employees revealed billions in projected internal AI costs, prompting a 2027 shift to budgets, allocations, and a central AI Gateway dashboard. CTO Andrew Bosworth said token usage alone is not a measure of impact. Microsoft CEO Satya Nadella echoed the warning against token-maxing, arguing that frontier models shouldn't be wasted on everyday tasks, before admitting he is a token addict himself. The shared message: match marginal productivity gains to token cost.
Why it matters: When the companies selling and consuming the most tokens both start rationing them, model routing and cost-aware orchestration stop being optional optimizations.
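A cost-aware router can be as simple as "use the cheapest model that clears the quality bar and fits the remaining budget." The sketch below illustrates that rule only; the tier names, prices, and quality scores are hypothetical, and this is not Meta's AI Gateway or any vendor's routing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    price_per_mtok: float  # dollars per million tokens; hypothetical
    quality: float         # 0..1 internal quality estimate; hypothetical


# Hypothetical tiers, from cheap everyday model to expensive frontier model.
TIERS = [
    ModelTier("small-fast", 0.20, 0.55),
    ModelTier("mid-open", 1.00, 0.75),
    ModelTier("frontier", 12.00, 0.92),
]


def route(required_quality: float, est_tokens: int, remaining_budget: float) -> ModelTier:
    """Return the cheapest tier that meets the quality bar and fits the budget."""
    candidates = sorted(
        (t for t in TIERS if t.quality >= required_quality),
        key=lambda t: t.price_per_mtok,
    )
    for tier in candidates:
        cost = tier.price_per_mtok * est_tokens / 1_000_000
        if cost <= remaining_budget:
            return tier
    raise RuntimeError("no tier meets the quality bar within the remaining budget")


print(route(0.5, 20_000, 1.00).name)   # everyday task: prints small-fast
print(route(0.9, 200_000, 5.00).name)  # hard task: prints frontier
```

The hard part in practice is estimating `required_quality` per task; the routing rule itself stays simple, which makes per-team budgets and allocations easy to enforce at a single gateway.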
Microsoft's SkillOpt tunes a Markdown file to add 23 points to GPT-5.5
Microsoft and three Chinese universities introduced SkillOpt, which optimizes agent instruction documents using principles borrowed from model training. The result is a plain Markdown file that reportedly boosts GPT-5.5 by about 23 points on procedural tasks, and the same file transfers across models and agent environments like Codex and Claude Code.
Why it matters: If a transferable, training-free Markdown skill file delivers double-digit gains, prompt and skill engineering may beat fine-tuning for many agent tasks at zero retraining cost.
Google's Gemini-SQL2 hits 80.04% on BIRD text-to-SQL
Google Research's Gemini-SQL2, built on Gemini 3.1 Pro, tops the BIRD text-to-SQL benchmark at 80.04 percent accuracy, ahead of OpenAI and Anthropic systems. Google says the approach could feed natural-language features across its data services.
Why it matters: Text-to-SQL is one of the most directly shippable enterprise AI features, and a benchmark lead on BIRD is a concrete signal for teams choosing a query-generation backend.
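Teams comparing query-generation backends can run a quick check on their own schema. BIRD is commonly scored by execution accuracy: whether a generated query returns the same results as a reference query. The sketch below is a simplified version of that idea using the standard-library `sqlite3` module; the table, the query pairs, and the "model outputs" are hypothetical, and BIRD's official scorer may differ in details such as set versus multiset comparison.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to run counts as a miss
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders (region, amount) VALUES ('EU', 120.0), ('US', 80.0), ('EU', 40.0);
""")

pairs = [
    # (hypothetical model output, reference query)
    ("SELECT SUM(amount) FROM orders WHERE region = 'EU'",
     "SELECT SUM(amount) FROM orders WHERE region = 'EU'"),
    ("SELECT region FROM orders GROUP BY region",    # different SQL, same result
     "SELECT DISTINCT region FROM orders"),
    ("SELECT COUNT(*) FROM orders WHERE amount > 100",  # wrong threshold
     "SELECT COUNT(*) FROM orders WHERE amount > 50"),
    ("SELEC broken", "SELECT 1"),                    # syntax error
]
print(execution_accuracy(conn, pairs))  # prints 0.5
```

Comparing results rather than SQL text is what lets the second pair count as correct even though the queries differ.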
'Count Anything' halves error rates on prompt-based object counting
Count Anything is pitched as the first model to count objects in any image type, from crowds to microscope cell samples, driven purely by a text prompt. In reported comparisons it roughly halves error rates relative to prior systems, though it still struggles with extremely dense scenes and ambiguous prompt terms.
Why it matters: Generalized, promptable counting is a long-standing pain point in vision pipelines for science and logistics, and roughly halving the error is a meaningful step toward a drop-in tool.
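Counting work typically reports error as mean absolute error (MAE) and root-mean-square error (RMSE) over per-image counts. The source does not say which metric Count Anything's comparisons use, so treat this as an assumption. The sketch shows what "halving the error" means under those metrics, using entirely hypothetical counts rather than results from the Count Anything work.

```python
import math


def mae(preds, targets):
    """Mean absolute error between predicted and true per-image counts."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def rmse(preds, targets):
    """Root-mean-square error; penalizes large misses (e.g. dense crowds) more."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))


# Hypothetical per-image counts: crowd, cells, dense crowd, pallets.
truth = [120, 35, 410, 8]
baseline = [100, 41, 350, 10]
new = [110, 38, 380, 9]

print(f"MAE: {mae(baseline, truth):.1f} -> {mae(new, truth):.1f}")  # prints MAE: 22.0 -> 11.0
print(f"RMSE ratio: {rmse(new, truth) / rmse(baseline, truth):.2f}")  # prints RMSE ratio: 0.50
```

Because MAE averages absolute misses, a single very dense image can dominate the score, which is one reason dense scenes remain the hard case.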
Also worth a look
- Open source AI must win (Hacker News)
- AI OSS tool repo goes archived over night after raising $7.3M Seed (Hacker News)
- OpenAI faces investigation from state attorneys general (TechCrunch AI)
- Show HN: Paca – Lightweight Jira alternative for human-AI collaboration (Hacker News)
- Andrew Yang thinks the next big startup opportunity is lowering the cost of living (TechCrunch AI)