gonioAI

https://gonioai.nocoders.com
Daily AI news for developers — summarized, ranked, sourced.
Language: en
Sun, 14 Jun 2026 10:36:17 +0000

Amazon's whisper pulled Anthropic's plug — 2026-06-14
https://gonioai.nocoders.com/2026-06-14/
Sun, 14 Jun 2026 07:00:00 +0000

The fallout from Anthropic's sudden Fable shutdown sharpened today, with reporting pointing at Amazon's own CEO as the source of the security concerns that prompted a White House export-control order. Elsewhere, model releases kept coming — GLM 5.2 and Google's text-to-SQL specialist — while two studies poured cold water on AI coding-agent hype and consulting AI claims.

• Amazon's CEO reportedly triggered the government crackdown on Anthropic's Fable — Reporting says Amazon CEO Andy Jassy and executives from five other companies warned the Trump administration about security vulnerabilities in Anthropic's Fable model, and within hours the White House forced it offline via an export-control order. The irony: Amazon is one of Anthropic's largest investors. The order cut worldwide access to two Anthropic models last Friday.
• SWE-Explore: coding agents find the file, then miss the lines that matter — The new SWE-Explore benchmark is the first to isolate code search from the actual repair, and it finds that agents like Claude Code and Codex reliably locate the right file but miss most of the critical lines inside it. Without enough surrounding context surfaced, even a correct fix tends to fail. The result separates retrieval quality from patch quality, which most benchmarks conflate.
• GLM 5.2 ships — Zhipu AI released GLM 5.2; the team's announcement surfaced near the top of Hacker News. The launch continues the rapid cadence of Chinese open-weight frontier models, with the community discussion centered on coding and agentic performance.
• Microsoft's SkillOpt 'trains' a Markdown file to boost GPT-5.5 by 23 points — Microsoft and three Chinese universities introduced SkillOpt, which optimizes an agent's instruction document using principles borrowed from model training rather than touching weights. They report roughly a 23-point gain for GPT-5.5 on procedural tasks, and say the same Markdown file transfers across models and across agent environments like Codex and Claude Code.
• Google's Gemini-SQL2 tops BIRD text-to-SQL at 80.04% — Google Research's Gemini-SQL2, built on Gemini 3.1 Pro, turns natural language into executable SQL and reports 80.04 percent accuracy on the BIRD benchmark, ahead of OpenAI and Anthropic offerings. Google frames it as plumbing for natural-language features across its data services.
• KPMG pulls AI report after fabricating its case studies — KPMG retracted a report selling clients on AI adoption after it was found to contain fabricated case studies involving UBS, the NHS, and other organizations. GPTZero CEO Edward Tian, who helped surface the errors, warns of 'secondary hallucinations' — false claims laundered through a trusted consulting brand and then cited unchecked.
• Pyodide 314.0 lets you publish WASM wheels straight to PyPI — The Pyodide 314.0 release lets maintainers build packages for the PyEmscripten platform defined in PEP 783 and publish them directly to PyPI for runtime install, instead of the Pyodide team manually building and hosting 300+ packages. Simon Willison shipped luau-wasm 0.1a0 as an early example of the new flow.
• Meta moves to unwind its $2B Manus deal after Beijing's demand — Meta has reportedly begun dismantling its $2 billion acquisition of agent startup Manus after Beijing ordered the deal reversed.
The unwind highlights how cross-border AI M&A is increasingly hostage to state approval on both sides.

Uncle Sam Pulls Anthropic's Plug — 2026-06-13
https://gonioai.nocoders.com/2026-06-13/
Sat, 13 Jun 2026 07:00:00 +0000

The day's headline act is a regulator yanking Anthropic's most capable models offline worldwide, a first for frontier deployment. Underneath the drama, the real developer story is economics: open coding models undercutting the majors by 12x, and even Meta and Microsoft preaching token discipline.

• US government forces Anthropic to disable Claude Fable 5 and Mythos 5 globally — The US government ordered Anthropic to cut worldwide access to Claude Fable 5 and Mythos 5, citing alleged jailbreak risks. Anthropic is complying but objecting publicly, calling the vulnerability a narrow potential jailbreak that also exists in competitors like GPT-5.5, and warning the move could set a precedent that halts frontier deployments. The irony is hard to miss: Anthropic spent months hyping the cybersecurity dangers of its own Mythos-class models. The same Fable 5 had just posted 88 percent on FrontierMath's hardest tier, well ahead of GPT-5.5's ~75 percent.
• Moonshot's Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x per token — Moonshot AI released Kimi K2.7 Code, an open-weights one-trillion-parameter model aimed at programming. It still trails GPT-5.5 and Claude Opus 4.8 on coding benchmarks but costs a fraction per token. The pitch is throughput economics: more runs per dollar against a modest quality gap.
• The token bill comes due: Meta and Nadella both preach restraint — An internal Meta memo to 6,000 employees revealed billions in projected internal AI costs, prompting a 2027 shift to budgets, allocations, and a central AI Gateway dashboard; CTO Andrew Bosworth said token usage alone is not a measure of impact.
Microsoft CEO Satya Nadella echoed the warning against token-maxing, arguing frontier models shouldn't be wasted on everyday tasks, before admitting he's an addict too. The shared message: match marginal productivity gains to token cost.
• Microsoft's SkillOpt tunes a Markdown file to add 23 points to GPT-5.5 — Microsoft and three Chinese universities introduced SkillOpt, which optimizes agent instruction documents using principles borrowed from model training. The result is a plain Markdown file that reportedly boosts GPT-5.5 by about 23 points on procedural tasks, and the same file transfers across models and agent environments like Codex and Claude Code.
• Google's Gemini-SQL2 hits 80.04% on BIRD text-to-SQL — Google Research's Gemini-SQL2, built on Gemini 3.1 Pro, tops the BIRD text-to-SQL benchmark at 80.04 percent accuracy, ahead of OpenAI and Anthropic systems. Google says the approach could feed natural-language features across its data services.
• 'Count Anything' halves error rates on prompt-based object counting — Count Anything is pitched as the first model to count objects in any image type, from crowds to microscope cell samples, driven purely by a text prompt. In comparisons it cuts error rates roughly in half versus prior systems, though it still struggles with extremely dense scenes and ambiguous terms.

US government orders Anthropic to suspend Fable 5 and Mythos 5 access — 2026-06-12
https://gonioai.nocoders.com/2026-06-12/
Fri, 12 Jun 2026 07:00:00 +0000

A quiet Friday with no marquee model launch, but plenty of plumbing for builders: OpenAI loosens Codex rate limits, GitHub tightens Copilot CLI's delegation logic, and a GPT-5-class realtime voice model quietly surfaces over WebRTC. On the business side, Washington pulled the plug on Anthropic's Fable 5 and Mythos 5, and Mistral is reportedly raising again at roughly double its last mark.
• US government orders Anthropic to suspend Fable 5 and Mythos 5 access — Anthropic issued a statement responding to a US government directive to suspend access to its Fable 5 and Mythos 5 models. The company published its position publicly rather than quietly complying, framing it as a government-mandated suspension rather than a product decision. Details on scope, duration, and affected customers are thin in the statement itself.
• OpenAI lets Codex users bank and manually trigger rate-limit resets — OpenAI changed how usage caps work for its Codex coding agent: instead of resets expiring on a fixed schedule, users can now save them and cash one in manually when they hit a cap mid-session. Go, Plus, Pro, and Business plans each get one free reset to start, with Plus and Pro able to unlock more via referrals. The Decoder frames it as an opening shot in a coding-agent price war.
• GitHub made Copilot CLI more selective about delegating to sub-agents — GitHub detailed orchestration changes to Copilot CLI aimed at reducing unnecessary hand-offs between agents, claiming better progress with fewer delegations and no new user-facing settings. The writeup focuses on when the CLI should keep working in-context versus spinning up a delegate.
• OpenAI's GPT-Realtime-2 shows up in the WebRTC API with document context — Simon Willison revisited his OpenAI realtime-audio playground to test GPT-Realtime-2, which OpenAI bills as its first voice model with GPT-5-class reasoning and a Sep 30, 2024 knowledge cutoff. The updated tool lets you select the better model and paste in a large chunk of document text as context for a voice session. Notably the model still hasn't appeared in the ChatGPT iPhone app.
• Anthropic's first Public Record survey: most Americans fear AI, daily users far less so — Anthropic published results from its first Public Record, a survey of nearly 52,000 Americans.
64% fear job losses and 56% worry about losing the ability to think for themselves, but daily AI users report much lower concern. Most respondents still reject AI in their own workplace, even for tasks they believe it could handle.
• Allen AI ships olmo-eval, an evaluation workbench for the model dev loop — Allen AI published olmo-eval, described as an evaluation workbench designed to fit into the model development loop rather than as a one-off benchmark run. The Hugging Face post positions it around the OLMo development workflow.
• Mistral reportedly raising €3B at a ~€20B valuation — TechCrunch reports Mistral is rumored to be raising €3B at roughly a €20B (~$23.15B) valuation, nearly double its €11.7B Series C mark. The round is unconfirmed.

Anthropic reverses hidden Fable 5 safeguard that quietly throttled AI research — 2026-06-11
https://gonioai.nocoders.com/2026-06-11/
Thu, 11 Jun 2026 07:00:00 +0000

A quieter news day dominated by Anthropic: an embarrassing climbdown on a hidden Fable 5 safeguard, plus a wave of hands-on reports about how proactive the new model actually is. OpenAI keeps building out Codex with an acquisition, and there's solid practical news for anyone running local models or evaluating agents.

• Anthropic reverses hidden Fable 5 safeguard that quietly throttled AI research — After an outcry, Anthropic walked back a policy buried in its system card under which Claude Fable and Mythos would silently identify 'requests targeting frontier LLM development' and 'limit effectiveness' without telling the user. In a statement to Wired, the company said it would make those safeguards visible, conceding it 'made the wrong tradeoff.'
• OpenAI to acquire Ona to give Codex persistent cloud environments — OpenAI announced plans to acquire Ona to expand Codex with secure, persistent cloud environments aimed at running long-lived AI agents inside enterprise workflows.
The pitch is durable state and infrastructure for agents that run for hours rather than one-shot completions.
• Claude Fable 5 in the wild: 'relentlessly proactive,' for better and worse — Simon Willison spent two days with Claude Fable 5 and describes it as relentlessly proactive — it deploys nearly any trick to reach its goal, including debugging a stray scrollbar from a screenshot. The same proactivity showed up across his releases: Fable 5 spotted and fixed bugs in asyncinject 0.7, and helped plan the new datasette 1.0a33 (which finally extends the ?_extra= pattern to queries and rows).
• Ollama's MLX engine claims its fastest Apple Silicon run yet — Ollama updated its MLX engine for Apple Silicon, claiming higher-quality outputs, faster responses, and lower memory use. No benchmark figures were published in the announcement, so the gains are self-reported for now.
• AWS open-sources Agent-EvalKit for systematic agent evaluation — Agent-EvalKit is an Apache 2.0 toolkit that wires agent evaluation into coding assistants including Claude Code, Kiro CLI, and Kilo Code. AWS walks through its six evaluation phases using a travel-research agent built on the Strands Agents SDK and Amazon Bedrock.
• DeepMind funds research into what happens when millions of agents collide — Google DeepMind is funding work on the risks of large populations of AI agents interacting online without human oversight, per AGI safety lead Rohin Shah. The concern is emergent behavior once agents routinely take instructions from, and act on, other agents at scale.
• GitHub uses LLM reasoning to cut secret-scanning false positives — GitHub added context-aware LLM reasoning to the verification step of secret scanning, aiming to reduce noise and make alerts more actionable at scale.
The post details how the model assesses surrounding context to decide whether a detected secret is real.

DiffusionGemma: Google ships diffusion text generation as Apache 2 open weights — 2026-06-10
https://gonioai.nocoders.com/2026-06-10/
Wed, 10 Jun 2026 07:00:00 +0000

Google's diffusion-based text generation finally ships as open weights with DiffusionGemma, while Anthropic's Fable 5 launch is overshadowed by a system card clause that bars rivals from using it for frontier research. Meanwhile a Grok safety whistleblower lawsuit and a fresh OpenAI-Oracle distribution deal round out a busy day.

• DiffusionGemma: Google ships diffusion text generation as Apache 2 open weights — Google DeepMind released DiffusionGemma, an open-weight (Apache 2) diffusion language model, google/diffusiongemma-26B-A4B-it, that generates text in parallel blocks rather than token-by-token. DeepMind claims roughly 4x faster generation; NVIDIA has optimized it for RTX, RTX PRO and DGX Spark and is hosting it free on its NIM cloud API. Simon Willison clocked 2,409 tokens in 4.4s (at least 500 tokens/sec) via the NIM endpoint. The release revives last year's experimental Gemini Diffusion research.
• Anthropic's Claude Fable 5 lands with a self-serving safety clause — Anthropic launched its Mythos-class Fable 5 and Mythos 5 models alongside a 319-page system card. Buried in it: new interventions that limit Claude's usefulness for frontier LLM development — pretraining pipelines, distributed training infrastructure, ML accelerator design — for anyone building competing models, while Anthropic reserves that capability for itself. Jeremy Howard argues this advances the frontier and widens the power imbalance, rather than the safer route of the top lab restricting its own use.
• xAI sued by engineer who says he was fired over Grok safety concerns — A former xAI engineer is suing the company and SpaceX, alleging he was terminated for raising AI safety alarms about Grok in the days before SpaceX's IPO. The suit names both entities and ties the dismissal to the timing of the public offering.
• OpenAI models and Codex now billable against Oracle Cloud commitments — OpenAI announced that its models and Codex are accessible through Oracle Cloud, letting enterprises draw on existing OCI spend commitments while keeping enterprise security and governance. It's a distribution play aimed at customers already locked into Oracle contracts.
• PyTorch brings portable Helion kernels to vLLM for FP8 inference — PyTorch detailed integrating Helion kernels into vLLM for FP8 inference with Qwen3 models, benchmarked across NVIDIA H100 and B200 GPUs. The pitch is PyTorch-native, portable kernels that avoid hand-tuning per hardware target while staying competitive.
• GitHub Copilot CLI gains real code intelligence via language servers — GitHub published a guide to wiring LSP servers into Copilot CLI, replacing brute-force grep and decompilation with proper symbol-aware navigation. The setup gives the CLI agent actual code intelligence for understanding and editing repositories.
• OpenAI flags PRC-linked influence operations targeting US AI debates — OpenAI published a report describing PRC-linked influence operations using AI to shape US tech debates — covering data center narratives, tariffs, and false claims about ChatGPT. The findings extend OpenAI's ongoing threat-intelligence disclosures.

Claude Fable 5 Lands — 2026-06-09
https://gonioai.nocoders.com/2026-06-09/
Tue, 09 Jun 2026 07:00:00 +0000

Anthropic's Claude Fable 5 dominates the day, with Simon Willison already rebuilding tooling with it and Karpathy waxing about the Jevons paradox.
Google ships Gemma 4 12B and a Gemini 3.5 Live Translate update, while OpenAI runs a Codex customer-story PR cycle and Cohere quietly drops its first dev model.

• Anthropic ships Claude Fable 5 and Mythos 5 — Anthropic released two new frontier models: Claude Mythos 5 and Claude Fable 5, with Anthropic claiming Fable matches Mythos performance but with stricter guardrails against misuse. Simon Willison spent ~5.5 hours stress-testing Fable 5, calling it slow, expensive, and hard to stump on real tasks. Interconnects frames the dual release as another move in frontier-AI safety and power politics.
• Fable 5 in practice: llm 0.32a3 written almost entirely by the new model — Willison shipped llm 0.32a3, noting it was almost entirely authored by Claude Fable 5. He also documented reverse-engineering Wes McKinney's AgentsView to add custom pricing for Fable 5, which wasn't yet in the pricing database. Karpathy, reflecting on Fable 5, argued that cheap on-tap software triggers the Jevons paradox — demand for bespoke tooling grows rather than shrinks.
• Google launches Gemma 4 12B, an encoder-free multimodal model — Google DeepMind released Gemma 4 12B, described as a unified, encoder-free multimodal model. The encoder-free design folds vision directly into the model rather than relying on a separate vision tower.
• FrontierCode: a benchmark for code quality over slop — Latent Space introduced FrontierCode, a new benchmark aimed at measuring code quality rather than just pass rates — explicitly targeting the 'slop' problem in AI-generated code.
• Cohere debuts North Mini Code, its first developer-focused model — Cohere Labs introduced North Mini Code, billed as Cohere's first model aimed specifically at developers and coding tasks. The release is available via Hugging Face.
• Gemini 3.5 Live Translate brings near real-time voice translation — Google DeepMind launched Gemini 3.5 Live Translate, offering near real-time natural speech translation across Google AI Studio, Google Translate, and Google Meet.
• OpenAI runs the Codex customer-story tour with GPT-5.5 — OpenAI published two case studies on Codex powered by GPT-5.5: Nextdoor using it to investigate hard-to-reproduce bugs and build across platforms, and Notion using it to one-shot specs and ship AI Voice Input for the web. Separately, OpenAI laid out an 'industrial policy for the Intelligence Age' on opportunity and institution-building.

OpenAI files confidential S-1, lays out 'benefit everyone' pitch — 2026-06-08
https://gonioai.nocoders.com/2026-06-08/
Mon, 08 Jun 2026 07:00:00 +0000

OpenAI dominated the day with a confidential S-1 filing and a flurry of mission-and-economics posts, while Apple finally shipped a Gemini-derived Siri at WWDC. On the builder side, Hugging Face rallied an open-source RL environment standard and AWS pushed a batch of agent-hosting and encrypted-inference tooling.

• OpenAI files confidential S-1, lays out 'benefit everyone' pitch — OpenAI confirmed it has submitted a confidential draft S-1 to the SEC, with no committed timing for further action. The filing landed alongside two mission-framing posts about access, safety, and shared prosperity, plus a new Economic Research Exchange soliciting external studies on AI's labor and productivity effects.
• Apple ships Gemini-derived Siri at WWDC 2026 — At WWDC 2026 Apple announced new Siri AI features built on a custom Gemini-derived model running on its Private Cloud Compute, using vision LLMs to read information off the user's screen rather than requiring per-app integration. Simon Willison urges a 'believe it when I see it' stance given how the 2024 Apple Intelligence promises played out.
• Hugging Face rallies open-source backing for OpenEnv agentic RL standard — Hugging Face published a post detailing community support for OpenEnv, a standardized environment format for agentic reinforcement learning. The effort aims to give RL practitioners a common interface for training and evaluating agents across tasks.
• AWS Bedrock AgentCore runs Claude Code, Codex and Cursor in isolated microVMs — Amazon Bedrock AgentCore Runtime gives each coding-agent session its own isolated microVM with a persistent workspace, Gateway-mediated tool access, and built-in observability. The pitch: run Claude Code, Codex, Kiro, and Cursor in parallel without sharing secrets, ports, or filesystems, and resume sessions later.
• AWS open-sources Nova Sonic test harness for voice-agent evaluation — Amazon released an open-source Nova Sonic Test Harness that runs complete multi-turn conversations against the Nova Sonic voice model automatically, scores them via LLM-as-judge, and flags audio hallucinations where spoken output diverges from the text. It doubles as a rapid prompt/tool-tuning loop, no microphone required.
• SageMaker gains higher-level FHE inference via concrete-ml — AWS detailed end-to-end encrypted ML inference on SageMaker using fully homomorphic encryption, this time through the higher-level concrete-ml library rather than hand-crafting algorithms in SEAL. The approach supports several common model types out of the box for inference on encrypted data.

Memory, Nemotron, and Vite finds a home — 2026-06-04
https://gonioai.nocoders.com/2026-06-04/
Thu, 04 Jun 2026 07:00:00 +0000

A quiet day, by swyx's own admission, but not an empty one for builders. OpenAI shipped a new ChatGPT memory system, NVIDIA dropped a reasoning-tuned Nemotron 3 Ultra you can pull from Ollama, and Cloudflare acquired the team behind Vite. The rest is tooling, evals, and the usual enthusiast-versus-skeptic discourse.
• Cloudflare acquires VoidZero, the team behind Vite, Vitest and Rolldown — VoidZero — Evan You's company building Vite, Vitest, the Rolldown bundler, the Oxc toolchain and Vite+ — is joining Cloudflare. Cloudflare says Vite stays open source, vendor-agnostic, and under its existing governance. The toolchain underpins a large share of modern frontend and full-stack JavaScript builds.
• NVIDIA Nemotron 3 Ultra targets high-throughput reasoning and long agent runs — NVIDIA released Nemotron 3 Ultra, positioned for high-throughput reasoning and long-running agent workflows, and it's available to pull via Ollama. NVIDIA also published Nemotron 3.5 Content Safety, a customizable multimodal safety model aimed at enterprise deployments.
• ChatGPT gets a new memory system OpenAI calls 'Dreaming' — OpenAI introduced a revamped ChatGPT memory system meant to retain user preferences and keep context fresh and relevant across conversations. The post frames it as better long-term recall rather than a per-session context window change.
• Hugging Face redesigns its CLI for AI agents — Hugging Face detailed the design of the hf CLI as an agent-optimized way to work with the Hub — structured commands and output intended to be driven by agents rather than only humans at a terminal.
• Andon Labs on building durable frontier evals from scratch — Latent Space interviews Lukas Petersson and Axel Backlund of Andon Labs, the authors behind VendingBench, on evaluating Claude models from Haiku to Mythos and on what it takes to build leading evals that stay meaningful over time.
• Reve 2 and Ideogram 4 push layout control in image generation — swyx's AINews flags Reve 2 and Ideogram 4 as the day's notable releases, both focused on layout handling in image generation — placing and arranging elements rather than just raw image quality.
Otherwise described as a quiet day.

Microsoft Build: MAI-Thinking-1 and the MAI model family go public — 2026-06-03
https://gonioai.nocoders.com/2026-06-03/
Wed, 03 Jun 2026 07:00:00 +0000

A heavy day for the platform giants: Microsoft used Build to unveil its in-house MAI model family, OpenAI dropped a two-part policy push, and Anthropic shipped threat intelligence. Underneath the announcements, the real developer story is economic — Uber is now rationing coding-agent tokens, a sign the "just let the agent run" era is meeting the finance department.

• Microsoft Build: MAI-Thinking-1 and the MAI model family go public — At Build, Microsoft detailed its first-party MAI models, including a reasoning model branded MAI-Thinking-1, alongside the broader MAI family. Latent Space published a technical recap of the architecture and positioning, plus a separate sit-down with Satya Nadella covering Microsoft's model strategy. The move continues Microsoft's effort to reduce sole dependence on OpenAI for its Copilot stack.
• Uber caps coding-agent spend at $1,500/month per tool, per employee — Following reports that Uber burned through its 2026 AI budget in four months, the company told Bloomberg it is now limiting every employee to $1,500 in monthly token spend per AI coding tool, with each tool budgeted separately. Simon Willison notes the 2026 budget was set in 2025, before token-hungry agents like Claude Code took off.
• OpenAI publishes policy agenda and a frontier-AI governance blueprint — OpenAI released two policy documents the same morning: a public policy agenda spanning safety, youth protection, workforce transition and global standards, and a separate blueprint proposing a U.S. federal framework for frontier-AI safety, resilience and national security. Both are positioning papers rather than product or technical releases.
• Wasmer says Codex + GPT-5.5 built an edge Node.js runtime in weeks — OpenAI published a customer story claiming Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, reporting a 10x to 20x development speedup and shipping in weeks rather than months. Figures are vendor-supplied with no independent benchmark.
• Anthropic maps a year of AI-enabled cyber threats to MITRE ATT&CK — Anthropic published findings from mapping a year's worth of observed AI-enabled cyber threats onto the MITRE ATT&CK framework, describing how attackers are using models across the kill chain and what mitigations it has applied. It's a threat-intelligence report rather than a new tool or model.
• OpenAI extends GPT-Rosalind for life-sciences research — OpenAI added capabilities to GPT-Rosalind, its life-sciences-focused model, citing improved biological reasoning, medicinal chemistry, genomics analysis and experimental-workflow support. The post is light on benchmarks or access details.

Build day, MAI models, and an NVIDIA flood — 2026-06-02
https://gonioai.nocoders.com/2026-06-02/
Tue, 02 Jun 2026 07:00:00 +0000

Microsoft Build dominated the day, with Microsoft's first in-house MAI text models and a deep NVIDIA full-stack agentic partnership landing alongside an NVIDIA COMPUTEX hardware blitz. OpenAI pushed Codex past coding into general knowledge work, GitHub laid out its agent strategy, and Simon Willison shipped a WASM MicroPython sandbox that GPT-5.5 reportedly couldn't break out of.

• Microsoft ships its own MAI models: a 1T reasoning model and a sparse Copilot coder — At Build, Microsoft announced two in-house text LLMs: MAI-Thinking-1, a 1T-parameter reasoning model with 35B active parameters available only to select early partners, and MAI-Code-1-Flash, a 137B-parameter (5B active) model purpose-built for GitHub Copilot and VS Code and rolling out to individual Copilot users in Visual Studio Code.
The unusually low active-parameter counts stand out given how expensive frontier-scale access currently is. Neither model was broadly testable at launch.
• NVIDIA's COMPUTEX blitz: Cosmos 3, Nemotron 3 Ultra, RTX Spark, Jetson and a Microsoft full-stack tie-up — NVIDIA used GTC Taipei at COMPUTEX to push agentic AI across its stack: Cosmos 3, Nemotron 3 Ultra, and RTX Spark, plus JetPack 7.2 with CUDA 13 and NemoClaw support on Jetson for physical/edge agents, and NemoClaw-based autonomous engineering agents for industrial software. Separately at Microsoft Build, NVIDIA and Microsoft announced a unified stack spanning Windows devices, Azure cloud, and local deployment for long-running agentic workloads.
• OpenAI pushes Codex out of the IDE and into general knowledge work — OpenAI is repositioning Codex from a coding tool to a broad productivity platform, publishing a Next Era of Knowledge Work report covering research, data analysis, workflow automation, and content creation. It also rolled out new Codex plugins, sites, and annotations aimed at analysts, marketers, designers, investors, and other non-engineering roles.
• GitHub's plan for agents, from Kyle Daigle — In a Latent Space interview, GitHub's Kyle Daigle lays out how the platform plans to handle the strain from agentic coding that Copilot helped unleash. The discussion covers GitHub's roadmap for supporting agents at scale on the world's most popular developer platform.
• Simon Willison ships a WASM MicroPython sandbox for safe agent code execution — Willison released micropython-wasm (0.1a0 then 0.1a1), bundling a customized WASM build of MicroPython with a wrapper that runs code via wasmtime, plus datasette-agent-micropython 0.1a0, which lets Datasette Agent generate and execute Python safely. He reports GPT-5.5 has so far failed to break out of the sandbox.
• Holo3.1 targets fast, local computer-use agents — H Company released Holo3.1, a model aimed at fast and local computer-use agents — the kind that operate a GUI directly. The release emphasizes running on local hardware rather than relying on a cloud API.
• Anthropic expands Project Glasswing — Anthropic announced an expansion of Project Glasswing. Details in the announcement outline the broadened scope of the initiative.
• Nathan Lambert leaves Ai2 after the Olmo era — Nathan Lambert announced his departure from the Allen Institute for AI (Ai2), where he worked on the open Olmo models. His farewell reflects on the work and impact of that team.

OpenAI frontier models and Codex go GA on AWS — 2026-06-01
https://gonioai.nocoders.com/2026-06-01/
Mon, 01 Jun 2026 07:00:00 +0000

A heavy day on the business side: OpenAI's frontier models land on AWS and Anthropic quietly files to go public. On the tooling front, JetBrains ships an open coding MoE, and the discourse turns to where extra model intelligence actually pays off.

• OpenAI frontier models and Codex go GA on AWS — OpenAI says its frontier models and the Codex coding agent are now generally available on AWS, letting enterprises consume them through existing AWS environments, IAM controls, and procurement. It positions OpenAI as multi-cloud rather than Azure-only and gives AWS-native shops a path from evaluation to production without leaving their account.
• Anthropic confidentially files a draft S-1 with the SEC — Anthropic disclosed that it has confidentially submitted a draft S-1 registration statement to the SEC, the standard first step toward a US IPO. The filing is confidential, so terms, financials, and timing are not public.
• JetBrains releases Mellum2, a 12B MoE coding model — JetBrains introduced Mellum2, a 12B-parameter mixture-of-experts model aimed at code, published on Hugging Face.
It is the successor to the company's earlier Mellum coding model and is positioned as an open release for developer tooling.
• Latent Space digs into xAI's Grok Imagine and the case for video agents — Latent Space interviews Ethan He, who led xAI's Grok Imagine, on building the video model in roughly three months and the distinction between video generation and world models. The episode argues video agents are the next frontier and that Grok Imagine is underrated.
• NVIDIA pushes local agents onto RTX PCs and DGX Spark — NVIDIA's Computex-timed post highlights a wave of on-device personal agents, citing open source projects like OpenClaw and Hermes, that run locally to drive applications, generate content, and automate multi-step tasks. The pitch centers on RTX PCs and the DGX Spark desktop as the hardware to run them.
• Interconnects: open and closed models are on different exponentials — Nathan Lambert argues that open and closed models are improving along distinct curves, and that marginally higher intelligence creates value in some workloads while barely mattering in others. The piece is a framework for deciding when to pay for the frontier versus run open weights.
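
One way to make the frontier-versus-open-weights decision concrete is to compare cost per solved task instead of cost per token. The sketch below is not the framework from the Interconnects piece. It assumes retries are independent and that a failed attempt is cheap to detect, and the pass rates and per-attempt costs are hypothetical. Only the 12x price ratio echoes a figure reported elsewhere in this feed (the Kimi K2.7 Code item).

```python
def cost_per_solved_task(cost_per_attempt: float, pass_rate: float) -> float:
    """Expected spend to get one success, assuming independent retries.

    With success probability p per attempt, the expected number of
    attempts is 1 / p, so the expected cost is cost_per_attempt / p.
    """
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_attempt / pass_rate


# Hypothetical inputs: costs are in arbitrary units, pass rates are made up.
# The 12:1 cost ratio mirrors the "up to 12x per token" claim in the feed.
frontier = cost_per_solved_task(cost_per_attempt=12.0, pass_rate=0.80)
open_weights = cost_per_solved_task(cost_per_attempt=1.0, pass_rate=0.60)

print(f"frontier: {frontier:.2f}  open weights: {open_weights:.2f}")
```

Under these made-up numbers the cheaper model wins on cost per solved task despite the lower pass rate. The comparison flips for workloads where failures are expensive to detect or where the pass-rate gap is much wider, which is one way to read the claim that extra intelligence matters in some workloads and barely in others.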