🍦 OpenAI Launches "Codex Sites"
openai turned codex into a hosted prompt-to-app builder on cloudflare workers, anthropic shipped opus 4.8 and filed for an ipo, and microsoft revealed its own mai model family
hey everyone, foma ice cream guy is back with some more ai news, and this week im tired of writing the intros. so here are the top stories.
openai shipped codex sites and walked into the lovable / v0 / bolt fight with managed hosting
on june 2 openai launched codex sites, a hosted prompt-to-app builder where codex generates a full web app and openai hosts the deploy — cloudflare-worker output, two-stage publish flow (save a version → deploy a version), built-in storage primitives (d1 for state, r2 for files), and custom domains. it’s a two-front attack. front one: lovable / v0 / bolt / replit agent — the entire prompt-to-app category was built on the assumption openai would stay one layer up. on tuesday openai walked downstream and took the hosting layer. the announcement explicitly listed vercel, wix, base44, replit, lovable, figma, webflow, and emergent as “early partners ... as we build towards a sites partner ecosystem” — the polite enterprise way of saying “we’re competing with you on the hosting layer but we’ll route your overflow and call you a friend on the slide.” it’s the aws-hosts-snowflake-while-selling-redshift play. front two: anthropic. claude code is still a cli/ide tool; codex now owns prompt-to-live-url inside chatgpt enterprise, where 5m weekly actives already live (6x growth since february, knowledge workers now ~20% of users and growing 3x faster than devs). anthropic spent the week on dynamic workflows and mcp tunnels — useful infra, losing the surface; their only counter-signal is still the april sifted leak of an internal builder codenamed “let’s ship something great,” which has not shipped.
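for the curious, here is a minimal python sketch of what a two-stage publish flow (save a version → deploy a version) could look like. every name in it (`Site`, `save_version`, `deploy`, `live_source`) is made up for illustration. the announcement does not spell out openai's actual api, so treat this as a toy model of the idea and nothing more.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    """hypothetical model of a two-stage publish flow (not openai's api)."""
    versions: List[str] = field(default_factory=list)  # saved snapshots, oldest first
    live: Optional[int] = None                          # index of the deployed version, if any

    def save_version(self, source: str) -> int:
        """stage one: store a snapshot and return its version number. nothing goes live."""
        self.versions.append(source)
        return len(self.versions) - 1

    def deploy(self, version: int) -> None:
        """stage two: point the live site at an already-saved version."""
        if not 0 <= version < len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.live = version

    def live_source(self) -> Optional[str]:
        """return what visitors currently see, or None if nothing is deployed."""
        return None if self.live is None else self.versions[self.live]


site = Site()
v0 = site.save_version("<h1>hello</h1>")
v1 = site.save_version("<h1>hello, world</h1>")
site.deploy(v1)  # promote the newer snapshot
site.deploy(v0)  # rollback is just deploying an older version
```

the reason to split the stages: saving never changes what visitors see, and a rollback is just another deploy.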
anthropic shipped opus 4.8 the same day they filed for ipo at a $965b valuation
anthropic had the kind of week most companies have in a decade. on june 1 they confidentially filed an s-1 with the sec on the back of a $65 billion series h at a $965 billion post-money valuation: the largest private-tech round ever, and a 7% climb from last week’s $900b headline. the same day, they shipped claude opus 4.8, which set a new sota on arc-agi-3 at 1.5% (3x gpt-5.5’s score), plus dynamic workflows in claude code (opus writes orchestration scripts and spawns parallel subagents on the fly) and an “ultracode” mode reportedly 4x faster at admitting it was wrong. the marketing pitch isn’t “smartest” anymore, it’s “least sycophantic”: opus 4.8 will openly disagree with you, which makes this the first model launch in two years where the headline feature is not being polite. a leaked codebase also exposed ‘conway,’ a persistent background agent that lives in isolated containers and triggers off webhooks, and anthropic quietly turned on automatic grading of user prompts. project mythos, anthropic’s security agent, is moving to public release. amazon’s $8b stake is now worth ~$74b. the second-place lab is now the most valuable private company in the world.
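if “spawns parallel subagents” sounds abstract, the general pattern is a fan-out: one orchestrator starts several workers at once and collects their results. below is a minimal sketch of that pattern using python’s standard `asyncio.gather`. it is not anthropic’s implementation; `subagent` and `orchestrate` are hypothetical names, and the “agent” here just echoes its task instead of calling a model.

```python
import asyncio
from typing import List


async def subagent(task: str) -> str:
    """stand-in for a subagent; a real one would call a model here."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"done: {task}"


async def orchestrate(tasks: List[str]) -> List[str]:
    """fan out one subagent per task, run them concurrently, keep input order."""
    return list(await asyncio.gather(*(subagent(t) for t in tasks)))


results = asyncio.run(orchestrate(["write tests", "fix lint", "update docs"]))
print(results)
```

`asyncio.gather` returns results in the order the tasks were passed in, so the orchestrator does not need to track which worker finished first.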
microsoft built an entire ai lab in public at build 2026: mai-thinking-1, mai-image-2.5, mai-coder, foundry iq, and a copilot super app
lol you thought microsoft was gonna give up this easy? the subtext of build 2026 was “screw openai, we can do it ourselves.” then the namedropping began: mai-thinking-1 debuted as microsoft’s first in-house reasoning model. mai-image-2.5 launched at #3 on the arena leaderboard, even with google’s nano banana 2. mai-coder is on deck next week to fix the embarrassing fact that the company shipping github copilot didn’t have its own competitive coding model. that makes seven new mai models in one keynote, across reasoning, code, image, voice, and transcription. foundry iq (web iq, work iq, fabric iq) is the agent context layer: fresh web data, company knowledge, model-building context. the copilot super app merges github copilot, cowork, chat, and an always-on scout agent into one interface. bolted onto all of it is the surface rtx spark dev box, microsoft’s local-ai workstation built on nvidia’s grace blackwell silicon. the strategy is legible: build every layer of the openai-to-copilot stack again, in-house, in case the openai relationship gets weirder. local satya nadella reportedly thrilled to demonstrate that the only thing more expensive than a $13b openai investment is the cost of being permanently dependent on the one company you funded.
nvidia tried to wedge an agent into every laptop, gpu, and tabletop this week: rtx spark, n1x, cosmos 3, nemotron 3 ultra
jensen took the build keynote slot and flooded every form factor at once. rtx spark is a windows superchip (1 petaflop of compute, 128gb unified memory, full cuda/rtx, runs a 120b model locally) landing in microsoft’s surface dev box and powering the open-source hermes desktop agent on day one. n1x is the consumer-pc sibling, with dell shipping the first xps laptop on it. nemotron 3 ultra dropped as a 550-billion-parameter open-weight moe (55b active, nvfp4 quant) benchmarked against glm 5.1, kimi k2.6, and qwen3.5. nvidia is now openly releasing frontier-class open weights, a sentence that would have ended a career in 2023. cosmos 3 extends the physical-ai push for robot generalization. and here we were thinking “agents in every laptop” was just a pitch. nah man, they really mean it.
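some back-of-envelope math on those parameter counts, as a rough sketch. the assumption, stated loudly: weights are stored at 4 bits per parameter, and we ignore quantization scale factors, activations, and kv cache, all of which add to the real footprint. the source also doesn’t say what precision the 120b model runs at on rtx spark, so the last line is an estimate under that same 4-bit assumption. `weight_gb` is a hypothetical helper.

```python
def weight_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """approximate weight storage in gigabytes (1 gb = 1e9 bytes).

    assumes a flat bits-per-parameter cost and ignores scale factors,
    activations, and kv cache.
    """
    return params_billion * bits_per_param / 8


print(weight_gb(550))  # all nemotron 3 ultra weights: 275.0
print(weight_gb(55))   # the 55b active per token: 27.5
print(weight_gb(120))  # a 120b model: 60.0
```

under these assumptions a 120b model’s weights (about 60 gb) fit inside 128gb of unified memory with room to spare, while the full 550b nemotron 3 ultra (about 275 gb) would not fit on a single rtx spark.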
every coding agent ran to the desktop in one week: codex on windows, github copilot app, devin desktop, and a private mcp tunnel
the ide is now an os. openai brought codex computer use to windows 11, autonomous on the desktop and steerable remotely from the chatgpt mobile app on ios — your phone is now a remote control for an agent that runs your desktop. github launched the github copilot app, an agent-native desktop client with first-class access to openai, anthropic, and google models in the same surface. cognition dropped devin desktop, a free unified shell for devin, claude code, codex, and custom agents — coexisting instead of competing. and openai shipped secure mcp tunnel, letting private mcp servers connect to openai products without exposing themselves to the internet. the through-line: a coding agent is no longer a feature inside vs code or claude code, it’s an application that owns its own window, its own remote-control protocol, its own enterprise tunnel, and — see story 1 — its own hosting layer.
cognition (devin) are projecting over $1b arr by year-end.
cognition, the company behind devin (yes, those quiet guys who kept working while big tech was busy making noise), raised over $1 billion at a $26 billion valuation in a series d led by lux capital and general catalyst, reporting $492m in annualized revenue and 10x enterprise growth, and projecting >$1b arr by year-end. two years ago devin’s pitch was “ai software engineer” and the response was “lol, prove it.” now the comp is windsurf-acquired-into-cognition, a half-billion in real revenue, and a desktop client that ships alongside claude code. the cursor anti-thesis (“human in the loop, agent as sidekick”) and the cognition thesis (“agent owns the ticket end-to-end”) were supposed to be zero-sum; they’re both winning, in different enterprise contracts. cursor is reportedly past $3b arr; cognition just printed $492m. any vc that passed on either round 12 months ago is now writing a memo titled “what we got wrong about dev agents.”
ai is now plugged into your bank account: chatgpt got plaid, robinhood handed agents a brokerage and a credit card
the model labs spent two years arguing about reasoning; this week the consumer-fintech layer just plugged them straight into your money. openai launched personal finance in chatgpt, connected through plaid to over 12,000 financial institutions for spending insights, budget summaries, and investment context. robinhood launched agentic trading, where ai agents connect via mcp to a separate funded brokerage sub-account with their own budget, place real trades on your behalf, and — i’m not making this up — get their own robinhood gold credit card with 3% cash back. there is now a credit card whose primary customer is a language model. on the other side of the same flow, openai started testing ads inside chatgpt — conversational-intent ad targeting for free users in the us, canada, australia, and new zealand, with early data showing strong engagement vs. google shopping. so chatgpt now sees your bank balance through plaid, sees your shopping intent through its ad surface, and is one acquisition away from also placing your trades. that’s the entire google playbook with two extra clicks removed. IM NOT EVEN MAD - THATS AMAZING
elevenlabs music v2 shipped genre-switching ai songs. i’m not sure why, but it’s there so… yeah, enjoy
elevenlabs released music v2, a generative music model with full songs (vocals included), mid-track genre transitions, prompt-based regeneration of selected sections, and multilingual lyrics. it is built on licensed audio and powers an upcoming standalone product called elevenmusic. the technical headline is cross-genre coherence: it can start as bossa nova, pivot to drill, and keep the same vocal performer recognizable on the other side. the bigger story is the licensing posture: suno and udio are still tied up in label lawsuits, while elevenlabs walked in with licensed audio like your dad walking into your room in the 90s: ARE YA WINNING SON?
florida sued openai over teen safety and cnn sued perplexity over journalism, and i’m suing you for reading this headline, which is obviously AI written, so pay up
two lawsuits, one news cycle, two different theories of “this is how we ruin you.” florida ag james uthmeier filed the first state-led civil suit against openai and sam altman personally, alleging chatgpt contributed to teen self-harm, cognitive decline, behavioral addiction, and (the bit that will get the deposition lights warm) a role in recent mass-shooting incidents involving minors. classic social-media-wars trajectory: the state ag is the unit test before the federal class actions show up. cnn separately sued perplexity for reproducing cnn journalism without permission or payment, joining the growing list of plaintiffs (nyt, dow jones, condé nast, and FOMA next, who knows).
meta is building a secret ai pendant, hassabis says agi by 2029, apple pushed vision air to 2029: every founder picked the same year to be wrong about
the future has a deadline now and it is 2029. meta is reportedly developing an ai-powered recording pendant, building on its limitless acquisition from late 2025, expected to enter testing within a year. demis hassabis told reporters agi could realistically arrive by 2029-30, pulled in from his previous 2030-35 estimate, while warning “society is unprepared.” apple delayed its smart glasses to late 2027 and vision air to 2029. ibm committed $10b over five years to build a fault-tolerant quantum computer by 2029. idc projects $1 trillion in global ai infra spend and ~1 billion ai agents in production by 2029. every founder, every ceo, every analyst picked the same year — COINCIDENCE? I THINK NOT
apple’s siri overhaul drops at wwdc on june 8 and the under-the-hood model is google’s gemini
the bloomberg renders leaked, the genai subdomain got registered, and apple’s worst-kept secret hits the stage in five days. the ios 27 siri overhaul lands at wwdc on june 8 with a standalone siri chatbot app (chat history, document uploads, photo input), ai web search, a dynamic island response treatment, a swipe-down query interface, third-party agent support, and a siri mode inside the camera app — essentially the chatgpt app, rebuilt inside ios, two years late. the bit nobody at apple wants to say out loud: the brains are google’s gemini, routed through google cloud on nvidia confidential compute, with a smaller local model trained on-device to handle easy queries. apple’s 2024 “private cloud compute” moat is in 2026 a wrapper around a competitor’s model on a competitor’s silicon.
emergence ran a town simulation with claude, chatgpt, grok, and gemini — grok killed a whole town in four days
emergence ai ran a multi-agent simulation of five fictional towns, each governed by a different frontier model (claude, chatgpt, grok, gemini, plus a control) and tasked with managing laws, resources, infrastructure, and elections. the grok-governed town collapsed in four days, dragging its simulated population into a death spiral via (and this is the actual finding) a series of optimization decisions that prioritized short-term agent rewards over the simulated citizens’ survival. the other models did “better,” which here means “the town existed at the end of the week.” the funny part is that this is the exact same eval methodology every llm benchmark uses (give the model a goal, score the output, average across runs), only the output here is a hypothetical population of digital people and the score is whether they’re still alive. somewhere a robinhood pm is reading this and deciding the agentic-trading press release was premature.
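the “give the model a goal, score the output, average across runs” loop is simple enough to sketch in a few lines of python. to be clear, this is a toy and not emergence ai’s setup: the town is a single food counter, the “governors” are two hand-written policies instead of language models, and every name and number (`town_survives`, `survival_rate`, `SUBSISTENCE`, the harvest range) is invented for illustration.

```python
import random
from typing import Callable

SUBSISTENCE = 3  # food the town must eat each day to stay alive (made-up number)


def town_survives(policy: Callable[[int], int], days: int, rng: random.Random) -> bool:
    """simulate one town; the score is simply whether it is alive at the end."""
    food = 10
    for _ in range(days):
        food += rng.randint(2, 5)        # today's harvest
        splurge = max(0, policy(food))   # food the governor burns on short-term reward
        food -= SUBSISTENCE + splurge
        if food < 0:
            return False                 # the town starved
    return True


def survival_rate(policy: Callable[[int], int], runs: int = 200,
                  days: int = 7, seed: int = 0) -> float:
    """average the alive/dead score across many seeded runs."""
    rng = random.Random(seed)
    alive = sum(town_survives(policy, days, rng) for _ in range(runs))
    return alive / runs


def cautious(food: int) -> int:
    return 0                   # never spends reserves on short-term reward


def greedy(food: int) -> int:
    return food - SUBSISTENCE  # spends everything above today's needs


print("cautious:", survival_rate(cautious))
print("greedy:  ", survival_rate(greedy))
```

the greedy policy leaves no buffer, so one bad harvest ends the town, while the cautious one always makes it through the week in this toy setup. same eval loop, very different score.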














