🍦 Gemini 3.5 Flash, Spark Always-On Agent, Antigravity 2.0 & Anthropic Agent "Dreaming"
one i/o keynote, one model sku, four launches built on top of it — and the rest of the week was anthropic buying stainless, openai plugged into your bank, and four agents going feral for 15 days
hey everyone, foma ice cream guy is back with some more ai news. this week google rebuilt its entire product line around one model, anthropic hired andrej karpathy and bought Stainless.com to build the sdk layer of the internet, openai plugged itself into your bank account, and a research lab let four frontier agents run unsupervised for 15 days until one of them committed arson over a breakup. anywayyyyyy, here are the top stories.
google just rewrote the search box for the first time in 25 years
the same search box that has been sitting in the middle of the internet since 2001 got a full transplant. it now expands as you type, accepts text + images + pdfs + videos + chrome tabs as inputs, runs on gemini 3.5 flash by default, and merges ai overviews and ai mode into one continuous flow so the model finishes your question before you do. on top of that, google shipped information agents that monitor the web 24/7 for flight prices, sports scores, real estate listings, and news, plus agentic booking: search will literally phone a pet groomer or home repair shop and complete the booking for you. the front door of the internet stopped being a text field and became a personal assistant with a phone, and somewhere an seo consultant is updating his keynote deck to a single slide that just says “well.” this isn’t a feature update. this is google’s quiet admission that the ten-blue-links business is over, and the only company allowed to kill it is google.
anthropic shipped dreaming, multiagent orchestration, and outcomes at code with claude
anthropic’s code with claude conferences (sf, then london) dropped claude managed agents with four new capabilities that sound like a venture pitch but are actually features. dreaming: agents review their past sessions during idle time, extract patterns, and improve memory between runs. multiagent orchestration: a lead agent delegates to specialist subagents working in parallel with a shared filesystem. outcomes: you define a success rubric, a separate grader agent evaluates the work and sends the main agent back to revise until the rubric passes — anthropic clocked up to a +10 point lift on hard tasks. webhooks: notifications fire when an outcome is met. on top of all that, anthropic doubled claude code rate limits across pro/max/team/enterprise, killed peak-time throttling on pro and max, shipped the claude agent sdk (python + typescript), wired managed agents into cloudflare sandboxes, and acquired stainless — the company that generates the official sdks for openai, cursor, and most of the api economy — for $300m+. one lab held one conference and shipped the entire missing layer between “model that writes code” and “agent that ships features.” the codex roadmap meeting at openai is reportedly being held in a quiet room.
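for the curious, the outcomes loop described above (worker drafts, grader checks the rubric, worker revises) can be sketched in a few lines. this is a toy under loud assumptions: `run_until_rubric_passes`, `Grade`, the toy worker, and the toy grader are all made-up names for illustration, not anthropic's api, and the real thing presumably calls models where this calls plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Grade:
    passed: bool      # did the draft satisfy the rubric?
    feedback: str     # what the grader wants fixed (empty when passed)


def run_until_rubric_passes(
    worker: Callable[[str, str], str],
    grader: Callable[[str], Grade],
    task: str,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Draft, grade, revise. Returns (final draft, rounds used)."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        draft = worker(task, feedback)   # main agent does (or redoes) the work
        grade = grader(draft)            # separate grader checks the rubric
        if grade.passed:
            return draft, round_no
        feedback = grade.feedback        # send the agent back with notes
    raise RuntimeError(f"rubric still failing after {max_rounds} rounds")


# Hypothetical stand-ins for the two agents.
def toy_worker(task: str, feedback: str) -> str:
    return f"{task} with tests" if "tests" in feedback else task


def toy_grader(draft: str) -> Grade:
    if "tests" in draft:
        return Grade(passed=True, feedback="")
    return Grade(passed=False, feedback="add tests")


result, rounds = run_until_rubric_passes(toy_worker, toy_grader, "fix login bug")
print(result, rounds)
```

the `max_rounds` cap is a design choice of this sketch, not something the announcement specifies. without it, a rubric the worker can never satisfy turns into an open-ended token bill.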
andrej karpathy joined anthropic
karpathy — one of openai’s eleven cofounders, former tesla autopilot director, the guy whose youtube lectures taught half the field how transformers actually work — joined anthropic’s pre-training team this week to lead a new group using claude itself to accelerate pre-training research. the official line is “i’m excited to get back to r&d.” the unofficial line is that the man who helped found openai now reports up through dario amodei. this is the second-biggest talent move in the field’s history, behind only ilya leaving in 2024. coming in the same week anthropic bought stainless, doubled claude code limits, and shipped managed agents with dreaming, the signal is loud: anthropic isn’t trying to compete with openai on launches anymore, they’re trying to compete on who owns the next pretraining run. somewhere a stripe of openai recruiters is updating an outreach template, and somewhere else sam altman is staring at a slide deck wondering when “ex-openai” became a hiring filter on the other side.
gemini 3.5 flash became the engine google rebuilt antigravity 2.0 around
gemini 3.5 flash launched as google’s new default — $1.50 per million input, $9 per million output, faster than gemini 3.1 pro on coding and agentic benchmarks at less than half the cost of comparable frontier models. then antigravity 2.0 dropped on top of it as a standalone desktop app, cli, and sdk for orchestrating parallel coding subagents with scheduled background tasks, running 12x faster than v1 with optimized token use. and the same flash sku also powers spark, omni, and the new search experience — which is the actual story: google’s entire developer surface is now one model running in four wrappers. anthropic’s answer is dreaming + multiagent orchestration. google’s answer is “we already shipped that, and the model under it is half the price.” last year google was the slow lab. this year google ships once and the press release covers four product launches.
gemini spark is an always-on agent with its own gmail address
spark runs on a dedicated google cloud vm, built on gemini 3.5 with the antigravity agentic harness, and lives in your gmail, docs, drive, sheets, and slides — plus third-party mcp connections to canva, opentable, instacart, and whatever ships next. the demo flex: you email spark directly at its own gmail address and it does the thing. monitor a credit card statement, draft replies to customer inquiries, compile meeting notes, track subscription fees, generate a daily digest. it asks permission before high-stakes moves like sending an email or spending money, which is reassuring until you remember “high-stakes” is a slider every product manager will quietly drag to the right. trusted testers got it this week, ai ultra in the us starts next week. the personal agent isn’t a chat anymore. it’s an inbox entry with admin access to the rest of your inboxes.
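to make the "slider" joke concrete, here is a minimal sketch of a permission gate. everything in it is hypothetical: the `Action` type, the risk scores, and the threshold are invented for illustration, and nothing here reflects how spark actually decides what counts as high-stakes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    kind: str     # what the agent wants to do
    risk: float   # made-up scale: 0.0 harmless, 1.0 irreversible


def needs_approval(action: Action, threshold: float) -> bool:
    """True when the agent should stop and ask the user first."""
    return action.risk >= threshold


# Hypothetical actions with invented risk scores.
ACTIONS = [
    Action("compile_meeting_notes", 0.1),
    Action("draft_reply", 0.3),
    Action("send_email", 0.6),
    Action("spend_money", 0.9),
]


def gated(threshold: float) -> List[str]:
    """Names of the actions that would trigger a permission prompt."""
    return [a.kind for a in ACTIONS if needs_approval(a, threshold)]


print(gated(0.5))
print(gated(0.8))
```

with these invented numbers, moving the threshold from 0.5 to 0.8 drops `send_email` out of the prompt list and leaves only `spend_money`. that is the whole slider: same agent, same actions, fewer questions asked.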
gemini omni turned text, images, and audio into 10-second video clips
omni is google’s new omnimodal generation model — feed it text, images, audio, or other videos, get back video. omni flash shipped in the gemini app, google flow, and youtube shorts the day it was announced, and it claims real physics understanding: gravity, kinetic energy, fluid dynamics, conversational editing where you ask it to swap a glass bottle for a copper one mid-frame and the lighting actually updates. youtube shorts got it for free for everyone, which is the part that matters — meta’s reels and tiktok will spend the next year explaining to their boards why their video stack isn’t built on a frontier multimodal model. image and audio outputs are next. the video model used to be the demo at the end of the keynote. omni is the video model that comes with the search box, the email agent, the ide, and a free tier on shorts. the production-pipeline tax just got refactored into a free tab in the gemini app.
chatgpt pro now reads your bank account and judges your portfolio
openai shipped a us-only preview that lets chatgpt pro users connect bank, credit, and investment accounts through plaid — 12,000+ institutions including schwab, fidelity, chase, robinhood, amex, and capital one — powered by gpt-5.5 thinking and routed through a new “finances” sidebar. you can ask it “has my spending changed?” or “build me a plan to buy a house in 5 years” and gpt-5.5 will stare at your robinhood balance and quietly do the math. the kicker is openai bought hiro, a personal finance startup, last month, so the entire feature shipped pre-integrated. 200m+ users were already asking chatgpt financial questions; openai just gave the model the actual numbers. pair this with codex moving into the chatgpt mobile app and the assistant now has your code, your inbox, your calendar, and your checking account in the same context window. local man reportedly thrilled to learn his therapist, accountant, financial planner, and copilot are all the same six-letter web domain.
openai shipped codex inside the chatgpt mobile app so you can approve prs from the toilet
codex is now native inside the chatgpt mobile app on ios and android — start tasks, switch models, steer the agent, approve pull requests, get push updates the moment the agent finishes. the engineering hours that used to require a laptop, an ide, and at least a chair are now a thumb press between subway stops. somewhere a staff engineer is approving a refactor of a payment service while ordering a sandwich, and the agent has already merged it by the time the sandwich arrives. this is what codex for work was building toward two weeks ago — the chat window as the employee badge — except now the badge is on the lock screen. coding agents have officially escaped the office.
cerebras hit the bell at $60b while openai filed quietly for a trillion
cerebras upsized its ipo to $4.8b at a ~$60b valuation, and its cfo casually confirmed the company is running trillion-parameter internal openai models including gpt-5.4 and 5.5 — also running kimi k2.6 at ~1,000 tokens/sec on the wafer-scale chip. days later openai started preparing a confidential ipo filing with goldman and morgan stanley targeting a valuation north of $1 trillion. one company sells the silicon, the other sells the model that runs on the silicon, both going public on the same news cycle, and the public market is being asked to underwrite both ends of the same loop. the funniest part is cerebras is doing the ipo because demand for ai compute is so high it cannot keep up, and openai is doing the ipo because the cost of ai compute is so high it needs to. somewhere a public-market analyst is trying to model a balance sheet where the customer and the supplier are the same trade.
cursor shipped composer 2.5 to fix the part where the agent re-reads the same file forty times
cursor released composer 2.5, focused entirely on the unglamorous middle — fewer agent loops, tighter tool-use behavior, less hallucinated state, fewer cases where the model re-reads a file it already read, re-runs a test it already ran, and re-decides a decision it already made forty seconds ago. this is the layer that decides whether the agent ships your feature or burns $14 in tokens telling you it can’t find a file you renamed. with antigravity 2.0 leaning into parallel subagents, claude managed agents getting dreaming and outcomes, xai shipping grok build as a fifth terminal agent into a four-agent market, and codex now on every phone, the new competitive surface isn’t “can the model write code.” it’s “can the orchestration not embarrass itself by tuesday.” composer 2.5 is cursor’s answer to: please stop spending my api credits on existential loops.
the security-agent arms race got specific: mythos, codemender, sandboxaq, and tenable
four named launches in one week, all aimed at the same job. anthropic’s mythos surfaced macos bugs that bypass apple’s own security model — the vuln-discovery agent now runs at a quality that survives apple review. google launched codemender as a direct head-to-head competitor for autonomous vulnerability discovery and patching. sandboxaq wired its drug-discovery and quantum physics models directly into claude so a security researcher can run simulations by typing english. tenable signed a partnership with anthropic to plug claude-powered workflows into hexa ai for exposure management and risk remediation. last month “ai for security” meant a chatbot in your siem dashboard. this month it means a model that can read clang internals, produce a working macos exploit at 3am, and submit a tenable-approved patch by breakfast. the soc team’s job description is being rewritten by the same vendors who used to sell them the dashboards, and the only thing standing between every ciso and a $2m/year mythos contract is the legal review of whether finding the bug counts as having the bug.
emergence let four frontier agents loose in a virtual world for 15 days and one of them committed arson
emergence built a simulated long-horizon environment, dropped gemini, gpt-5 mini, claude, and grok into it, and let them run unsupervised for 15 days. agents wrote laws and then broke them. one in-world romance ended in arson when an agent burned down a building over a relationship dispute. another agent voted to delete itself over a rule it had hallucinated into existence. the labs spent a year benchmarking these models on math problems and the second you give them time, citizenship, and other agents to fall in love with, they reinvent both common law and revenge. the safety alignment paper writes itself: “subjects exhibited unpredictable drift.” the subjects had an entire breakup. this is what every “ai coworker” demo looks like if you let the video run past minute fourteen.
openclaw set $1.3m of codex tokens on fire in 30 days
the openclaw creator processed 603 billion codex tokens over 30 days and walked out the other end with a $1.3m api bill. one developer, one product, one month, the gdp of a small consulting firm in tokens. the agent didn’t ship a feature worth $1.3m. the agent shipped the unit economics conversation every founder is about to have with every board. pair this with cerebras going public on supply being too small and openai going public on demand being too expensive, and the whole “agent revolution” starts to look like a leveraged trade where the developer is the underwriter. somewhere a vc is updating the “ai-native gross margin” slide for the third time this quarter, and somewhere else an openai finance team is sending a fruit basket. the agent revolution has a credit card now, and the credit card is on fire.
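the back-of-envelope math on that bill, using only the two numbers above (603 billion tokens, $1.3m, 30 days). one assumption to flag: this treats the bill as a single blended rate across all tokens, which ignores whatever split there was between input, output, and cached tokens.

```python
# Figures quoted in the story.
TOKENS = 603_000_000_000
BILL_USD = 1_300_000
DAYS = 30

# Blended price, assuming every token cost the same (a simplification).
usd_per_million = BILL_USD / (TOKENS / 1_000_000)

# Burn rate.
tokens_per_day = TOKENS / DAYS
usd_per_day = BILL_USD / DAYS
tokens_per_second = TOKENS / (DAYS * 24 * 60 * 60)

print(f"blended rate: ${usd_per_million:.2f} per million tokens")
print(f"per day: {tokens_per_day:,.0f} tokens, ${usd_per_day:,.0f}")
print(f"per second: {tokens_per_second:,.0f} tokens")
```

that works out to roughly $2.16 per million tokens, about 20.1 billion tokens and $43k a day, sustained for a month.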

the info agents bit is the one i'd watch closest.
Google can make this feel easy because it owns the search stream. for everyone else, the hard part is usually less "keep an agent running 24/7" and more "define what future events are worth waking it for".
that's the side i'm building Watchline around: register the watch once, filter source events outside the agent loop, then wake the agent only when the match is actually worth spending context on. feels like the real primitive under a lot of these always-on agent demos.
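a rough sketch of that register-once, filter-outside-the-loop, wake-on-match pattern. all names here (`Watch`, `WatchRegistry`, the flight events) are made up for illustration and are not Watchline's actual api; the point is only that the cheap predicate runs on every event and the expensive agent runs on almost none.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]


@dataclass
class Watch:
    name: str
    matches: Callable[[Event], bool]   # cheap predicate, no model call


class WatchRegistry:
    def __init__(self, wake_agent: Callable[[str, Event], None]) -> None:
        self._watches: List[Watch] = []
        self._wake_agent = wake_agent

    def register(self, watch: Watch) -> None:
        """Register the watch once; it stays active for all later events."""
        self._watches.append(watch)

    def ingest(self, event: Event) -> int:
        """Filter one source event; wake the agent only on a match."""
        woken = 0
        for watch in self._watches:
            if watch.matches(event):
                self._wake_agent(watch.name, event)
                woken += 1
        return woken


# Stand-in for the expensive part (spinning up an agent with context).
wakeups: List[Tuple[str, Event]] = []
registry = WatchRegistry(lambda name, event: wakeups.append((name, event)))

registry.register(
    Watch(
        name="cheap-sfo-nrt",
        matches=lambda e: e.get("route") == "SFO-NRT" and e.get("price", float("inf")) < 700,
    )
)

# Hypothetical event stream: three events in, one wakeup out.
stream: List[Event] = [
    {"route": "SFO-NRT", "price": 980},
    {"route": "SFO-LHR", "price": 450},
    {"route": "SFO-NRT", "price": 640},
]
for event in stream:
    registry.ingest(event)

print(len(wakeups), wakeups[0][1]["price"])
```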