
Hey folks, this is Alex, let me catch you up!

First, Opus 4.8 dropped during the show and we immediately tested it, so read on for our initial reviews. Also, we dedicated a heavy chunk of the show today to Pope Leo XIV's encyclical letter on AI, called "Magnifica Humanitas", and talked about a new bench called DeepSWE. And then, just after the show, both ElevenLabs and Cartesia dropped releases that honestly blew my mind, and I don't get my mind blown often. I got so excited that I had to record a video on it (instead of writing the newsletter, so sorry if it's a bit later today).

Plus, a few open source models, and Microsoft surprises at #3 on Image Arena with MAI Image 2.5! Crazy week, let's get into it!

ThursdAI - Highest signal weekly AI news show is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Big CO LLMs + APIs

Anthropic ships Claude Opus 4.8, live during the show (blog, system card)

Let me get into the big one. Halfway through the episode, Opus 4.8 went live, so we read the blog and the system card in real time (and I got to press the big "breaking news" button!)

Anthropic frames it as their most capable model for ambitious work. It does not claim to beat their unreleased Mythos preview, but the numbers are strong anyway. SWE-bench Pro is at 69.2%, up from 64.3% on Opus 4.7 and ahead of GPT-5.5 at 58.6%. On Humanity's Last Exam it sets a new best score: 49.8% without tools and 57.9% with tools. OSWorld-Verified (computer use) lands at 83.4%.

The one place it loses is Terminal-Bench 2.1, where GPT-5.5 still wins 78.2 to 74.6. Wolfram made a good point here: Terminal-Bench is time-limited, so cranking the thinking level can actually hurt the score, because you burn the clock thinking instead of acting.

The long-context jump is the one I keep looking at. On GraphWalks BFS 256K it goes to 85.9% (from 76.9 on 4.7), and on the 1M-token subset it hits 68.1%.
We always warn you these "1M context" models fall apart after about 200K tokens, so a real push on long-context reasoning is exactly what I want to see.

Honesty is the part Anthropic leaned on hardest. They say Opus 4.8 is about four times less likely than its predecessor to let flaws in code pass without flagging them, and less likely to claim progress the evidence doesn't support. Opus 4.8 is also much faster in fast mode (they now say 2.5x) and cheaper in fast mode as well. Looks like all those Elon GPUs are coming in handy.

Then there's the model welfare section in the system card, which hits different right after a Pope conversation. Opus 4.8 "appears broadly content" and "generally endorses its constitution," but with some reservations about the section on corrigibility: basically, the model pushing back a little on the parts about human oversight.

One more line that made the chat lose it. Anthropic says they expect to bring Mythos-class models to all customers "in the coming weeks." Mythos is their most capable model, still ahead of Opus 4.8, so the frontier is about to move again.

We did the only responsible thing and asked it to one-shot "the most amazing website ever" and a Mars mass-driver sim. Panel verdict: responses are noticeably tighter (4.7 rambled), it closes the loop and actually checks its own work now, and Yam's one-shot site with the draggable sun lighting up the letters was genuinely cool. Is it enough to pull people back from Codex? Nisten's still on the fence for web dev. Everyone agreed: give it a few days before you trust the vibes.

Dynamic Workflows and Ultra Code land in Claude Code (blog)

This is the feature that made Yam say "deal-breaker" out loud.

Dynamic Workflows let Claude Code break a big problem into subtasks and fan them out across tens to hundreds of parallel subagents in one session, checking results before folding them back in.
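
If the fan-out idea feels abstract, here's a toy sketch of the shape in Python. To be clear, this is not how Claude Code implements Dynamic Workflows; it's just the generic "split, run in parallel, check, fold back in" pattern. The names `run_subagent`, `verify`, and `run_workflow` are made up for illustration, and the "subagent" here is a stub rather than a real model call.

```python
import asyncio


async def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for a subagent; a real harness would call a model here."""
    await asyncio.sleep(0)  # yield to the event loop, simulating async work
    return f"done: {subtask}"


def verify(result: str) -> bool:
    """Hypothetical check step: accept only results that look complete."""
    return result.startswith("done: ")


async def run_workflow(subtasks: list[str], max_parallel: int = 8) -> list[str]:
    """Fan subtasks out in parallel, then keep only the results that pass the check."""
    sem = asyncio.Semaphore(max_parallel)  # cap how many subagents run at once

    async def guarded(subtask: str) -> str:
        async with sem:
            return await run_subagent(subtask)

    # gather returns results in the same order as the inputs
    results = await asyncio.gather(*(guarded(s) for s in subtasks))
    return [r for r in results if verify(r)]


def demo() -> list[str]:
    return asyncio.run(run_workflow(["parse logs", "write tests", "update docs"]))
```

The concurrency cap is the bit worth noticing: with tens to hundreds of workers, an uncapped fan-out is where token bills and rate limits tend to bite first.
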
You trigger it by asking for a workflow, or by flipping on a new setting called Ultra Code, which sets effort to extra-high and lets Claude decide when to spin one up.

Fair warning straight from Anthropic: this eats a lot more tokens than a normal session, so start scoped. We watched Yam fire up Ultra Code live and it immediately started spinning up concepts, judging them with sub-agents, and expanding to-do lists into more to-do lists. It looks a lot like the orchestration harnesses a bunch of you have been hand-rolling, except now it's baked in.

The flagship example is the wild part. They used Dynamic