
Free Daily Podcast Summary
By Weights & Biases. Join AI Evangelist Alex Volkov and a panel of experts as they cover everything important that happened in the world of AI over the past week.
Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists and prompt spellcasters on Twitter Spaces, as we discuss everything major and important that happened in the world of AI for the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion aspects, and much more.
The most recent episodes — sign up to get AI-powered summaries of each one.
Hey folks, Alex here, let me catch you up! I’ve had a feeling that this week is going to be crazy, as it started on the weekend with MiniMax M3, then with Jensen announcing the new RTX Spark, NVIDIA’s first PC chip, packing 1 petaflop of local AI power into thin laptops.
A few days later at Microsoft BUILD, Satya & Mustafa from MAI dropped 7 AI models, completely pre-trained from scratch, including a new MAI-thinking-1, MAI-code and MAI-image 2.5 that started topping the image gen charts. Then other image models started racing to the top of the Arena benchmarks, with IdeoGram 4 becoming the SOTA open weights image-gen model, and Reve 2 beating Nano Banana just a few hours after that. And then today, NVIDIA dropped Nemotron 3 Ultra, their latest 550B open weights model (plus data and training), Arena published a new agentic eval leaderboard, and we got a new Gemma 4 12B. I’ve had the great pleasure to host Chris (@llm_wizard) from Nvidia, Peter Gostev from Arena and Karan from Nous Research (who were featured prominently by Jensen!) all on the show. Def don’t miss this one! Let’s get into the details.
ThursdAI - Join the flock of folks who know what is happening in AI before everyone else.
Open Source LLMs
🔥 NVIDIA Nemotron 3 Ultra: The 550B Open Source Beast Built for Agents (X, Arxiv, Announcement)
This was the big one. Breaking news mid-show: NVIDIA drops Nemotron 3 Ultra, a 550 billion parameter sparse MoE model with 55 billion active parameters, built on a hybrid Mamba-Transformer architecture. Chris Alexiuk, AKA Joe Nemotron, joined us live from NVIDIA HQ in Santa Clara to walk us through it.
The headline number is 5.9x higher inference throughput compared to GLM-5.1 on decode-heavy workloads. Chris told us that this is the result of multiple things: their hybrid Mamba-Transformer approach, the sparse attention, and the fact that they optimized for decode-heavy workloads (the kinds of workloads agents do).
The architecture is fascinating. 
They’re mixing Mamba-2 state space layers with sparse attention, which means step 300 in an agent loop runs as fast as step 3. Pure transformers can’t do that because the attention cost keeps growing with context length. This kicks in big time at 64K+ sequence lengths, which is exactly where you end up in real agentic work when the model is having multi-turn conversations and people are dumping their entire codebase in.
P.S - We launched Nemotron 3 Ultra with 0-day support on CoreWeave Inference, it’s super fast and pretty cheap, give it a try here.
They pretrained on 20 trillion tokens, extended context to 1 million tokens, and their post-training pipeline used multi-teacher on-policy distillation from over 10 specialized teacher models covering everything from SWE to terminal use to search to office work, which they are also going to open source soon!
One thing Chris emphasized that I really appreciate: NVIDIA doesn’t have their own harness. There’s no “NVIDIA Code.” Which means they actively resist the temptation to harness-max, to optimize for just one harness and look good on a specific leaderboard. Ultra should be a solid drop-in for whatever harness you’re used to, and that generality is worth a lot. It’s not the best thinker, but it is the highest-scoring US-based open weights model, so again, a huge huge win for the US AI ecosystem!
The Nemotron 3 Ultra release is open under the OpenMDW-1.1 license: base BF16, post-trained BF16, and NVFP4 quantized checkpoints, plus the GenRM, synthetic pre-training data for code, legal, and
Hey folks, this is Alex, let me catch you up! First, Opus 4.8 dropped during the show, we immediately tested it, read on for our initial reviews. Also, we dedicated a heavy chunk of the show today to cover Pope Leo XIV’s encyclical letter on AI called “Magnifica Humanitas” and talked about a new bench called DeepSWE. And then, just after the show, both ElevenLabs and Cartesia dropped releases that honestly blew my mind, and I don’t get my mind blown often. I got so excited that I had to record a video on it (instead of writing the newsletter, so sorry if it’s a bit later today).
Plus, a few open source models, and Microsoft surprises at #3 on Image Arena with MAI Image 2.5! Crazy week, let’s get into it!
ThursdAI - Highest signal weekly AI news show is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Big CO LLMs + APIs
Anthropic ships Claude Opus 4.8, live during the show (blog, system card)
Let me get into the big one. Halfway through the episode, Opus 4.8 went live, so we read the blog and the system card in real time (and I got to press the big “breaking news” button!)
Anthropic frames it as their most capable model for ambitious work. It does not claim to beat their unreleased Mythos preview, but the numbers are strong anyway. SWE-bench Pro is at 69.2%, up from 64.3% on Opus 4.7 and ahead of GPT-5.5 at 58.6%. Humanity’s Last Exam is the new best score at 49.8% without tools and 57.9% with tools. OSWorld-Verified (computer use) lands at 83.4%.
The one place it loses is Terminal-Bench 2.1, where GPT-5.5 still wins 78.2 to 74.6. Wolfram made a good point here: Terminal-Bench is time-limited, so cranking the thinking level can actually hurt the score, because you burn the clock thinking instead of acting.
The long-context jump is the one I keep looking at. On GraphWalks BFS 256K it goes to 85.9% (from 76.9 on 4.7), and on the 1M-token subset it hits 68.1%. 
We always warn you these “1M context” models fall apart after about 200K tokens, so a real push on long-context reasoning is exactly what I want to see.
Honesty is the part Anthropic leaned on hardest. They say Opus 4.8 is about four times less likely than its predecessor to let flaws in code pass without flagging them, and less likely to claim progress the evidence doesn’t support. Opus 4.8 is also much faster in fast mode (they now say 2.5) and cheaper in fast mode as well. Looks like all those Elon GPUs are coming in handy.
Then there’s the model welfare section in the system card, which hits different right after a Pope conversation. Opus 4.8 “appears broadly content” and “generally endorses its constitution,” but with some reservations about the section on corrigibility, basically the model pushing back a little on the parts about human oversight.
One more line that made the chat lose it. Anthropic says they expect to bring Mythos-class models to all customers “in the coming weeks.” Mythos is their most capable model, still ahead of Opus 4.8, so the frontier is about to move again.
We did the only responsible thing and asked it to one-shot “the most amazing website ever” and a Mars mass-driver sim. Panel verdict: responses are noticeably tighter (4.7 rambled), it closes the loop and actually checks its own work now, and Yam’s one-shot site with the draggable sun lighting up the letters was genuinely cool. Is it enough to pull people back from Codex? Nisten’s still on the fence for web dev. Everyone agreed: give it a few days before you trust the vibes.
Dynamic Workflows and Ultra Code land in Claude Code (blog)
This is the feature that made Yam say “deal-breaker” out loud.
Dynamic Workflows let Claude Code break a big problem into subtasks and fan them out across tens to hundreds of parallel subagents in one session, checking results before folding them back in. 
You trigger it by asking for a workflow, or by flipping on a new setting called Ultra Code, which sets effort to extra-high and lets Claude decide when to spin one up.
Fair warning straight from Anthropic: this eats a lot more tokens than a normal session, so start scoped. We watched Yam fire up Ultra Code live and it immediately started spinning up concepts, judging them with sub-agents, and expanding to-do lists into more to-do lists. It looks a lot like the orchestration harnesses a bunch of you have been hand-rolling, except now it’s baked in.
The flagship example is the wild part. They used Dynamic
Hey, Alex here, just got back from the sunny Shoreline Theater in Mountain View, so let me catch you up! This week was definitely Google heavy, we are covering Google’s I/O conference for the third year in a row, and today we have a special guest, Logan Kilpatrick, joining to discuss the newly announced Gemini 3.5 Flash, the Google Omni model, and the new Managed Agents offerings. Plus, this week, for the first time, OpenAI announced that AI solved a math problem that humans couldn’t solve for 80 years, Cursor is showing off Composer 2.5, which is partly trained on xAI data, Karpathy joins Anthropic, and much more! Let’s dive in!
P.S - We’ve announced our upcoming hackathon, Weavehacks-4, June 6-7. I’ll be there, and we’re expecting the seats to run out very soon, so register now.
ThursdAI - We’d love to have your subscription, and if you’re already subscribed, please hit that bell on YT to never miss an episode!
Google I/O 2026 - Google goes agentic everywhere
I went to cover Google I/O for the third year in a row, shoutout to the DeepMind team for inviting ThursdAI again, and folks, this one felt different.
Last year, Google I/O was still very model-centric. This year, the story was not “here is another benchmark chart.” The story was: Google is putting Gemini into everything, and the agentic layer is becoming the product layer. Search, Gemini app, Android, Workspace, YouTube, AI Studio, Cloud, Antigravity, Flow, managed agents, smart glasses, all of it is now orbiting around one pretty clear strategy: Gemini is the intelligence, Antigravity is the agent harness, Google’s products are the distribution. I saw many reactions that were milquetoast, as in, “we expected more,” and those seem to dominate the X feed. But I think the distribution is the part that many folks on X are missing. Yes, we can argue about Gemini 3.5 Flash pricing. Yes, we can argue whether “Flash” still means what Flash used to mean. 
But when Google says the Gemini app itself has 900 million monthly active users, before even counting Search, Gmail, YouTube, Docs, Drive, Android, and the rest of the Google surface area, that’s massive! OpenAI’s ChatGPT has supposedly stagnated at ~900M; I don’t remember them crossing 1B. Meanwhile Google is gaining traction. And they just updated all those folks with a new model!
Wolfram said it really well on the show: his mother is not sitting there reading model cards. She just uses her Pixel, voice unlocks Gemini, asks for help, and suddenly the default intelligence available to her goes up.
Antigravity 2.0 - the agent harness takes center stage
The biggest strategic signal from Google I/O for me was Antigravity.
Remember, Antigravity was an IDE that came from the Windsurf acquisition saga. Part of the Windsurf team went to Google, part went to Cognition, and now Google is very clearly putting Antigravity in the middle of its agentic future. And I mean very clearly. Sundar mentioned it. Demis mentioned it. Varun Mohan, the co-founder, was on stage immediately after them! If you’ve ever watched a Google I/O keynote, you know how carefully every minute is allocated. Google has YouTube, Search, Gmail, Android, Cloud, Ads, Workspace, and a thousand VP-level products that could be on stage. The fact that Antigravity was that prominent should tell you everything.
Logan Kilpatrick joined us and framed this in a way I loved: Gemini became the through-line across Google products, and now the Antigravity agent harness is becoming the through-line for agentic experiences.
The new Antigravity 2.0 is a complete overhaul: it shows only an agentic, Codex-like, agent-first interface (which was previously just a separate window called Agent Manager) and separates the IDE layer completely into its own app, which got a few folks furious. 
This move may be weird to some folks, but if you follow along where everyone’s going, this seems to be the way of the future: coding is no longer about lines of code, it’s about managing fleets of agents. The new Gemini 3.5 absolutely shines inside the new Antigravity. The model was trained with this harness in mind, and is currently offered at an incredible speed (12x), so I’m definitely going to try it!
Gemini 3.5 Flash - fast, determined, and maybe not the old “Flash”
The most debated model release of the week was Gemini 3.5 Flash.
Some folks saw the pricing and token
Hey everyone, Alex here 👋
I am back live on ThursdAI after a week off, and yes, I am now a married man! Thank you for all the congrats, and also thank you to Ryan and Yam for holding down the fort last week while I tried very hard to disconnect.
This week was a relatively chill one in AI land (no, really, for once), which actually let us go deep on some really fascinating stuff. We’ve got Thinking Machines Lab finally shipping their first real research with these wild interaction models, Meta Muse Spark showing up in actual products (and it’s surprisingly good!), the Musk v. Altman trial dropping juicy disclosures, and probably the biggest narrative shift on the show today: all of us are quitting OpenClaw. Yeah, you read that right. We’ll get into why.
Also, and this is breaking news from this morning, CoreWeave just launched Sandboxes for your agents. I’ll cover that in This Week’s Buzz, but if you’ve been waiting for production-grade sandbox infrastructure that powers 9 out of 10 major AI labs, today’s your day.
Oh, and we had Vic Perez from Krea on to talk about Krea 2, their first foundation image model trained completely from scratch. Let’s dig in.
The Great OpenClaw Exodus towards Hermes 🫠
I’m going to start with what was honestly the most emotional thread of the entire show, because three of us (me, Ryan, AND Wolfram) all independently switched away from OpenClaw this week. And we kicked off the show literally processing this together on air.
The story is the same across all of us. OpenClaw was magical back in February when we first brought it to you. Things just worked. 
But after Anthropic’s pricing changes (we covered this: they made Max-tier subscription usage of Opus through OpenClaw significantly more expensive), and after months of the constant Lego-construction-style breakage on every update, the magic faded. Ryan said it best on the show: he was “constantly fixing OpenClaw” instead of using it.
So Ryan went to Codex. Wolfram and I both went to Hermes from Nous Research. And folks, things just work again. That February feeling is back, and with GPT 5.5, it’s an incredible assistant!
Why Hermes? A few things:
* It’s now the #1 most-used CLI agent on OpenRouter globally, passing OpenClaw and even passing Claude Code on OpenRouter usage. That’s a massive milestone for Nous Research and shows we’re not alone in this migration.
* It has /goal (more on this in a sec), steering, and background computer use via the TryCUA integration.
* It’s open! Which means if you’ve built a system like Wolfram’s “Amy” or my “Wooolfred” or Ryan’s “R2” (yes, we know each other’s assistants’ names better than each other’s kids’ names at this point 😅), you can port your memories, profile, and soul files seamlessly.
The migration was so smooth that Wolfram literally had Codex talk to Hermes to plan and execute the migration of his home assistant agent. Two agents collaborating to migrate themselves. We are living in 2026 and it’s easier than ever to switch. If you haven’t tried Hermes, give it a go!
Steering is maybe the most underrated addition to Hermes. It’s a Codex feature, but it exists in Hermes too: with GPT 5.5 you can send a follow-up message, and the agent will see it after the next tool call, not after the whole chain of thought has completed (which is what OpenClaw defaults to). This makes the conversation feel much more natural!
Agents buying wedding gifts using Stripe wallet!
Real quick story: Two weeks ago we covered Stripe’s new wallet APIs that let your agents have actual budgets to spend money on the web. 
I told my agent (back when it was still OpenClaw) to “go buy us a wedding present, don’t tell me what it is.” It half-worked, half-broke. This week, a giant custom map of our travels just arrived in the mail. I approved one Stripe push notification and the rest just happened. I’ve also had Hermes pay traffic tickets for me via screenshots (HOV lane ones, not like.. DUI, 80% of my drive is Tesla F
Hey y’all, Alex here (with a scheduled post). I’m taking this week off to get married and celebrate life with family, and touch some grass, but wanted to share the awesome chats I had with some great folks at AI Engineer Europe last week. BTW - Yam and Ryan took over the live show today, if you didn’t happen to catch that, please check out the live on our YouTube channel! Ok, now to the actual content.
The best thing about the AI Engineer conferences for me is the people I meet. I often have a chance to bring them to the live show (in fact, the live show we recorded there had the most guests yet on an episode! 4 guests including Swyx, Omar Sanseviero, VB from OpenAI and Peter Gostev). But oftentimes I also have an offline chat. I find these conversations to be less about the week’s news, and more about the state of AI Engineering, and the guests themselves. Not quite Lex Fridman pod level, but a different vibe from our live shows.
Sunil Pai - Cloudflare (@threepointone)
The first conversation in today’s pod is with Sunil Pai, Principal Engineer at Cloudflare. Longtime followers of ThursdAI know that I love Cloudflare, they gave me my first big break when I was building Targum (which still runs on Workers), so I had a great time chatting with Sunil! This guy has had several lives. React.js core team at Meta (he self-deprecates: "I'm the one nobody talks about, there's a testing API I shipped that pisses people off"). Then did developer tooling and the CLI at Cloudflare the first time. Left to found PartyKit, an open-source deployment platform for real-time multiplayer apps and AI agents, built on Cloudflare Durable Objects. Backed by Sequoia. Acquired by Cloudflare in 2024, and he came back as a Principal Systems Engineer (per his bio: "Worked at Cloudflare once, left and created PartyKit, came back wiser"). Also plays guitar (Les Pauls, it's all over his blog). 
Co-hosts a live show called Dry Run on Cloudflare TV with Craig Dennis.
Our conversation was a very fun one, ranging from Cloudflare’s agentic offerings to how engineers should think about writing/reading code in 2026. I had a great time chatting with Sunil and I hope you enjoy getting to know him!
Sally Ann O'Malley - Red Hat
Then I had the pleasure of chatting with Sally, who’s a Principal Engineer at Red Hat and a contributor to OpenClaw. Sally has one of the more unusual paths in the speaker lineup. Started as a schoolteacher, did a stint at Trader Joe's, then moved to Westford, MA, discovered Red Hat's HQ across the street, and went back to school for a second bachelor's in software engineering at UMass Lowell. Joined Red Hat in 2015, has been there a decade. Worked across OpenShift teams, integrating Kubernetes and Podman into the platform. Recent projects span Image Based Operating Systems, Podman, OpenTelemetry, and Sigstore. Also an instructor at Boston University's Faculty of Computing and Data Sciences and an organizer for DevConf.US. Won the 2025 Paul Cormier Trailblazer Award at Red Hat. Currently a founding contributor on the llm-d project: distributed, scalable, high-performance AI inferencing built on K8s. Heavily involved in Red Hat's InstructLab collaboration with IBM (the small-model distillation system using IBM Granite + Llama).
Sally and I had a great conversation, two high-energy personalities meeting! We geeked out about our OpenClaw agents, securing your Clankers, what it’s like to maintain OpenClaw, and everything in between! She was so stressed about the recording, but dare I say, this was one of the more natural guests I’ve had on the show!
I hope you enjoyed this format, please let me know in the comments, and I’ll see you next week!
- Alex
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
Hey everyone, Alex here 👋
Tomorrow is May. May! I genuinely cannot believe we’re four months into 2026 already, and the AI news cycle is showing zero signs of slowing down. This week’s show was a wild one! We opened with what is genuinely one of the most important AI stories I’ve ever covered (Mayo Clinic AI detecting pancreatic cancer THREE YEARS before human radiologists), we covered the return of the Chinese whale with DeepSeek V4, OpenAI got caught in their own system prompt begging GPT-5.5 to please stop talking about goblins, and I literally gave my coding agent a credit card and asked it to buy my fiancée a wedding gift with the new Stripe Link skill and CLI!
Oh yeah, I’m getting married next Tuesday! 💍 So next week’s show will be a little different. I’ll be back the week after to catch you up on whatever drops in my absence (almost certainly something major, knowing this industry).
Lots to get through, so let’s dive in. (Also, at the end I have a full month recap of every major launch, don’t miss it.)
Mayo Clinic’s REDMOD: AI Detects Pancreatic Cancer 3 Years Early 🔥 (X, Blog, Announcement)
I know we usually cover models, parameter sizes, MoEs and big companies. But this is important. This is the use case that justifies the entire AI revolution, the GPU burns, the buildouts. I want humans to WIN, and cancer to be fixed!
Mayo Clinic just published a study in Gut (BMJ) validating an AI model called REDMOD that detects pancreatic cancer on routine CT scans up to three years before clinical diagnosis. 
The numbers are jaw-dropping: they show 73% sensitivity for catching prediagnostic cancers, compared to 39% for experienced human radiologists (while looking at the same exact CT scans).
And maybe the most important bit: on scans taken more than 2 years before diagnosis, the AI catches nearly 3x as many cases as specialists.
For context: pancreatic cancer has less than 15% five-year survival specifically because 85% of patients are diagnosed after the disease has already spread. This is the cancer that took Steve Jobs. Imagine if Jobs had access to this AI three years before his diagnosis. That’s the impact we’re talking about.
As Dr. Ajit Goenka from Mayo Clinic put it, the greatest barrier to saving lives from pancreatic cancer has been the inability to see the disease when it’s still curable. This AI can now identify the signature of cancer from a normal-appearing pancreas.
Even better: it runs on CT scans people are already getting for other reasons. No extra screening protocol, no new imaging required. Just smarter analysis of existing data. The model also showed remarkably stable performance across institutions, imaging systems, and protocols, with 90-92% test-retest concordance over serial scans.
Mayo Clinic is now moving this into prospective clinical testing through a study called AI-PACED (Artificial Intelligence for Pancreatic Cancer Early Detection).
When we say “let’s f*****g go,” that’s what we mean. Yeah, getting more intelligence is cool, but I want a world without disease! Let’s f*****g go, Mayo Clinic!
Agentic Commerce - Giving OpenClaw my credit card - safely! Stripe Link Wallet and Infrastructure CLI (X, Announcement, Blog, Announcement)
Ok, give an LLM your credit card, what can go wrong.. right? Well, it’s clear that this, increasingly, is the future of commerce. Agents will be shopping for us, and we need solutions here. Well, this week Stripe Sessions (Stripe’s annual product lineup conference) just delivered. Link Wallet is a new ... API? CLI? 
Skill? Definitely a skill for your agents: it connects to your Stripe Link (the thing that stores your credit cards safely), and once you give your agent a budget, it can go and make purchases on your behalf. Now the trick here is, for every purchase, you get a notification to approve, and the agent never sees y
Hey, Alex here, I’ll try to catch you up, but it’s one of the more intense weeks in AI in recent memory. Here’s the TL;DR - OpenAI dominates across the board this week! They finally launched “spud”, calling it GPT 5.5 (and 5.5 Pro), and it’s SOTA on most things, nearly matching the mysterious Claude Mythos, except it’s released and we can actually use it (we tested it extensively). OpenAI also took the crown in image generation with the incredible GPT-image-v2 release, beating Nano Banana 2 and Pro by a significant margin. The images are incredible: this model can generate working QR codes and 360 images, it’s quite bonkers. Codex was updated with Computer Use (which I told you about last week), an in-app browser and a bunch of other tools that match GPT 5.5 intelligence.
Meanwhile, Anthropic launched an incredible research preview of Claude Design, finally admitted that Claude was dumb and reset quotas across the board, while breaking the trust of the community by removing Claude Code from the Pro plan. We’ve also got great open source updates: Kimi K2.6 and Qwen 3.6 27B are both great performers!
We were live on the stream for almost 4 hours today waiting for GPT 5.5, and finally got it and tested it live on the show. We also had Peter Gostev on from Arena, who had early access and shared his insights with us. Let’s get into it!
OpenAI’s GPT 5.5 is here - SOTA AI intelligence you can actually use (Release Blog)
OpenAI finally gave us all access to their latest intelligence boost, GPT 5.5 thinking (and GPT 5.5 Pro). These models take the crown across many benchmarks, including TerminalBench (82.7%), GDPval (84%) and more. You can see the highlighted versions on the image above. 
Though, it’s not uncommon for OpenAI to do some chart crimes, so @d4m1n created a chart that also showed the full benchmarks, including the ones where GPT 5.5 is not beating Opus. As you can see below, it underperforms on Humanity’s Last Exam and scaled tool use.
But benchmarks don’t tell the full story. GPT 5.5 uses significantly fewer tokens compared to 5.4, about 40% less. It’s also more expensive, but given the lower token usage, it nets out at about a ~20% price increase, while being more intelligent and faster.
Tons of folks who had early access are reporting the same things: this model excels at long-running tasks. Peter Gostev from Arena, who joined our live stream, showed us an incredible demo that ran overnight for over 8h! This model can work until the task is done, no longer just pausing in the middle to ask for your input.
The real highlight is that, paired with the recent GPT-image-2 (which I’ll expand on later in this newsletter), GPT 5.5 becomes an excellent UI designer. This is a big area in which Claude still has a moat and OpenAI is trying to catch up, and the real alpha now is to use both the image gen and 5.5 in tandem to create beautiful visuals and UIs.
The main thing is, after testing it quite a few times, this only works if you generate the image outside of the session that builds the actual UI. We tried a couple of times to do it in 1 session, and the resulting UI didn’t seem to be remotely close to the generated image. Only after sending this image to a completely fresh session and asking for a “pixel perfect” implementation did GPT 5.5 start to resemble the input image and rebuild the whole UI in pixel-perfect fidelity!
GPT Image v2 - SOTA thinking image model, finally beating Nano Banana (Blog, Live)
Like we said, OpenAI is dominating this week, and in both instances those are great models. Though, in an apples-to-apples comparison, GPT-image-v2 is a much bigger jump from previous models than GPT 5.5! 
According to Artificial Analysis, the jump in how many people prefer GPT-image-2 in blind tests compared to other models is the highest we’ve ever seen, over 250 points. And you can clearly see it in the generations as well. Earlier this week, we did a live streaming session with Peter Gostev (from Arena) and we did a deep dive comparing this new model to GPT Image 1.5, Nano Banana and Grok Imagine, and it’s a clear win
Hey y’all, Alex here with your weekly AI news catch-up. It’s one of those Thursdays where no matter how well I prep, the big AI labs are hell-bent on showing up before each other. Alibaba dropped Qwen 3.6 with Apache 2, confirming their commitment to open source, then Anthropic released Claude Opus 4.7 (not quite Mythos), and OpenAI followed with a huge Codex update that includes Computer Use among other things. The highlight of Computer Use is the background usage, more on that below. This is all just from today!
Earlier in the week we had 2 incredible 3D world generators, Lyra 2.0 from Nvidia and HYWorld 2 from Tencent, Windsurf dropping a 2.0 version with Devin integration, Google releasing a Gemini TTS with support for 90+ languages and an incredible range of emotions, and Baidu open sourcing Ernie Image, rivaling Nano Banana.
Today on the show we had 3 awesome guests: Theodor from Cognition joined to cover the new Windsurf, Kwindla is back on the show to talk about “the side project that escaped containment,” Gradient-Bang, a multi-agent, voice-based space game, and Trevor from Marimo joined to talk about pairing your agents with a Marimo notebook. Let’s dive in! 👇
ThursdAI - We’re over 16K on YT today, my goal is to get to parity with Substack, please subscribe.
Codex can now really use your computer: OpenAI updates Codex with CUA, Image Generation, Browser, SSH (X, Blog)
Codex has been the major focus inside OpenAI for a while now. We’ve reported previously that OpenAI is closing down SORA and other “side-quests” to focus, and that they will join Codex, ChatGPT and the Atlas browser into one “superapp,” and today, it seems, we’ve gotten an early glimpse of what that app will be.
The Codex team (which seems to be growing from day to day) has been on a TEAR feature-wise lately, trying to beat Claude Code, and they pushed an update with a LOT of features and updates, among them a new memory system, an internal browser and image generation. 
The highlight for me, though, was absolutely the polished computer use experience. Computer use is not new: Claude has a computer use feature flag, and so do many others. Hell, we told you about computer use with Open Interpreter back in Sep of 2023. But this.... this feels different.
You see, OpenAI has quietly purchased a company called Software Apps Inc, which almost launched a macOS AI companion called Sky a year ago. This team is obsessed with Mac, and somehow they were able to build a magical experience, a huge part of which is the fact that they are controlling the Mac in the background. This is like black magic stuff. You work on one document, Codex clicks buttons and does things in another, without interrupting you.
You may ask, Alex, why do you even care so much about computer use, when most of the work happens in the browser anyway, and Claude (and Codex) can control my browser anyway? Well, true, but not ALL work is happening there. Take file system integration, for example. It’s notoriously a big part of browser automation that fails when you need to upload/download files. I’ve spent countless cycles trying to get this to work with OpenClaw, and this just does it. This closes the loop between knowledge work in the browser (yes, this thing can use your browser) and the broader OS.
It’s so so polished, I truly recommend you try it. It’s as easy as @ tagging any app that you have running and asking Codex to do stuff there. Pro Tip: Enable fast mode for a much smoother experience.
Anthropic Opus 4.7 is here, not quite Mythos, 64.3% SWE-bench Pro, tuned for long running tasks (X, System Card)
What is there to say? Is this the model we expected from Anthropic after they released the news about Claude Mythos last week? No. But hey, we’ll take it. A new Claude Opus, with significantly improved multimodal capabilities and long-horizon coding task improvements? For the same price? Well, not quite! 
Apparently, this model could be a “from scratch” trained model, given that the tokenizer (the thing that converts words into tokens for the LLM to understand) is a different one. It also uses 1.3x more tokens for the same tasks, which means that the new and default model from Anthropic became effectively more expens
Free AI-powered daily recaps. Key takeaways, quotes, and mentions — in a 5-minute read.
Free forever for up to 3 podcasts. No credit card required.
Listeners also like.

Last Week in AI
Summarizes significant AI news on a weekly basis.

Everyday AI Podcast – An AI and ChatGPT Podcast
Practical AI and ChatGPT tips for professionals to improve productivity and grow their careers.

AI For Humans: Weekly AI News, Tools & Trends
A weekly breakdown of major AI news, tools, and breakthroughs for both newcomers and seasoned enthusiasts.

The AI Daily Brief: Artificial Intelligence News and Analysis
A daily analysis of artificial intelligence news, exploring its creative potential, industry impacts, and ethical challenges.

The Anthropic AI Daily Brief
Breaks down Anthropic's latest AI advancements and their real-world implications in clear, accessible language.

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Explores machine learning and artificial intelligence through interviews with leading researchers, engineers, and industry experts.

Latent Space: The AI Engineer Podcast
Covers advances in AI engineering, including foundation models, code generation, and AI agents, through interviews with researchers and developers.

This Day in AI Podcast
Two friends discuss artificial intelligence, sharing casual insights, personal experiments, and humorous experiences with AI tools and technology.

The AI XR Podcast
Experts discuss AI, augmented reality, virtual reality, and spatial computing with industry leaders and innovators.

Limitless: An AI Podcast
Explores the frontiers of technology and artificial intelligence.

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
Interviews with AI developers and researchers exploring the transformative impact of artificial intelligence on society and technology.

How I AI
A practical guide to using AI tools in work and life, featuring guests who share specific, actionable techniques and workflows.
AI-powered recaps with compact key takeaways, quotes, and insights.
Get key takeaways from ThursdAI - The top AI news from the past week in a 5-minute read.
Stay current on your favorite podcasts without falling behind.
It's a free AI-powered email that summarizes new episodes of ThursdAI - The top AI news from the past week as soon as they're published. You get the key takeaways, notable quotes, and links & mentions — all in a quick read.
When a new episode drops, our AI transcribes and analyzes it, then generates a personalized summary tailored to your interests and profession. It's delivered to your inbox every morning.
No. Podzilla is an independent service that summarizes publicly available podcast content. We're not affiliated with or endorsed by From Weights & Biases, Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI from the past week.
Absolutely! The free plan covers up to 3 podcasts. Upgrade to Pro for 15, or Premium for 50. Browse our full catalog at /podcasts.
ThursdAI - The top AI news from the past week publishes weekly. Our AI generates a summary within hours of each new episode.
ThursdAI - The top AI news from the past week covers topics including News, Technology. Our AI identifies the specific themes in each episode and highlights what matters most to you.