<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>The Art &amp; Science of Benchmarking Agents — Vincent Chen, Snorkel AI</title>
        <link>https://video.ut0pia.org/videos/watch/e3dd16a5-c25b-444b-9ca7-17dd32018ae5</link>
        <description>ARC AGI 3 launched a few weeks before this talk with every task human-solvable and frontier models under 1%. That gap is the argument: our ability to measure AI has fallen behind our ability to build it, and benchmarks that actually shape the field are bets on where capabilities are going, not snapshots of where they are. Vincent Chen draws a framework from reviewing over 120 applications for Snorkel's $3 million Open Benchmarks Grants. The science is task quality, distributional diversity, model headroom, and robust eval methodology. The art is having a thesis (Terminal Bench bet on the CLI before coding agents made it obvious), producing research roadmaps, and treating researcher UX as a first-class citizen. He closes on three axes he thinks the next generation of benchmarks needs to cover: environment complexity, autonomy horizon, and output complexity beyond plain text. Speaker info: https://x.com/vincentsunnchen, https://www.linkedin.com/in/vincentsunnchen, https://github.com/vincentschen</description>
        <lastBuildDate>Fri, 05 Jun 2026 09:02:14 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>PeerTube - https://video.ut0pia.org</generator>
        <image>
            <title>The Art &amp; Science of Benchmarking Agents — Vincent Chen, Snorkel AI</title>
            <url>https://video.ut0pia.org/lazy-static/avatars/0287a09a-aae7-4840-9843-b416426e7046.webp</url>
            <link>https://video.ut0pia.org/videos/watch/e3dd16a5-c25b-444b-9ca7-17dd32018ae5</link>
        </image>
        <copyright>All rights reserved, unless otherwise specified in the terms specified at https://video.ut0pia.org/about and potential licenses granted by each content's rightholder.</copyright>
        <atom:link href="https://video.ut0pia.org/feeds/video-comments.xml?videoId=e3dd16a5-c25b-444b-9ca7-17dd32018ae5" rel="self" type="application/rss+xml"/>
    </channel>
</rss>