<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Ship Real Agents: Hands-On Evals for Agentic Applications — Laurie Voss, Arize</title>
        <link>https://video.ut0pia.org/videos/watch/2f3c9675-de95-48c2-be8d-7668b41baa10</link>
        <description>Most agents get tested by running a few queries and checking whether the output looks right. Laurie calls this the vibes problem: it doesn't catch regressions, doesn't run in CI, and doesn't tell you whether a prompt fix broke three other things. This workshop builds a complete eval pipeline from scratch on a financial analysis agent: tracing with Phoenix, reading traces before writing a single eval, categorizing failures by root cause, then building code evals, built-in LLM-as-a-judge evals, and a custom rubric with labeled examples. The sharpest lesson: choosing the right eval matters more than tuning it. A correctness eval scored 0 out of 13 on the same agent that a faithfulness eval scored 13 out of 13, because the model doesn't know what year it is and can't verify forward-looking financial data. The workshop closes on the thing most eval content skips: experiments that let you prove a prompt change actually worked, rather than eyeballing it and calling it a win. Speaker info: https://x.com/seldo, https://www.linkedin.com/in/seldo/, https://github.com/seldo</description>
        <lastBuildDate>Fri, 15 May 2026 18:50:40 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>PeerTube - https://video.ut0pia.org</generator>
        <image>
            <title>Ship Real Agents: Hands-On Evals for Agentic Applications — Laurie Voss, Arize</title>
            <url>https://video.ut0pia.org/lazy-static/avatars/0287a09a-aae7-4840-9843-b416426e7046.webp</url>
            <link>https://video.ut0pia.org/videos/watch/2f3c9675-de95-48c2-be8d-7668b41baa10</link>
        </image>
        <copyright>All rights reserved, unless otherwise specified in the terms at https://video.ut0pia.org/about and potential licenses granted by each content's rightholder.</copyright>
        <atom:link href="https://video.ut0pia.org/feeds/video-comments.xml?videoId=2f3c9675-de95-48c2-be8d-7668b41baa10" rel="self" type="application/rss+xml"/>
    </channel>
</rss>