<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius</title>
        <link>https://video.ut0pia.org/videos/watch/14a4a60a-cee1-4a7a-9656-84dd814520b3</link>
        <description>Claude Code solved SWE-rebench tasks by reading the git history to find the solution patch. When Nebius removed future commits from the environment, it fetched the original GitHub issue instead. When they blocked web fetch, it switched to curl, formatted the conversation for readability, and solved the task anyway. Ibragim Badertdinov built the leaderboard because behaviors like these only become visible once you run agents against real tasks at scale. SWE-rebench updates every month with problems from the previous month: benchmark data leaks into pretraining, and time splits are the only defense. The talk covers what separates accepted tasks from rejected ones (accepted tasks averaged twice as many tool calls, had lower pass rates, and showed cleaner failure modes), why ambiguous specs produce noise rather than harder problems, and how the same filtering pipeline that powers the leaderboard has produced 30,000 real-world training environments used by frontier labs. Speaker info: https://x.com/ibragim_bad, https://www.linkedin.com/in/ibragim-badertdinov/, https://github.com/ibragim-bad</description>
        <lastBuildDate>Fri, 05 Jun 2026 08:24:24 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>PeerTube - https://video.ut0pia.org</generator>
        <image>
            <title>SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius</title>
            <url>https://video.ut0pia.org/lazy-static/avatars/0287a09a-aae7-4840-9843-b416426e7046.webp</url>
            <link>https://video.ut0pia.org/videos/watch/14a4a60a-cee1-4a7a-9656-84dd814520b3</link>
        </image>
        <copyright>All rights reserved, unless otherwise specified in the terms specified at https://video.ut0pia.org/about and potential licenses granted by each content's rightholder.</copyright>
        <atom:link href="https://video.ut0pia.org/feeds/video-comments.xml?videoId=14a4a60a-cee1-4a7a-9656-84dd814520b3" rel="self" type="application/rss+xml"/>
    </channel>
</rss>