<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>How Transformers Finally Ate Vision – Isaac Robinson, Roboflow</title>
        <link>https://video.ut0pia.org/videos/watch/f0f4136f-aa9b-41bf-aa5b-dea8c0d72f99</link>
        <description>Vision used to belong to CNNs. This talk explains why that changed, and why transformers only recently started winning for vision despite looking like the less natural fit for images. The answer runs through pretraining, scaling, borrowed infrastructure from the LLM world, and the long arc back to the simple architecture that scales best. Using the evolution from ViT and Swin through ConvNeXt, Hiera, SAM, and RF-DETR, Isaac Robinson walks through what actually made transformer vision systems practical, where the tradeoffs still are, and why deployment flexibility now matters as much as raw benchmark wins. What comes next for VLMs, world models, and physical AI? Speaker info: https://www.linkedin.com/in/robinsonish/</description>
        <lastBuildDate>Sat, 09 May 2026 17:31:38 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>PeerTube - https://video.ut0pia.org</generator>
        <image>
            <title>How Transformers Finally Ate Vision – Isaac Robinson, Roboflow</title>
            <url>https://video.ut0pia.org/lazy-static/avatars/0287a09a-aae7-4840-9843-b416426e7046.webp</url>
            <link>https://video.ut0pia.org/videos/watch/f0f4136f-aa9b-41bf-aa5b-dea8c0d72f99</link>
        </image>
        <copyright>All rights reserved, unless otherwise stated in the terms at https://video.ut0pia.org/about or in any licenses granted by each content's rights holder.</copyright>
        <atom:link href="https://video.ut0pia.org/feeds/video-comments.xml?videoId=f0f4136f-aa9b-41bf-aa5b-dea8c0d72f99" rel="self" type="application/rss+xml"/>
    </channel>
</rss>