9:39  Faster LLMs: Accelerate Inference with Speculative Decoding (IBM Technology, 26.3K views)
22:14  [Recurrent Transformer] Beyond the Transformer: Frontier AI Architectures of 2026 Efficient Decoding (Byte Goose AI., 256 views)
26:10  Attention in transformers, step-by-step | Deep Learning Chapter 6 (3Blue1Brown, 4.1M views)
7:40  Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss (Tales Of Tensors, 1.6K views)
10:46  GenAI: LLM Decoding Strategies Explained | Greedy, Beam, Top-k, Top-p, Temperature, Contrastive (Baba's World, 1.7K views)
15:15  How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team (Lex Clips, 13.8K views)
27:14  Transformers, the tech behind LLMs | Deep Learning Chapter 5 (3Blue1Brown, 10.3M views)
8:38  Transformers: The best idea in AI | Andrej Karpathy and Lex Fridman (Lex Clips, 434.4K views)
22:10  How Attention Mechanism Works in Transformer Architecture (Under The Hood, 112.7K views)
57:34  MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention (Alexander Amini, 42.8K views)
24:34  Scaling Transformer to 1M tokens and beyond with RMT (Paper Explained) (Yannic Kilcher, 59.6K views)