1:51 | MTP Speculative Decoding Explained: How AI Models Generate Faster | Tyrel Barstow | 4 views
9:39 | Faster LLMs: Accelerate Inference with Speculative Decoding | IBM Technology | 26.1K views
10:06 | DFlash Leaves Qwen Territory - Gemma 4 31B Now Runs 5x Faster with Speculative Decoding | Fahd Mirza | 5.1K views
40:19 | Speculation is all you need: Intro to Speculative Decoding for High Performance Inference | Modal | 849 views
1:50 | Unleashing DFlash A Game Changer in Speculative Decoding! Full Review | Simple Tech Lab | 41 views
7:40 | Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss | Tales Of Tensors | 1.6K views
8:43 | DFlash Drafter for Gemma 4 26B - Official Speculative Decoding is Here: Run Locally | Fahd Mirza | 5.5K views
8:27 | 600 Toks/Second Gemma4-26B —The Setting That Actually Wins (vLLM + Dflash Speculative Decoding) | Tech-Practice | 3.8K views
40:32 | ML Performance Reading Group 23: DFlash: Block Diffusion for Flash Speculative Decoding | EleutherAI | 534 views
7:48 | Why using a dumb language model can speed up a smarter one: Speculative Decoding [Lecture] | Jordan Boyd-Graber | 233 views