7:52 | Accelerating LLM Inference on TPUs via Diffusion Speculative Decoding | Knut Jägersberg | 11 views
9:39 | Faster LLMs: Accelerate Inference with Speculative Decoding | IBM Technology | 26.5K views
40:19 | Speculation is all you need: Intro to Speculative Decoding for High Performance Inference | Modal | 899 views
5:17 | CVPR 26 - Multi-Scale Local Speculative Decoding for Image Generation | Elia Peruzzo | 1 view
7:40 | Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss | Tales Of Tensors | 1.6K views
4:53 | What is Speculative Decoding? making LLMs faster | Data Science in your pocket | 65 views
12:18 | How do LLMs run efficiently at scale? KV-cache, speculative decoding explained | SreeJagatab | 0 views
9:15 | Accelerating Gemma 4 via Speculative Decoding and MTP Drafters | Knut Jägersberg | 155 views
2:10 | KDD 2026 - Reinforcement Speculative Decoding for Fast Ranking | Association for Computing Machinery (ACM) | 50 views
8:41 | How Speculative Decoding Breaks the Autoregressive Bottleneck in LLMs | DrSanchezLeandro | 22 views
10:06 | DFlash Leaves Qwen Territory - Gemma 4 31B Now Runs 5x Faster with Speculative Decoding | Fahd Mirza | 5.5K views
40:32 | ML Performance Reading Group 23: DFlash: Block Diffusion for Flash Speculative Decoding | EleutherAI | 618 views