9:39 Faster LLMs: Accelerate Inference with Speculative Decoding (IBM Technology, 26.0K views)
7:40 Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss (Tales Of Tensors, 1.5K views)
12:30 Speeding Up LLMs: Speculative Decoding for Multi-Sample Inference (TalkTensors: AI Podcast Covering ML Papers, 18 views)
12:25 Speculative Decoding: Faster Inference for Transformers and LLMs (The Clue Matrix, 14 views)
40:19 Speculation is all you need: Intro to Speculative Decoding for High Performance Inference (Modal, 842 views)
22:36 MASSIVELY speed up local AI models with Speculative Decoding in LM Studio (GosuCoder, 21.2K views)
23:40 Speculative Speculative Decoding: How to Parallelize Drafting and ... for 2x Faster LLM Inference (Xiaol.x, 184 views)
15:15 How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team (Lex Clips, 13.8K views)
1:50 Speculative Speculative Decoding: Parallelizing Sequential Bottlenecks in LLM Inference (Emergent Mind, 25 views)