9:39 Faster LLMs: Accelerate Inference with Speculative Decoding (IBM Technology, 26.0K views)
15:15 How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team (Lex Clips, 13.8K views)
40:19 Speculation is all you need: Intro to Speculative Decoding for High Performance Inference (Modal, 843 views)
7:40 Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss (Tales Of Tensors, 1.6K views)
1:36:03 ML Performance Reading Group Session 19: Speculative Decoding (EleutherAI, 1.0K views)
8:44 How to PROPERLY Use Speculative Decoding in LM Studio to DOUBLE Your AI Speed (AsapGuide, 3.8K views)
7:48 Why using a dumb language model can speed up a smarter one: Speculative Decoding [Lecture] (Jordan Boyd-Graber, 233 views)