9:39 | Faster LLMs: Accelerate Inference with Speculative Decoding | IBM Technology | 26.0K views
7:40 | Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss | Tales Of Tensors | 1.6K views
12:30 | Speeding Up LLMs: Speculative Decoding for Multi-Sample Inference | TalkTensors: AI Podcast Covering ML Papers | 18 views
40:19 | Speculation is all you need: Intro to Speculative Decoding for High Performance Inference | Modal | 842 views
12:49 | Audio Overview: Accelerating LLM Inference with Lossless Speculative Decoding (read) | Xiao Yang | 57 views
7:52 | Accelerating LLM Inference on TPUs via Diffusion Speculative Decoding | Knut Jägersberg | 11 views
23:40 | Speculative Speculative Decoding: How to Parallelize Drafting and ... for 2x Faster LLM Inference | Xiaol.x | 185 views
1:00:54 | Accelerating LLM Inference with vLLM (and SGLang) - Ion Stoica | Nadav Timor | 7.9K views
12:25 | Speculative Decoding: Faster Inference for Transformers and LLMs | The Clue Matrix | 14 views
1:36:03 | ML Performance Reading Group Session 19: Speculative Decoding | EleutherAI | 1.0K views
12:45 | Speculative Decoding & Inference Speed — 2-3x Faster LLMs With Zero Quality Loss | Jeff Heidelberger | 2 views