1:36:03 · ML Performance Reading Group Session 19: Speculative Decoding · EleutherAI · 1.0K views
40:32 · ML Performance Reading Group 23: DFlash: Block Diffusion for Flash Speculative Decoding · EleutherAI · 538 views
1:51 · MTP Speculative Decoding Explained: How AI Models Generate Faster · Tyrel Barstow · 4 views
9:39 · Faster LLMs: Accelerate Inference with Speculative Decoding · IBM Technology · 26.1K views
28:12 · The Dark Arts of ML Benchmarking - Yonatan Alexander · pyGrunn and aiGrunn Conferences · 2 views
14:37 · Understanding Speculative Decoding: Boosting LLM Efficiency and Speed · MLWorks · 490 views
40:19 · Speculation is all you need: Intro to Speculative Decoding for High Performance Inference · Modal · 857 views
7:48 · Why using a dumb language model can speed up a smarter one: Speculative Decoding [Lecture] · Jordan Boyd-Graber · 233 views
22:36 · MASSIVELY speed up local AI models with Speculative Decoding in LM Studio · GosuCoder · 21.2K views
10:14 · MLX India Community Meetup 1 | Boosting local model performance - Speculative decoding with DFlash · Conscious Engines · 95 views
1:04:28 · vLLM Office Hours - Speculative Decoding in vLLM - October 3, 2024 · Neural Magic · 3.4K views