9:39Faster LLMs: Accelerate Inference with Speculative DecodingIBM Technology26.1K viewsView & Download
11:53Greedy? Min-p? Beam Search? How LLMs Actually Pick Words – Decoding Strategies ExplainedAI Coffee Break with Letitia6.9K viewsView & Download
17:52AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIAFaradawn Yang14.3K viewsView & Download
26:06LLM Optimization Lecture 5: Continuous Batching and Piggyback DecodingFaradawn Yang1.9K viewsView & Download
7:40Speculative Decoding: 3× Faster LLM Inference with Zero Quality LossTales Of Tensors1.6K viewsView & Download
33:39Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark MoyouAI Engineer45.3K viewsView & Download
32:03DistServe: disaggregating prefill and decoding for goodput-optimized LLM inferencePyTorch5.5K viewsView & Download