9:39  Faster LLMs: Accelerate Inference with Speculative Decoding (IBM Technology, 26.0K views)
33:39  Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou (AI Engineer, 45.0K views)
4:58  What is vLLM? Efficient AI Inference for Large Language Models (IBM Technology, 81.6K views)
26:06  LLM Optimization Lecture 5: Continuous Batching and Piggyback Decoding (Faradawn Yang, 1.9K views)
32:36  Optimizing LLM Inference for the Rest of Us - Abdel Sghiouar, Google (CNCF [Cloud Native Computing Foundation], 196 views)
19:35  Optimizing LLM Hosting with the latest AWS Large Model Inference Container (Ram Vegiraju, 310 views)
24:01  Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft (PyTorch, 243 views)
37:52  [VDBUH2026] Abdel Sghiouar - Optimizing LLM Inference for the Rest of Us (Devoxx, 270 views)
44:06  LLM inference optimization: Architecture, KV cache and Flash attention (YanAITalk, 15.5K views)
4:10  How We Cut LLM GPU Costs from $60K to $6K — Inference Optimization Guide (Neuralscale Engineering, 26 views)
40:59  Fast, Cheap, and Accurate: Optimizing LLM Inference with vLLM and Quantization by Legare Kerrison (Devoxx UK, 116 views)
27:39  Databricks' vLLM Optimization for Cost-Effective LLM Inference | Ray Summit 2024 (Anyscale, 1.3K views)
17:52  AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA (Faradawn Yang, 14.3K views)