33:39 | Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou | AI Engineer | 45.0K views
17:24 | Improving LLM Throughput via Data Center-Scale Inference Optimizations | NVIDIA Developer | 1.6K views
9:39 | Faster LLMs: Accelerate Inference with Speculative Decoding | IBM Technology | 26.0K views
44:06 | LLM inference optimization: Architecture, KV cache and Flash attention | YanAITalk | 15.5K views
32:36 | Optimizing LLM Inference for the Rest of Us - Abdel Sghiouar, Google | CNCF [Cloud Native Computing Foundation] | 199 views
17:52 | AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA | Faradawn Yang | 14.3K views
24:01 | Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft | PyTorch | 244 views
1:05:21 | Ep03 Model to Production Optimizing, Deploying, and Scaling ML Inference | Improving | 21 views
14:20 | LLM Inference Optimization. Coherence in KV Cache Management. LLM Intra-Turn Cache Dynamics. | Byte Goose AI. | 325 views
45:11 | LLM inference optimization: Model Quantization and Distillation | YanAITalk | 1.3K views
20:18 | LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE) | Faradawn Yang | 4.2K views