33:39  Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou (AI Engineer, 45.0K views)
9:39  Faster LLMs: Accelerate Inference with Speculative Decoding (IBM Technology, 26.0K views)
44:06  LLM inference optimization: Architecture, KV cache and Flash attention (YanAITalk, 15.5K views)
4:58  What is vLLM? Efficient AI Inference for Large Language Models (IBM Technology, 81.7K views)
7:35  Gentle Introduction to Static, Dynamic, and Continuous Batching for LLM Inference (neuralkian, 1.5K views)
9:14  What Is Llama.cpp? The LLM Inference Engine for Local AI (IBM Technology, 146.6K views)
20:18  LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE) (Faradawn Yang, 4.2K views)
19:35  Optimizing LLM Hosting with the latest AWS Large Model Inference Container (Ram Vegiraju, 310 views)
55:39  Understanding LLM Inference | NVIDIA Experts Deconstruct How AI Works (DataCamp, 24.8K views)
19:46  Quantization vs Pruning vs Distillation: Optimizing NNs for Inference (Efficient NLP, 65.4K views)