9:39 | Faster LLMs: Accelerate Inference with Speculative Decoding | IBM Technology | 26.4K views
4:58 | What is vLLM? Efficient AI Inference for Large Language Models | IBM Technology | 82.4K views
33:39 | Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou | AI Engineer | 45.8K views
19:46 | Quantization vs Pruning vs Distillation: Optimizing NNs for Inference | Efficient NLP | 65.7K views
9:14 | What Is Llama.cpp? The LLM Inference Engine for Local AI | IBM Technology | 148.2K views
24:01 | Tour De Force: LLM Inference Optimization From Simple To Sophisticated - Christin Pohl, Microsoft | PyTorch | 261 views
44:06 | LLM inference optimization: Architecture, KV cache and Flash attention | YanAITalk | 15.5K views
9:06 | What is Prompt Caching? Optimize LLM Latency with AI Transformers | IBM Technology | 88.6K views
17:24 | Improving LLM Throughput via Data Center-Scale Inference Optimizations | NVIDIA Developer | 1.6K views
20:18 | LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE) | Faradawn Yang | 4.4K views
11:23 | LLM Compression Explained: Build Faster, Efficient AI Models | IBM Technology | 26.6K views
17:52 | AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA | Faradawn Yang | 14.5K views