20:18 · LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE) · Faradawn Yang · 4.4K views
33:39 · Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou · AI Engineer · 46.0K views
4:33 · LLM Parallelism Explained: Data, Tensor, Pipeline & More · Yi's Learning Notes · 106 views
4:58 · What is vLLM? Efficient AI Inference for Large Language Models · IBM Technology · 82.6K views
30:05 · Scale ANY Model: PyTorch DDP, ZeRO, Pipeline & Tensor Parallelism Made Simple (2025 Guide) · Zachary Mueller · 1.5K views
19:46 · Quantization vs Pruning vs Distillation: Optimizing NNs for Inference · Efficient NLP · 65.7K views
6:59 · Model Parallelism vs Data Parallelism vs Tensor Parallelism | #deeplearning #llms · Lazy Analyst · 3.7K views
10:57 · Parallel Track Transformers Explained (vLLM) – Reducing GPU Sync in LLM Inference · Machine Learning with PyTorch · 92 views
9:39 · Faster LLMs: Accelerate Inference with Speculative Decoding · IBM Technology · 26.4K views
10:06 · Why Your AI is Slow: Master LLM Inference Optimization · TutorialsArena - MCQs, Coding Interviews & More! · 3 views