19:46 · Quantization vs Pruning vs Distillation: Optimizing NNs for Inference · Efficient NLP · 65.4K views
18:58 · From FP32 to INT8: Post-Training Quantization Explained in PyTorch · MLWorks · 1.0K views
20:34 · How LLMs survive in low precision | Quantization Fundamentals · Julia Turc · 56.0K views
50:55 · Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training · Umar Jamil · 54.6K views
4:30 · Get Started Post-Training Dynamic Quantization | AI Model Optimization with Intel® Neural Compressor · Intel Devs · 10.7K views
54:01 · The practice of doing performance analysis/optimization with TensorRT-LLM · NVIDIA Developer · 1.5K views
44:58 · Implementation and optimization of MTP for DeepSeek R1 in TensorRT-LLM · NVIDIA Developer · 1.5K views
33:39 · Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou · AI Engineer · 45.0K views
44:06 · LLM inference optimization: Architecture, KV cache and Flash attention · YanAITalk · 15.5K views
14:11 · Boost Deep Learning Inference Performance with TensorRT | Step-by-Step · Code With Aarohi · 13.1K views
4:10 · How We Cut LLM GPU Costs from $60K to $6K — Inference Optimization Guide · Neuralscale Engineering · 26 views
15:35 · Quantization in deep learning | Deep Learning Tutorial 49 (Tensorflow, Keras & Python) · codebasics · 73.5K views
59:26 · How We Cut LLM Latency By 70% With NVIDIA TensorRT-LLM. MLOps Community - Maher Hanafi, SVP of Eng · Maher Hanafi · 145 views