50:55 · Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training · Umar Jamil · 54.8K views
20:34 · How LLMs survive in low precision | Quantization Fundamentals · Julia Turc · 56.7K views
2:12:21 · LLM Fine-Tuning 12: LLM Quantization Explained (PART 1) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp · Sunny Savita · 8.1K views
7:09 · LightThinker++: Adaptive Memory Management for Efficient LLM Reasoning · Research Paper Review · 144 views
13:39 · Rethinking KV Cache Compression Techniques for LLM Serving · DSAI by Dr. Osbert Tay · 223 views
44:06 · LLM inference optimization: Architecture, KV cache and Flash attention · YanAITalk · 15.5K views
11:23 · LLM Compression Explained: Build Faster, Efficient AI Models · IBM Technology · 26.6K views
19:46 · Quantization vs Pruning vs Distillation: Optimizing NNs for Inference · Efficient NLP · 65.7K views