7:09 · LightThinker++: Adaptive Memory Management for Efficient LLM Reasoning · Research Paper Review · 144 views
20:34 · How LLMs survive in low precision | Quantization Fundamentals · Julia Turc · 56.7K views
11:23 · LLM Compression Explained: Build Faster, Efficient AI Models · IBM Technology · 26.6K views
19:46 · Quantization vs Pruning vs Distillation: Optimizing NNs for Inference · Efficient NLP · 65.7K views
6:28 · Memory-Efficient LLMs: Attention I/O, KV Cache Eviction, and MoE Compression · Neural Trend Hub · 32 views
44:06 · LLM inference optimization: Architecture, KV cache and Flash attention · YanAITalk · 15.5K views
7:14 · TriAttention: Trigonometric KV Compression for Efficient LLM Reasoning · Research Paper Review · 188 views
1:04:09 · Terence Tao: Nobody Understands Why AI Actually Works · Dr Brian Keating · 247.0K views