28:19 Ultra-scale playbook, ch.3.2 - "Sequence Parallelism" (Little ML book club, 219 views)
53:19 LION: Linear Attention for Efficient Bidirectional Sequence Modeling - Arshia Afzal | ASAP 46 (ASAP Seminar Series, 170 views)
1:24:42 Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 7: Parallelism 1 (Stanford Online, 43.6K views)
20:50 Test-Time Training with KV Binding Is Secretly Linear Attention (Feb 2026) (AI Paper Slop, 38 views)
18:52 Parallax: Parameterized Local Linear Attention for Language Modeling (May 2026) (AI Paper Slop, 26 views)
48:06 Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (Paper Explained) (Yannic Kilcher, 29.0K views)
20:18 LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE) (Faradawn Yang, 4.4K views)
7:27 Two Dimensional Parallelism Using Distributed Tensors at PyTorch Conference 2022 (PyTorch, 3.4K views)
1:06:39 Lazy and Fast: Ranges Meet Parallelism in C++ - Daniel Anderson - CppCon 2025 (CppCon, 8.1K views)
6:09 Gated Attention: Non-linearity, Sparsity, and LLM Stability (Research Paper Review, 344 views)
30:05 Scale ANY Model: PyTorch DDP, ZeRO, Pipeline & Tensor Parallelism Made Simple (2025 Guide) (Zachary Mueller, 1.5K views)
40:54 Deep dive - Better Attention layers for Transformer models (Julien Simon, 15.8K views)
30:46 Linear MoE: Supercharging AI with Efficient Sequence Modeling (TalkTensors: AI Podcast Covering ML Papers, 5 views)