9:39 · Faster LLMs: Accelerate Inference with Speculative Decoding · IBM Technology · 26.1K views
7:31 · 🎯 Google AI Introduces STATIC: 948× Faster Constrained Decoding for LLM Generative Retrieval · Subramanyam KMV · 105 views
11:53 · Greedy? Min-p? Beam Search? How LLMs Actually Pick Words – Decoding Strategies Explained · AI Coffee Break with Letitia · 6.9K views
17:20 · Structured Output from LLMs: Grammars, Regex, and State Machines · Efficient NLP · 9.4K views
27:14 · Transformers, the tech behind LLMs | Deep Learning Chapter 5 · 3Blue1Brown · 10.3M views
15:15 · How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team · Lex Clips · 13.8K views
7:40 · Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss · Tales Of Tensors · 1.6K views
10:46 · GenAI: LLM Decoding Strategies Explained | Greedy, Beam, Top-k, Top-p, Temperature, Contrastive · Baba's World · 1.7K views
1:47:10 · Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 6 - LLM Reasoning · Stanford Online · 55.2K views