17:20Structured Output from LLMs: Grammars, Regex, and State MachinesEfficient NLP9.4K viewsView & Download
9:39Faster LLMs: Accelerate Inference with Speculative DecodingIBM Technology26.3K viewsView & Download
15:15How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor TeamLex Clips13.8K viewsView & Download