9:39 · Faster LLMs: Accelerate Inference with Speculative Decoding · IBM Technology · 26.1K views
11:53 · Greedy? Min-p? Beam Search? How LLMs Actually Pick Words – Decoding Strategies Explained · AI Coffee Break with Letitia · 6.9K views
17:52 · AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA · Faradawn Yang · 14.3K views
11:23 · LLM Compression Explained: Build Faster, Efficient AI Models · IBM Technology · 26.3K views
17:20 · Structured Output from LLMs: Grammars, Regex, and State Machines · Efficient NLP · 9.4K views
24:10 · Decoding Strategies in LLMs (Explained Simply) | How LLMs Choose the Next Token · Mrinal Rawat · 75 views
4:58 · What is vLLM? Efficient AI Inference for Large Language Models · IBM Technology · 81.9K views