9:39  Faster LLMs: Accelerate Inference with Speculative Decoding (IBM Technology, 26.4K views)
4:58  What is vLLM? Efficient AI Inference for Large Language Models (IBM Technology, 82.5K views)
10:23  2025 US LLVM Developers' Meeting: MLIR based graph compiler for in-memory inference computing (LLVM, 298 views)
26:10  How vLLM Became the Standard for Fast AI Inference | Simon Mo, Inferact (Lightspeed Venture Partners, 1.0M views)
57:48  Next-Gen Long-Context LLM Inference with LMCache - Junchen Jiang (UChicago & LMCache) (Nadav Timor, 1.8K views)
25:30  Federated llm-d: Elevating Distributed Inference Beyond Clus... Madhuri Yechuri & Abhishek Malvankar (CNCF [Cloud Native Computing Foundation], 235 views)
41:00  Facebook: Scalable Memory Allocation using jemalloc (Tech Talk, 1/11/2011) (Taylor Hodge, 552 views)