38:11  Optimizing vLLM Performance through Quantization | Ray Summit 2024 (Anyscale, 3.0K views)
4:58  What is vLLM? Efficient AI Inference for Large Language Models (IBM Technology, 81.9K views)
14:58  Scaling LLMs at Apple: Ray Serve + vLLM Deep Dive | Ray Summit 2025 (Anyscale, 853 views)
27:39  Databricks' vLLM Optimization for Cost-Effective LLM Inference | Ray Summit 2024 (Anyscale, 1.3K views)
40:59  Fast, Cheap, and Accurate: Optimizing LLM Inference with vLLM and Quantization by Legare Kerrison (Devoxx UK, 121 views)
32:18  Embedded LLM’s Guide to vLLM Architecture & High-Performance Serving | Ray Summit 2025 (Anyscale, 2.1K views)
45:48  Optimizing LLM Inference with AWS Trainium, Ray, vLLM, and Anyscale (Anyscale, 1.2K views)