9:39 | Faster LLMs: Accelerate Inference with Speculative Decoding | IBM Technology | 26.3K views
18:51 | Inference Providers: Best Way to Build with Open Source Models | Hugging Face | 18.0K views
4:58 | What is vLLM? Efficient AI Inference for Large Language Models | IBM Technology | 82.3K views
6:56 | Inside LLM Inference: GPUs, KV Cache, and Token Generation | AI Explained in 5 Minutes | 1.1K views
8:12 | Optimizing GPU Parallelization for Model Inference on Databricks | VectorLab | 242 views
11:21 | Run Very Large Models With Consumer Hardware Using 🤗 Transformers and 🤗 Accelerate (PT. Conf 2022) | PyTorch | 1.6K views
55:39 | Understanding LLM Inference: NVIDIA Experts Deconstruct How AI Works | DataCamp | 24.9K views
4:47 | How Can I Speed Up PyTorch Model Inference? - AI and Machine Learning Explained | AI and Machine Learning Explained | 10 views
1:05:29 | ML Frameworks: Hugging Face Accelerate w/ Sylvain Gugger | Weights & Biases | 5.0K views
33:39 | Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou | AI Engineer | 45.7K views