Faster LLMs: Accelerate Inference with Speculative Decoding (IBM Technology, 9:39, 26.3K views)
What is vLLM? Efficient AI Inference for Large Language Models (IBM Technology, 4:58, 82.1K views)
Large Language Models As Optimizers - OPRO by Google DeepMind (AI Papers Academy, 6:28, 3.9K views)
RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models (IBM Technology, 13:10, 657.0K views)
LLM Compression Explained: Build Faster, Efficient AI Models (IBM Technology, 11:23, 26.5K views)
How LLMs survive in low precision | Quantization Fundamentals (Julia Turc, 20:34, 56.4K views)