11:13 · Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval · Conference on Language Modeling · 99 views
9:05 · How to Setup DeepEval for Fast, Easy, and Powerful LLM Evaluations · Leon Builds Agents · 19.7K views
7:58 · Top 5 AI Agent Evaluation Tools (2025): Maxim AI, Langfuse, Arize | LLM Observability Comparison · AI Quality Nerd · 684 views
2:06 · Ollama vs VLLM vs Llama.cpp: Best Local AI Runner in 2026? · Savage Reviews · 36.4K views
1:07:28 · Personal benchmarks vs HumanEval - with Nicholas Carlini of DeepMind · Latent Space · 932 views