1:04:12 · AI Evals w: Valentin Hofmann — Fluid Language Model Benchmarking · alphaXiv · 166 views
13:15 · CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics · Shaowu Pan · 94 views
5:50 · 7 Popular LLM Benchmarks Explained [OpenLLM Leaderboard & Chatbot Arena] · bycloud · 28.3K views
30:56 · What Do LLM Benchmarks Actually Tell Us? (+ How to Run Your Own) · Adam Lucek · 9.3K views
1:00:30 · Benchmarking Optimization Solvers for Energy System Models: 2025 Results · Open Energy Transition · 0 views
9:19 · LLM Benchmarking | How one LLM is tested against another? | LLM Evaluation Benchmarks | Simplilearn · Simplilearn · 2.7K views
55:12 · The Science of Benchmarking Panel (NeurIPS 2025 Tutorial) · Michael Saxon (NLP & Generative AI research) · 1.4K views
9:37 · Reproducing Leaderboard Benchmarks: Evaluate Your LLM Like Hugging Face · Ready Tensor · 157 views