2:15:40 · Codex: Evaluating Large Language Models Trained on Code · Samuel Albanie · 3.8K views
31:26 · #2 Evaluating Large Language Models Trained on Code by OpenAI · Ramit Surana · 168 views
18:53 · MAPS'22 - A Systematic Evaluation of Large Language Models of Code · ACM SIGPLAN · 270 views
55:02 · How to Systematically Setup LLM Evals (Metrics, Unit Tests, LLM-as-a-Judge) · Dave Ebbelaar · 55.8K views
30:56 · What Do LLM Benchmarks Actually Tell Us? (+ How to Run Your Own) · Adam Lucek · 9.2K views
1:49:25 · Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation · Stanford Online · 62.6K views