48:46 · Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math · Umar Jamil · 36.5K views
8:55 · Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained · AI Coffee Break with Letitia · 40.8K views
21:15 · Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning · Luis Serrano Academy · 34.3K views
36:25 · Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model Explained · Gabriel Mongaras · 19.5K views
37:53 · Direct Preference Optimization (DPO) - math insight explained · Ricardo Calix · 368 views
1:18:44 · Stanford CS234 I Guest Lecture on DPO: Rafael Rafailov, Archit Sharma, Eric Mitchell I Lecture 9 · Stanford Online · 12.6K views
19:47 · [2024 Best AI Paper] SimPO: Simple Preference Optimization with a Reference-Free Reward · Paper With Video · 183 views