9:21 · PPO Explained: The Default Policy Gradient Algorithm Behind RLHF and AI Agents · Lamhot Siagian · 15 views
19:50 · An introduction to Policy Gradient methods - Deep Reinforcement Learning · Arxiv Insights · 264.3K views
22:03 · Proximal Policy Optimization (PPO) for LLMs Explained Intuitively · Julia Turc · 56.9K views
31:15 · Simply Explaining Proximal Policy Optimization (PPO) | Deep Reinforcement Learning · Johnny Code · 25.8K views
29:05 · Policy Gradient Methods | Reinforcement Learning Part 6 · Mutual Information · 75.1K views
11:29 · Reinforcement Learning from Human Feedback (RLHF) Explained · IBM Technology · 89.5K views
2:15:13 · Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code. · Umar Jamil · 71.0K views
38:24 · Proximal Policy Optimization (PPO) - How to train Large Language Models · Luis Serrano Academy · 85.1K views
22:44 · LLM Training & Reinforcement Learning from Google Engineer | SFT + RLHF | PPO vs GRPO vs DPO · Martin Is A Dad · 14.4K views
18:37 · ChatGPT explained: A Guide to Conversational AI w/ InstructGPT, PPO, Markov, RLHF · Discover AI · 8.1K views
1:33:58 · RL Course by David Silver - Lecture 7: Policy Gradient Methods · Google DeepMind · 311.8K views
1:02:47 · Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial · Machine Learning with Phil · 87.3K views
8:15 · Simply Explaining REINFORCE (Vanilla Policy Gradient VPG) | Deep Reinforcement Learning · Johnny Code · 5.3K views
41:01 · Deep RL Bootcamp Lecture 5: Natural Policy Gradients, TRPO, PPO · AI Prism · 60.4K views
59:36 · Policy Gradient Theorem Explained - Reinforcement Learning · Elliot Waite · 84.1K views