Update RLHF_with__PPO.md

shaheennabi · web-flow · commit eb1a271f78e4 · 2024-11-21T10:32:06.000-08:00
diff --git a/docs/RLHF_with__PPO.md b/docs/RLHF_with__PPO.md
@@ -1,8 +1,9 @@
 # Reinforcement Learning from Human Feedback with PPO
 
-![Uploading image.png…]()
 
 
+![Uploading Screenshot 2024-11-21 083539.png…]()
+
 
 What is it, and why is it so confusing? Well, in this file, I will take you on a new adventure, and we will learn what **RLHF** with **PPO** actually means.