RL From Scratch
- Q-Learning (Value-Based): Tabular Q-Learning and Value Iteration, implemented from scratch as educational notebooks.
- DQN Flappy (Value-Based): DQN agent trained on Flappy Bird using pixel observations, experience replay, and epsilon-greedy exploration.
- VizDoom RL (Value-Based): DQN agent trained on VizDoom Basic via a Gymnasium wrapper, with grayscale preprocessing, a replay buffer, and W&B logging.
- GRPO (Policy-Based): Group Relative Policy Optimization, DeepSeek-R1's RL training objective, implemented from scratch.
- A2C (Actor-Critic): Implementation of the A2C (Advantage Actor-Critic) reinforcement learning algorithm.
- DDPG (Actor-Critic): Implementation of the DDPG (Deep Deterministic Policy Gradient) reinforcement learning algorithm.
- DQN FrozenLake (Exploration): DQN implementation for the FrozenLake environment.
- DQN Lunar (Exploration): DQN implementation for the Lunar task (DQN-Lunar).
- DQN Taxi (Exploration): DQN implementation for the Taxi environment.
- DQN Atari (Exploration): DQN implementation for Atari games.
- DQN (Exploration): Implementation of the DQN reinforcement learning algorithm.
- Duel DQN (Exploration): Implementation of the Dueling DQN (Duel-DQN) reinforcement learning algorithm.
- FlappyBird PPO (Actor-Critic): PPO implementation for Flappy Bird.
- Frozen Lake (Exploration): Reinforcement learning implementation for the Frozen Lake environment.
- Imitation Learning (Imitation Learning): Implementation of imitation learning.
- MARL (Multi-Agent): Implementation of multi-agent reinforcement learning (MARL).
- IPPO (Multi-Agent): Implementation of the IPPO (Independent PPO) reinforcement learning algorithm.
- MAPPO (Multi-Agent): Implementation of the MAPPO (Multi-Agent PPO) reinforcement learning algorithm.
- Self Play (Multi-Agent): Implementation of self-play reinforcement learning.
- PPO (Actor-Critic): Implementation of the PPO (Proximal Policy Optimization) reinforcement learning algorithm.
- Atari (Actor-Critic): Reinforcement learning implementation for Atari environments.
- MuJoCo (Actor-Critic): PPO on the MuJoCo benchmark.
- REINFORCE (Actor-Critic): Implementation of the REINFORCE reinforcement learning algorithm.
- RND (Actor-Critic): Implementation of the RND (Random Network Distillation) reinforcement learning algorithm.
- SAC (Actor-Critic): Implementation of the SAC (Soft Actor-Critic) reinforcement learning algorithm.
- TD3 (Actor-Critic): Implementation of the TD3 (Twin Delayed DDPG) reinforcement learning algorithm.
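
To give a flavor of the tabular Q-Learning entry at the top of the list, here is a minimal sketch of the update rule with epsilon-greedy exploration. It is not taken from the notebooks: the 5-state chain environment, the `step`, `epsilon_greedy`, and `train` helpers, and all hyperparameter values are assumptions chosen only to keep the example self-contained.

```python
import random

# Hypothetical 5-state chain: start in state 0, reward 1.0 on reaching state 4.
N_STATES, N_ACTIONS = 5, 2  # actions: 0 = left, 1 = right
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic toy dynamics (an assumption, not from the notebooks)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action,
            # with no bootstrap on terminal transitions.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action per non-terminal state after training.
greedy_policy = [
    max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(GOAL)
]
```

On this toy chain the greedy policy should settle on moving right in every non-terminal state. The DQN entries replace the table `q` with a neural network and draw updates from a replay buffer, but the bootstrapped target has the same shape.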