RL From Scratch

  • Q-Learning Value-Based
    Tabular Q-Learning and Value Iteration implemented from scratch as educational notebooks.
  • DQN Flappy Value-Based
    DQN agent trained on Flappy Bird using pixel observations, experience replay, and epsilon-greedy exploration.
  • VizDoom RL Value-Based
    DQN agent trained on VizDoom Basic via a Gymnasium wrapper, with grayscale preprocessing, a replay buffer, and Weights & Biases (W&B) logging.
  • GRPO Policy-Based
    Group Relative Policy Optimization (GRPO), the RL training objective of DeepSeek-R1, implemented from scratch.
  • A2C Actor-Critic
    Implementation of the Advantage Actor-Critic (A2C) algorithm.
  • DDPG Actor-Critic
    Implementation of the Deep Deterministic Policy Gradient (DDPG) algorithm.
  • DQN FrozenLake Exploration
    DQN agent for the FrozenLake environment.
  • DQN Lunar Exploration
    DQN agent for the Lunar Lander environment.
  • DQN Taxi Exploration
    DQN agent for the Taxi environment.
  • DQN Atari Exploration
    DQN agent for Atari environments.
  • DQN Exploration
    Implementation of the Deep Q-Network (DQN) algorithm.
  • Duel DQN Exploration
    Implementation of the Dueling DQN algorithm.
  • FlappyBird PPO Actor-Critic
    PPO agent for Flappy Bird.
  • Frozen Lake Exploration
    Reinforcement learning agent for the FrozenLake environment.
  • Imitation Learning Imitation Learning
    Implementation of an imitation learning algorithm.
  • MARL Multi-Agent
    Implementation of multi-agent reinforcement learning (MARL).
  • IPPO Multi-Agent
    Implementation of the Independent PPO (IPPO) algorithm.
  • MAPPO Multi-Agent
    Implementation of the Multi-Agent PPO (MAPPO) algorithm.
  • Self Play Multi-Agent
    Implementation of self-play training.
  • PPO Actor-Critic
    Implementation of the Proximal Policy Optimization (PPO) algorithm.
  • Atari Actor-Critic
    Actor-critic agent for Atari environments.
  • MuJoCo Actor-Critic
    PPO on the MuJoCo benchmark.
  • REINFORCE Actor-Critic
    Implementation of the REINFORCE policy-gradient algorithm.
  • RND Actor-Critic
    Implementation of Random Network Distillation (RND).
  • SAC Actor-Critic
    Implementation of the Soft Actor-Critic (SAC) algorithm.
  • TD3 Actor-Critic
    Implementation of the Twin Delayed DDPG (TD3) algorithm.
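
For orientation, the tabular Q-learning and epsilon-greedy exploration named in the first entries of this list can be sketched in a few lines. This is a minimal illustration, not the code from any notebook listed here: the five-state chain environment (`chain_step`), the function names, and the hyperparameter values are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so an all-zero table does not favour action 0.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the TD target."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the last state yields reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def train(n_states=5, episodes=500, epsilon=0.2, seed=0):
    """Run Q-learning on the toy chain and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = chain_step(state, action, n_states)
            q_learning_update(q, state, action, reward, next_state, done)
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 3) for v in row])
```

After training, the "move right" action should carry the higher value in every non-terminal state of the chain. The DQN entries replace the table `q` with a neural network and draw updates from a replay buffer, but they regress toward the same kind of target.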