Reinforcement Learning.
From Bellman equations to GRPO.
Every agentic capability in today's frontier models comes from reinforcement learning. This cohort teaches you RL from the ground up.
The long-horizon thinking, the step-by-step decomposition of a hard problem into actions, the ability to try, fail, and self-correct over many steps: all of it is reinforcement learning. RL is also the engine behind modern robotics, where a policy learns to act in the physical world. This live weekend cohort builds every algorithm from scratch, from MDPs and Q-learning to PPO, and finishes on the RLHF, DPO, and GRPO stack behind today's reasoning models.
Exploring the terrain, trial and error.
Glad to be part of this cohort. What stands out is the practical depth, not just tools, but how AI, system design, and agentic patterns come together for real-world engineering. Looking forward to learning more.
What I appreciate most is the depth of learning. Instead of just covering the “what,” the cohort dives into the “how” and “why” behind AI concepts. Great experience so far!
Aseem's masterclass finally made AI click for me beyond just writing prompts. He goes deep into how the models and agents actually work under the hood, the attention math, the agent loops, the evals, exactly the kind of depth you need as an engineer who wants to build with AI, not just use it. Genuinely one of the most useful technical programs I've done in years.
Joining this cohort was one of the best decisions I made for my AI learning journey. Before this, I was unsure where to start and overwhelmed by the noise around AI. Aseem's sessions gave me clarity, strong fundamentals, and the confidence to build my own agents. The focus on basic principles and real-world systems makes all the difference.
I learned a lot about the internal workings of AI, which is helping me use AI far more effectively for technical and complex problem-solving tasks.
The pace is intense and the depth is real. We built things from scratch instead of gluing libraries together, and that completely changes how you think about the stack. Easily the best technical cohort I've taken.
I would highly recommend this cohort to anyone who wants to understand Agentic AI beyond the hype and surface-level tutorials. What makes this program stand out is the way it combines fundamentals, system design, and real-world implementation thinking. The cohort does not just focus on tools or quick demos. It helps you understand how AI systems are actually designed, how LLMs and agents fit into modern product architectures, and how to reason about them as an engineer. If you want depth instead of buzzwords, this is the cohort to join.
Genuinely one of the best learning experiences I've had as an engineer. Aseem takes dense AI and systems topics and turns them into something you can actually build with, every session moves you from theory to working code. It's the rare cohort that respects your time and assumes you want real depth. Highly recommend it to anyone serious about going beyond the surface.
As a staff engineer, what I value most is depth and first-principles thinking, and this cohort delivers both. Aseem connects the math, the systems, and the production reality in a way I haven't seen in any other program. It's rare to find teaching that is this rigorous and this practical at the same time. I came in to fill gaps and left with a genuinely stronger mental model of the whole stack.
I've done plenty of online courses that stay at the surface. This one goes all the way down, tokenization, attention, agent loops, evals, and then back up to production. I finally feel like I understand AI instead of just using it.
As someone coming from a backend and system-design background, this cohort has helped me connect traditional engineering principles with modern AI systems. Every session leaves me with a long list of things to explore and apply. Great learning experience so far.
I'm attending this weekend cohort on AI agents, and now I finally understand how AI and agents actually work. Earlier, AI was just magic to me, now I understand the machinery behind it. Thanks Aseem for these sessions.
Coming from a distributed-systems background, I expected the AI parts to feel hand-wavy. They didn't. Every concept is grounded in how you'd actually design, ship, and operate it, latency, failure modes, evals, the works. This is the most engineering-honest AI course I've come across.
Eighteen modules. Every one builds an algorithm.
Grouped into five levels of increasing depth, from bandits to reasoning LLMs. Each module ends in code you write yourself.
Foundations & Bandits
The RL Problem & Multi-Armed Bandits
The trial-and-error paradigm with roots in operations research, psychology, and AI. We start where Sutton & Barto and Ravindran's NPTEL course both start, “immediate RL”, to build intuition for exploration vs. exploitation before time and state enter the picture.
The Sutton & Barto 10-armed testbed, comparing ε-greedy, UCB1, and Thompson sampling with regret curves.
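A minimal sketch of the kind of comparison this lab describes, using only the Python standard library. The function names (`run_bandit`, `epsilon_greedy`, `ucb1`), the unit-variance Gaussian rewards, and the exploration constant are illustrative assumptions, not the course's actual lab code. Thompson sampling is left out to keep the sketch short.

```python
import math
import random


def run_bandit(select_arm, true_means, steps, rng):
    """Play `steps` pulls and return the cumulative expected regret."""
    k = len(true_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # sample-average value estimate per arm
    best = max(true_means)
    regret = 0.0
    for t in range(1, steps + 1):
        arm = select_arm(t, counts, estimates, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance noise
        counts[arm] += 1
        # Incremental sample-average update of the action-value estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - true_means[arm]
    return regret


def epsilon_greedy(epsilon):
    """With probability epsilon explore uniformly, otherwise act greedily."""
    def select(t, counts, estimates, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(estimates))
        return max(range(len(estimates)), key=lambda a: estimates[a])
    return select


def ucb1(c=2.0):
    """Pull each arm once, then maximise estimate + sqrt(c * ln t / n)."""
    def select(t, counts, estimates, rng):
        for arm, n in enumerate(counts):
            if n == 0:
                return arm
        return max(
            range(len(estimates)),
            key=lambda a: estimates[a] + math.sqrt(c * math.log(t) / counts[a]),
        )
    return select
```

Calling `run_bandit(ucb1(), means, 2000, random.Random(0))` and `run_bandit(epsilon_greedy(0.1), means, 2000, random.Random(0))` on the same list of arm means gives two regret totals to compare. Recording the running regret at each step instead of only the total would give the regret curves.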
Contextual Bandits & Policy Search
Bridge from stateless bandits to full RL by adding context, and introduce the policy-gradient idea early. This frames the whole course: value-based vs. policy-based learning.
A contextual-bandit news / ad recommender using LinUCB and a policy-gradient bandit, evaluated by cumulative reward.
Markov Decision Processes & Bellman Equations
Introduce time. Formalise returns, discounting, and value functions, then prove why the machinery works: the part most courses skip and the source of most bugs.
A GridWorld MDP library with a Bellman-backup solver and visualised value / policy heatmaps.
Tabular Methods
Dynamic Programming
With a known model, solve MDPs exactly. This is the conceptual backbone every later algorithm approximates.
Policy iteration vs. value iteration on FrozenLake and Jack's Car Rental, comparing iteration counts and runtime.
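To make "solve MDPs exactly with a known model" concrete, here is a small value-iteration sketch. The three-state corridor (`chain_step`) is a hypothetical toy model chosen so the answer can be checked by hand. It is not FrozenLake or Jack's Car Rental, and it assumes deterministic transitions.

```python
def value_iteration(n_states, actions, step, gamma=0.9, tol=1e-8):
    """In-place value iteration for a known, deterministic model.

    `step(s, a)` must return (next_state, reward, done).
    """
    values = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            backups = []
            for a in actions:
                s2, r, done = step(s, a)
                # Bellman optimality backup: r + gamma * V(s'), no bootstrap past a terminal.
                backups.append(r + (0.0 if done else gamma * values[s2]))
            new_v = max(backups)
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


def chain_step(s, a):
    """Hypothetical corridor: states 0..2, actions -1 (left) / +1 (right).

    Moving right from state 2 reaches the goal: reward 1, episode ends.
    """
    if s == 2 and a == +1:
        return s, 1.0, True
    return min(max(s + a, 0), 2), 0.0, False


optimal_values = value_iteration(3, [-1, +1], chain_step, gamma=0.9)
```

With `gamma=0.9` the values come out as 0.81, 0.9, and 1.0 for states 0, 1, and 2, since each step further from the goal costs one more factor of the discount.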
Monte Carlo Methods
Learn from experience without a model. Introduce the on-policy / off-policy distinction and importance sampling, concepts that resurface in PPO and RLHF.
A Blackjack MC-control agent (off-policy via importance sampling), plus a minimal UCT player for Tic-Tac-Toe.
Temporal-Difference Learning & Eligibility Traces
The central idea of RL: bootstrapping. Build SARSA and Q-learning, then unify Monte Carlo and TD with eligibility traces.
SARSA vs. Q-learning vs. Expected SARSA on Cliff Walking and Windy GridWorld, with a TD(λ) extension.
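The three algorithms in this lab differ only in the bootstrapped target they move towards. The sketch below shows those targets side by side. The Q-table layout (a dict of dicts) and the function names are assumptions made for illustration, and the Expected SARSA target assumes the behaviour policy is ε-greedy.

```python
def sarsa_target(q, reward, next_state, next_action, gamma, done):
    """On-policy: bootstrap from the action actually taken next."""
    if done:
        return reward
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma, done):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    if done:
        return reward
    return reward + gamma * max(q[next_state].values())


def expected_sarsa_target(q, reward, next_state, gamma, epsilon, done):
    """Bootstrap from the expected value under an epsilon-greedy policy."""
    if done:
        return reward
    action_values = q[next_state]
    greedy = max(action_values, key=action_values.get)
    n = len(action_values)
    expected = 0.0
    for action, value in action_values.items():
        prob = epsilon / n + ((1.0 - epsilon) if action == greedy else 0.0)
        expected += prob * value
    return reward + gamma * expected


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha towards the target."""
    q[state][action] += alpha * (target - q[state][action])
```

On Cliff Walking this difference in targets is what lets SARSA learn the safer path while Q-learning learns the optimal but riskier one along the cliff edge.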
Deep Value-Based RL
Function Approximation & the Deadly Triad
Tabular methods don't scale. Move to parameterised value functions and confront the instability that defines deep RL.
Linear semi-gradient SARSA with tile coding on Mountain Car, then Fitted Q-Iteration, contrasting stability.
Deep Q-Networks (DQN) & Variants
The 2013/2015 breakthrough that launched deep RL. Build DQN end-to-end, then layer on the “Rainbow” improvements.
A from-scratch PyTorch DQN that solves CartPole, then a CNN DQN on Atari (Breakout / Pong) with Double + Dueling + PER.
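One of the variants named above, Double DQN, changes only how the bootstrap target is computed. The sketch below uses plain Python lists in place of network outputs so the difference is easy to see. The function names and list-based inputs are illustrative assumptions, not the lab's PyTorch code.

```python
def dqn_target(reward, done, target_next_q, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(target_next_q)


def double_dqn_target(reward, done, online_next_q, target_next_q, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    return reward + gamma * target_next_q[best_action]
```

Decoupling selection from evaluation is what reduces the overestimation that the `max` operator introduces when Q-values are noisy.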
Policy Gradients & Control
Policy Gradient Methods & REINFORCE
Optimise the policy directly, essential for continuous actions and the foundation of all LLM RL.
REINFORCE with a learned baseline on CartPole and LunarLander, plotting variance with and without the baseline.
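As a rough sketch of what the REINFORCE update looks like for a softmax policy, the code below computes discounted returns and the per-step gradient with respect to the policy logits. The helper names are assumptions, and the baseline is passed in as a plain number rather than learned, purely to keep the example short.

```python
import math


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_logit_grad(logits, action, episode_return, baseline=0.0):
    """Gradient-ascent direction for the logits at one step.

    Uses d/dlogits log pi(action) = onehot(action) - pi, scaled by (G_t - baseline).
    """
    probs = softmax(logits)
    weight = episode_return - baseline
    return [weight * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(probs)]
```

Subtracting a baseline leaves the expected gradient unchanged but shrinks the weight `G_t - baseline`, which is where the variance reduction comes from.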
Actor-Critic Methods (A2C / A3C / GAE)
Combine value learning and policy gradients to cut variance. The architecture that PPO inherits and that GRPO later simplifies by dropping the critic.
A synchronous A2C with GAE and vectorised environments on LunarLander, exposing λ and n-step as tunable knobs.
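The λ knob mentioned above comes from Generalised Advantage Estimation. Below is a minimal sketch for a single rollout, assuming plain Python lists for rewards, value estimates, and done flags. The function name and argument layout are illustrative choices.

```python
def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation over one rollout.

    values[t] is the critic's estimate for the state at step t;
    last_value is its estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = last_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrap across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages
```

Setting `lam=0` gives the one-step TD advantage (low variance, more bias), while `lam=1` gives the Monte Carlo return minus the value estimate (no bootstrap bias, more variance).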
Trust Regions & PPO
Why naive policy gradients are unstable, and the trust-region fix that became the industry default and the algorithm behind classic RLHF.
A clean, CleanRL-style PPO solving continuous control (Pendulum / Hopper), validated against Stable-Baselines3.
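The core of PPO is its clipped surrogate objective. The sketch below evaluates it for a single action from log-probabilities. It is a scalar illustration under assumed names, not the batched loss a real implementation would use.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one (state, action) sample (to be maximised)."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage the objective stops growing once the ratio exceeds `1 + clip_eps`. With a negative advantage it stops improving once the ratio drops below `1 - clip_eps`. This is how PPO approximates a trust region without a second-order constraint.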
Continuous Control: DDPG, TD3 & SAC
Off-policy actor-critics for robotics and control, where overestimation and exploration must be handled explicitly.
TD3 and SAC on MuJoCo tasks (HalfCheetah, Ant), benchmarking sample efficiency and stability against each other.
Frontiers & LLM Alignment
Exploration & Intrinsic Motivation
Sparse-reward problems where ε-greedy fails, the bridge to reasoning tasks where reward is rare and binary.
Add RND intrinsic rewards to PPO and solve a sparse-reward MiniGrid task that vanilla PPO cannot.
Model-Based RL & Planning
Learn a world model to plan and slash sample cost, the lineage of AlphaGo through Dreamer.
Dyna-Q on a maze (planning-steps ablation), plus a small learned-dynamics model for short-horizon planning.
Offline RL & Hierarchical RL
Learning from fixed datasets with no environment access, plus temporal abstraction through hierarchy: directly relevant to training on logged and preference data.
Train CQL or IQL on a D4RL dataset and compare against behaviour cloning, quantifying the offline distributional-shift gap.
RLHF: Reward Modeling + PPO
The classic three-stage alignment pipeline that produced InstructGPT and ChatGPT, now framed as “RL where the environment is a language model.”
A full RLHF loop with TRL on a small model: train a reward model, then PPO-fine-tune with a KL penalty; measure win-rate vs. the SFT baseline.
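One common way to apply the KL penalty in the PPO stage is to shape the per-token reward: every token pays a penalty proportional to how far the policy's log-probability has drifted from the reference (SFT) model, and the reward-model score is added on the final token. The sketch below assumes that formulation and a non-empty response. The function name and arguments are hypothetical and do not reproduce TRL's API.

```python
def kl_shaped_rewards(policy_logps, ref_logps, rm_score, beta=0.1):
    """Per-token rewards for one response (assumed non-empty).

    policy_logps / ref_logps: log-probabilities of the sampled tokens under
    the current policy and the frozen reference model.
    rm_score: scalar reward-model score for the whole response.
    """
    # Per-token KL estimate (log pi - log pi_ref), penalised with weight beta.
    rewards = [-beta * (lp - ref) for lp, ref in zip(policy_logps, ref_logps)]
    # The sequence-level reward-model score lands on the last token.
    rewards[-1] += rm_score
    return rewards
```

A larger `beta` keeps the fine-tuned policy closer to the SFT model, trading reward-model score for stability.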
DPO & Preference Optimization Without RL
The 2023 insight that you can skip the reward model and the RL loop entirely, now a production default.
DPO-fine-tune a model with TRL on the same preference data from Module 16; compare alignment, cost, and stability head-to-head with the PPO result.
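To show what "skipping the reward model and the RL loop" reduces to, here is the DPO loss for a single preference pair, written from summed sequence log-probabilities. The function and argument names are illustrative assumptions. In practice a library such as TRL computes this over batches of tensors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log sigmoid(beta * margin)."""
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) = log(1 + exp(-x)), evaluated in a numerically stable way.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy equals the reference model the margin is zero and the loss is `log 2`. The loss falls as the policy raises the chosen response's likelihood relative to the rejected one.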
GRPO & RLVR: Training Reasoning Models
The 2024–25 frontier: drop the critic, drop the reward model, reward only verifiable correctness, and watch chain-of-thought reasoning emerge. The course's culmination, mirroring the DeepSeek-R1 recipe.
Use GRPO (TRL or veRL) with a verifiable math / code reward to fine-tune a small base model on GSM8K-style problems; track accuracy and emergent chain-of-thought length over training.
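The "drop the critic" step can be sketched in a few lines: sample a group of completions for the same prompt, score each with a verifiable reward, and normalise the rewards within the group to get advantages. The `####` answer marker follows the GSM8K reference-solution format. Requiring completions to use it, the exact-match check, and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def verifiable_reward(completion, expected_answer):
    """Return 1.0 if the text after the final '####' matches the expected answer, else 0.0."""
    if "####" not in completion:
        return 0.0
    answer = completion.split("####")[-1].strip()
    return 1.0 if answer == expected_answer else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one prompt's group of sampled completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # The group mean plays the role a learned critic's baseline would play.
    return [(r - mean) / (std + eps) for r in rewards]


completions = ["2 + 2 = 4 #### 4", "I think it is 5 #### 5", "no final answer", "#### 4"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

If every completion in a group gets the same reward, all advantages are zero and that prompt contributes no gradient. This is one reason problem difficulty matters for this style of training.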
Aseem Rastogi.
Software & AI Architect · Co-Founder & CTO, Agentcord.ai
Ex-Architect, iXiGo · Ex-Staff Engineer, Synaptic · Ex-Senior Computer Scientist, Belzabar · B.Tech CSE, NIT Hamirpur (Gold Medalist)
Not a course taught from tutorials. Taught by the architect who built the systems, at iXiGo, Synaptic, and now Agentcord.ai.
Live on weekends. Supported every day.
Live weekend cohort
Saturday and Sunday sessions, with recordings of every class.
Hands-on
Weekly labs and a portfolio-grade capstone at the end of every level.
Support
Weekly office hours and capstone reviews per level.
Private Discord community
Dedicated channels per module and topic. Ask anything, any time. Every question and answer lives permanently, becoming a growing knowledge base for the cohort.
The full modern reinforcement learning toolkit.
From Gymnasium to veRL. Every one of these appears in at least one module or lab.
Reinforcement learning is how machines learn to act, and now how they learn to reason. The engineers who understand it from the math up will build the systems that decide.
Doors open for the next cohort soon.
Questions worth asking.
A little helps, but the course is self-contained. You need comfort with Python, NumPy, and basic calculus and probability (gradients, expectations). We build every algorithm from the MDP up and don't assume prior RL or deep-learning experience.
Join our rapidly growing WhatsApp community.
Tap into a fast-growing community of engineers going deep on AI: cohort updates, resources, and a place to ask anything, alongside people building the same things you are.
Free to join. Open to anyone serious about going deep on AI.