Model-Based Reinforcement Learning
Model-based RL learns a model of the environment (a dynamics model) and plans with this model instead of learning only from direct experience. This makes it more sample-efficient than model-free RL and is the technique behind MuZero and Dreamer.
Explanation
The agent builds an internal world model: "If I take action A in state S, what happens?" This lets it simulate and plan mentally without querying the real environment, as the sketch below illustrates.
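A minimal sketch of this loop in Python, assuming a small discrete environment; the tabular `DynamicsModel` and the random-shooting `plan` function are hypothetical names for illustration, not taken from any library:

```python
import numpy as np

class DynamicsModel:
    """Answers the mental-simulation query: 'If I take action a in state s, what happens?'"""
    def __init__(self, n_states, n_actions):
        # Deterministic tabular model: one predicted next state and reward per (s, a).
        self.next_state = np.zeros((n_states, n_actions), dtype=int)
        self.reward = np.zeros((n_states, n_actions))

    def update(self, s, a, r, s_next):
        # Remember the last observed real transition.
        self.next_state[s, a] = s_next
        self.reward[s, a] = r

    def simulate(self, s, a):
        # Mental simulation: predict without touching the real environment.
        return self.next_state[s, a], self.reward[s, a]

def plan(model, s0, n_actions, horizon=5, n_candidates=64, rng=None):
    """Random-shooting planner: roll out candidate action sequences in the
    learned model and return the first action of the best sequence."""
    rng = rng or np.random.default_rng()
    best_return, best_first_action = -np.inf, 0
    for _ in range(n_candidates):
        s, total = s0, 0.0
        actions = rng.integers(0, n_actions, size=horizon)
        for a in actions:
            s, r = model.simulate(s, a)  # imagined step, no real env needed
            total += r
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action
```

In a full agent, the loop alternates between acting in the real environment, updating the model with observed transitions, and calling `plan` before each real action.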
Marketing Relevance
Model-based RL is more sample-efficient than model-free RL, which matters wherever collecting real experience is expensive; world models for autonomous driving and robotics are prominent applications.
Common Pitfalls
Model errors compound over long planning horizons: small one-step prediction errors accumulate into large trajectory errors (see the toy calculation below). Learning an accurate model is also difficult in high-dimensional environments.
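A toy calculation with hypothetical numbers, assuming linear dynamics and a small constant one-step model bias, shows how quickly imagined rollouts drift from reality:

```python
# Toy illustration of compounding model error (hypothetical numbers):
# true dynamics shrink the state by 1% per step; the learned model has
# a small constant one-step bias on top of that.
true_state = model_state = 1.0
one_step_bias = 0.01

for t in range(1, 51):
    true_state = 0.99 * true_state                    # real environment
    model_state = 0.99 * model_state + one_step_bias  # imagined rollout
    if t in (1, 10, 50):
        gap = abs(model_state - true_state)
        print(f"step {t:2d}: prediction gap = {gap:.3f}")
# step  1: prediction gap = 0.010
# step 10: prediction gap = 0.096
# step 50: prediction gap = 0.395
```

A 1% one-step error grows to a roughly 40% trajectory error after 50 imagined steps, which is why long-horizon planning in a learned model is risky.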
Origin & History
Dyna (Sutton, 1991) was an early framework. World Models (Ha & Schmidhuber, 2018) was an influential demonstration of training an agent inside its own learned model. MuZero (DeepMind, 2019) learned a model and mastered board and Atari games without being told the rules. Dreamer (Hafner et al., 2020) extended the approach to visual RL by planning in a learned latent space.
Comparisons & Differences
Model-Based Reinforcement Learning vs. Model-Free RL (PPO, DQN)
Model-free RL learns directly from real experience and typically needs more samples; model-based RL learns an environment model and generates simulated experience, needing fewer real samples at the cost of possible model errors. The Dyna-style sketch below shows both kinds of update side by side.
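A minimal Dyna-style sketch, in the spirit of the Dyna framework mentioned above (all names here are illustrative): the same Q-update is applied once to each real transition (model-free) and several more times to transitions replayed from a learned model (planning):

```python
# Minimal Dyna-Q sketch: one model-free update from real experience,
# plus extra planning updates from a learned (here: memorized) model.
# Assumes discrete states and actions; names are illustrative.
import random
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # Standard Q-learning update, used for both real and simulated transitions.
    target = r + gamma * max(Q[s_next][b] for b in actions)
    Q[s][a] += alpha * (target - Q[s][a])

def dyna_q_step(Q, model, s, a, r, s_next, actions, n_planning=10):
    q_update(Q, s, a, r, s_next, actions)      # model-free: learn from the real step
    model[(s, a)] = (r, s_next)                # learn the model (deterministic memory)
    for _ in range(n_planning):                # model-based: learn from simulated steps
        (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
        q_update(Q, ps, pa, pr, ps_next, actions)

# Usage: Q = defaultdict(lambda: defaultdict(float)); model = {}
# then call dyna_q_step after every real environment step.
```

Setting `n_planning=0` recovers plain model-free Q-learning; each increase trades extra computation for fewer required real-environment samples.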