🚀 We are excited to announce that Jasmine Bayrooti will be presenting collaborative work with Carl Henrik Ek and Amanda Prorok on “Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling” at ICLR 2025! 🎉
This paper introduces HOT-GP (Hallucination-based Optimistic Thompson sampling with Gaussian Processes) — a principled model-based reinforcement learning algorithm that explores efficiently by reasoning about joint uncertainty over both environment dynamics and rewards. 🤖📊 In contrast to prior methods that treat these uncertainties separately or assume access to a known reward function, HOT-GP learns their interplay with a multi-output Gaussian Process model. 🔁✨ By sampling transitions conditioned on optimistic rewards, the algorithm imagines plausible, high-value futures that drive useful exploration. 🌟🔍 This approach achieves strong sample efficiency, particularly in environments with sparse feedback or action penalties, and consistently outperforms existing strategies across MuJoCo benchmarks and VMAS robotics simulations. 🏆🤯
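For intuition, the "optimistic conditioning" idea can be sketched as a toy calculation: draw several reward samples from a joint Gaussian belief over (reward, next state), keep the most optimistic one, and condition the imagined next state on it. All numbers and the two-dimensional Gaussian below are hypothetical stand-ins for a multi-output GP posterior, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint Gaussian over (reward, next state) at one
# state-action pair, standing in for a multi-output GP posterior.
mean = np.array([0.5, 1.0])          # [reward mean, next-state mean]
cov = np.array([[0.20, 0.15],
                [0.15, 0.40]])       # reward and next state correlate

# Optimistic Thompson step: draw a few posterior reward samples
# and keep the most optimistic (largest) one.
reward_samples = rng.normal(mean[0], np.sqrt(cov[0, 0]), size=8)
r_opt = reward_samples.max()

# Condition the next state on the optimistic reward via standard
# Gaussian conditioning, so the imagined transition is consistent
# with the high-value outcome.
gain = cov[1, 0] / cov[0, 0]
s_mean = mean[1] + gain * (r_opt - mean[0])
s_var = cov[1, 1] - gain * cov[1, 0]
s_next = rng.normal(s_mean, np.sqrt(s_var))
```

Because reward and next state are modeled jointly, the conditional next-state mean shifts toward states that would plausibly produce the optimistic reward, which is the mechanism behind imagining "plausible, high-value futures."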
📝 Read the paper: https://lnkd.in/exSqSGuY
📍 Come see our poster (#398) on Friday, April 25, between 3:00 and 5:30 p.m. (UTC+8) in Hall 3 & Hall 2B at ICLR.
We’d love to chat about principled exploration and scaling efficient reinforcement learning to complex environments! 🤝🧠