We are excited to announce that Jasmine Bayrooti will present our recent work on improved regret bounds for Thompson sampling at NeurIPS 2025! This paper establishes no-regret guarantees for Thompson sampling in episodic Markov Decision Processes modeled via joint multi-output Gaussian Processes. The analysis addresses the non-Gaussian nature of value functions and recursive Bellman updates by establishing novel confidence bounds for compositional functions of GPs. We further extend classical regret tools to multi-output settings, introducing a multi-output elliptical potential lemma that captures correlations across environment dynamics. These contributions yield a sublinear regret bound governed by kernel complexity, explicitly linking the choice of model structure to learning efficiency.
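For readers new to the idea: the core Thompson-sampling loop is to draw one plausible model from a posterior and act greedily with respect to that draw. The toy sketch below illustrates this on a 1-D GP bandit with an assumed RBF kernel and reward function; it is only an illustration of the general principle, not the paper's algorithm, which handles episodic MDPs with joint multi-output GPs.

```python
# Toy sketch (NOT the paper's method): GP-based Thompson sampling on a
# 1-D bandit. Kernel, lengthscale, noise level, and reward function are
# all illustrative assumptions.
import numpy as np

def rbf_kernel(A, B, ls=0.3):
    # Squared-exponential kernel with an assumed lengthscale ls.
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-2):
    # Standard GP regression posterior mean/covariance at test points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    Kss = rbf_kernel(Xs, Xs)
    sol = np.linalg.solve(K, Ks)
    return sol.T @ y, Kss - Ks.T @ sol

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)            # unknown reward (assumed for the demo)
grid = np.linspace(0, 2, 200)          # discretized action space
X = np.array([1.0])                    # one initial observation
y = f(X) + 0.01 * rng.normal(size=1)

for t in range(30):
    mu, cov = gp_posterior(X, y, grid)
    # Thompson step: sample one plausible reward function from the posterior...
    sample = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)))
    # ...and play the action that maximizes the sampled function.
    a = grid[np.argmax(sample)]
    X = np.append(X, a)
    y = np.append(y, f(a) + 0.01 * rng.normal())
```

As the posterior concentrates, the sampled functions (and hence the chosen actions) increasingly agree with the true maximizer, which is the intuition behind sublinear regret.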
Paper: https://lnkd.in/eEWJAUWx
Come see our poster #3207 on December 4 in Exhibit Hall C, D, E from 4:30 to 7:30 p.m. PST. We’d love to chat about theoretical guarantees, principled exploration strategies, and scaling these methods to complex environments!
Thanks to our fantastic collaborators for making this work possible: Sattar Vakili, Amanda Prorok, and Carl Henrik Ek.