This paper considers single-trajectory adaptive/online learning for the linear quadratic regulator (LQR) with an unknown system and constraints on the states and actions. The major challenges are two-fold: 1) how to ensure safety without restarting the system, and 2) how to mitigate the inherent tension among exploration, exploitation, and safety. To tackle these challenges, we propose a single-trajectory learning-based control algorithm that guarantees safety with high probability. Safety is achieved through robust certainty equivalence and a SafeTransit algorithm. Further, we provide a sublinear regret bound relative to the optimal safe linear policy, which resolves an open question posed in Dean et al. (2019b). In developing the regret bound, we also establish a novel estimation error bound for nonlinear policies, which may be of independent interest. Lastly, we test our algorithm in numerical experiments.
Safe Adaptive Learning for Linear Quadratic Regulators with Constraints
Yingying Li, University of Illinois Urbana-Champaign
Authors: Yingying Li, Tianpeng Zhang, Subhro Das, Jeff Shamma, Na Li
2022 AWM Research Symposium
Systems and Control