Yingying Li, University of Illinois Urbana-Champaign
Authors: Yingying Li, Tianpeng Zhang, Subhro Das, Jeff Shamma, Na Li
2022 AWM Research Symposium
Systems and Control

This paper considers single-trajectory adaptive/online learning for the linear quadratic regulator (LQR) with an unknown system and constraints on the states and actions. The major challenges are two-fold: 1) how to ensure safety without restarting the system, and 2) how to mitigate the inherent tension among exploration, exploitation, and safety. To tackle these challenges, we propose a single-trajectory learning-based control algorithm that guarantees safety with high probability. Safety is achieved by robust certainty equivalence and a SafeTransit algorithm. Further, we provide a sublinear regret bound relative to the optimal safe linear policy, which resolves an open question posed in Dean et al. (2019b). In developing the regret bound, we also establish a novel estimation error bound for nonlinear policies, which may be of independent interest. Lastly, we test our algorithm in numerical experiments.
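
To make the single-trajectory setting concrete, the sketch below estimates an unknown scalar system x_{t+1} = a x_t + b u_t + w_t by ordinary least squares from one uninterrupted rollout, which is the kind of model estimate a certainty-equivalence controller would build on. It is only an illustration under stated assumptions, not the algorithm of the paper: the scalar dynamics, the fixed stabilizing gain `k`, the bounded uniform noise, and the helper names `simulate` and `least_squares_ab` are all hypothetical, and the sketch enforces no state or action constraints.

```python
import random


def simulate(a, b, k, steps, noise=0.1, explore=0.5, seed=0):
    """Roll out x_{t+1} = a*x_t + b*u_t + w_t along ONE trajectory (no restarts).

    The input is u_t = -k*x_t + eta_t, where eta_t is bounded exploration
    noise. All parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = 0.0
    data = []
    for _ in range(steps):
        u = -k * x + rng.uniform(-explore, explore)  # feedback plus exploration
        w = rng.uniform(-noise, noise)               # bounded disturbance
        x_next = a * x + b * u + w
        data.append((x, u, x_next))
        x = x_next
    return data


def least_squares_ab(data):
    """Estimate (a, b) by solving the 2x2 normal equations in closed form."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("regressors are not sufficiently exciting")
    a_hat = (sxy * suu - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


if __name__ == "__main__":
    # Hypothetical true system: a = 0.9, b = 0.5; gain k = 0.6 gives a - b*k = 0.6.
    trajectory = simulate(a=0.9, b=0.5, k=0.6, steps=2000)
    print(least_squares_ab(trajectory))
```

Without the exploration term the input would be an exact multiple of the state, the normal equations would become singular, and a and b could not be separated. That is one simple view of the exploration versus exploitation tension the abstract refers to.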
