Optimal Control and Reinforcement Learning for Partially Observable Dynamical Systems

Issue Date
2023-06-01
Language
en
Abstract
Dynamical systems are everywhere around us. They range from complex systems such as the weather to simple ones such as a bead accelerating towards the earth. How agents or controllers can influence these systems to best suit their interests is a central question in both Artificial Intelligence and Control Theory, and it is of interest to research, industry, and society. The ability to control an internal or external state can be observed at every level of life, so insights into these control mechanisms can help us better understand natural intelligence, which is of scientific interest. Furthermore, many processes can be described as dynamical systems, and optimizing their controllers serves industry by making processes more cost- and time-efficient. In society, controllers already play an enormous role: think of your thermostat, washing machine, ‘smart’ traffic lights, or cruise control. Research into control can make our world safer (driver assistance), more reliable (energy grid controllers), and more sustainable (automated building management). Although the field has been studied extensively, one major problem has remained open since the 1960s: how can we optimally regulate a system while simultaneously learning its dynamics? As real-world systems are often only partially observable, we want controllers that can learn from their observations. Whereas some fully observable systems have known optimal solutions, this problem, known as dual control, is known to be intractable. This work investigates how both Control Theory and Reinforcement Learning can approach the dual control problem. We show that an existing method for solving the dual control problem can be extended to the two-dimensional case. Furthermore, we provide both a recurrent and a regular Soft Actor-Critic agent implemented in the JAX ecosystem. We show that the Recurrent Soft Actor-Critic provides more reliable and effective control of partially observable dynamical systems.
Faculty
Faculteit der Sociale Wetenschappen