Multi-agent higher-order learning vs Nash equilibrium
- Jeff Shamma
B5 L5 R5209
Overview
Abstract
The framework of multi-agent learning explores the dynamics of how individual agent strategies evolve in response to the evolving strategies of other agents. Of particular interest is whether or not agent strategies converge to well-known solution concepts such as Nash Equilibrium (NE). Most “fixed-order” learning dynamics restrict an agent’s underlying state to be its own strategy. In “higher-order” learning, agent dynamics can include auxiliary states that capture phenomena such as path dependencies. We introduce higher-order gradient play dynamics that resemble projected gradient ascent with auxiliary states. The dynamics are “payoff based” in that each agent’s dynamics depend on its own evolving payoff. While these payoffs depend on the strategies of other agents in a game setting, agent dynamics do not depend explicitly on the nature of the game or the strategies of other agents. In this sense, the dynamics are “uncoupled”: an agent’s dynamics do not depend explicitly on the utility functions of other agents. We first show that for any specific game with an isolated completely mixed-strategy NE, there exist higher-order gradient play dynamics that lead (locally) to that NE, both for the specific game and for nearby games with perturbed utility functions. Conversely, we show that for any higher-order gradient play dynamics, there exists a game with a unique isolated completely mixed-strategy NE for which the dynamics do not lead to the NE. These results build on prior work showing that uncoupled fixed-order learning cannot lead to NE in certain instances, whereas higher-order variants can. Finally, we consider the mixed-strategy equilibrium associated with coordination games. While higher-order gradient play can converge to such equilibria, we show that such dynamics must be inherently irrational.
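To make the contrast between fixed-order and higher-order gradient play concrete, the following is a minimal sketch and not the construction presented in the talk. It assumes a two-player matching pennies game, whose only NE is the completely mixed profile (1/2, 1/2), and one illustrative auxiliary state per agent: a low-pass filter `r` of the agent's own payoff gradient, used to form an anticipatory correction. The gain `gamma`, filter rate `lam`, step size `h`, and all function names are hypothetical choices for this illustration. Each agent uses only the gradient of its own payoff with respect to its own strategy, so the update is uncoupled in the sense described above.

```python
import math


def clip01(v):
    """Project a probability onto the interval [0, 1]."""
    return min(1.0, max(0.0, v))


def own_gradients(p, q):
    """Matching pennies with u1 = (2p - 1)(2q - 1) and u2 = -u1.

    p and q are the probabilities that players 1 and 2 play "heads".
    Each player sees only the gradient of its own payoff in its own strategy.
    """
    g1 = 2.0 * (2.0 * q - 1.0)   # du1/dp
    g2 = -2.0 * (2.0 * p - 1.0)  # du2/dq
    return g1, g2


def simulate(p, q, gamma, lam=10.0, h=0.01, steps=2000):
    """Euler-discretized projected gradient play; returns distance to the NE.

    gamma = 0 gives plain (fixed-order) projected gradient ascent.
    gamma > 0 adds an assumed auxiliary state r per player that filters the
    player's own gradient; lam * (g - r) then approximates the gradient's
    rate of change and acts as an anticipatory correction.
    """
    r1 = r2 = 0.0  # auxiliary states, initialized at zero
    for _ in range(steps):
        g1, g2 = own_gradients(p, q)
        e1, e2 = g1 - r1, g2 - r2
        p_next = clip01(p + h * (g1 + gamma * lam * e1))
        q_next = clip01(q + h * (g2 + gamma * lam * e2))
        r1 += h * lam * e1
        r2 += h * lam * e2
        p, q = p_next, q_next
    return math.hypot(p - 0.5, q - 0.5)


dist_first_order = simulate(0.6, 0.4, gamma=0.0)
dist_higher_order = simulate(0.6, 0.4, gamma=0.5)
print("fixed-order distance to NE: ", dist_first_order)
print("higher-order distance to NE:", dist_higher_order)
```

In this sketch the fixed-order run drifts away from (1/2, 1/2) and ends up circulating near the boundary of the strategy space, while the run with the auxiliary state spirals into the NE. This illustrates only the local convergence claim for one particular game; it says nothing about the converse result, which concerns games for which a given higher-order dynamic fails.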
Brief Biography
Jeff S. Shamma is with the University of Illinois at Urbana-Champaign, where he is the Department Head of Industrial and Enterprise Systems Engineering (ISE) and Jerry S. Dobrovolny Chair. His prior academic appointments include faculty positions at the King Abdullah University of Science and Technology (KAUST) and the Georgia Institute of Technology, where he was the Julian T. Hightower Chair in Systems and Controls. Jeff received a PhD in Systems Science and Engineering from MIT in 1988. He is a Fellow of IEEE and IFAC; a recipient of the IFAC High Impact Paper Award, the AACC Donald P. Eckman Award, and the NSF Young Investigator Award; and a past Distinguished Lecturer of the IEEE Control Systems Society. He has been a plenary or semi-plenary speaker at several conferences, including NeurIPS, the World Congress of the Game Theory Society, the IEEE Conference on Decision and Control, and the American Control Conference. Jeff is currently serving as the Editor-in-Chief of the IEEE Transactions on Control of Network Systems.