Reinforcement Learning and Optimization in Large Action Spaces under Limited Feedback

Fares Fourati, Ph.D. Student, Electrical and Computer Engineering

Supervisor:

Mohamed-Slim Alouini, Al-Khawarizmi Distinguished Professor, Electrical and Computer Engineering

Apr 29, 15:00 - 16:45

B2 R5209

Keywords: reinforcement learning, machine learning, combinatorial multi-armed bandits, large action spaces, limited feedback, efficient exploration, submodular optimization, black-box optimization, global optimization

This dissertation develops theoretical foundations and scalable algorithms for reinforcement learning and optimization in large decision spaces under limited feedback.

Modern AI systems increasingly operate in environments where decision spaces are combinatorial, high-dimensional, or exponentially large, while feedback is sparse, global, or costly to obtain. In such regimes, classical methods become computationally prohibitive, as optimal decisions require searching intractable spaces and each evaluation provides limited information.

We first study combinatorial multi-armed bandits, where the number of actions grows exponentially. By exploiting submodularity in expectation, we introduce randomized greedy and stochastic-greedy algorithms that avoid exhaustive exploration while achieving provable sublinear regret under full-bandit feedback. These ideas extend to an offline-to-online learning framework in both single-agent and federated multi-agent settings, enabling scalable distributed learning with limited communication.
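To make the stochastic-greedy idea concrete, here is a minimal Python sketch of the classical stochastic-greedy routine for maximizing a monotone submodular set function (the scheme of Mirzasoleiman et al., 2015). It is not the dissertation's bandit algorithm: it assumes exact value-oracle access to f, whereas under full-bandit feedback only noisy rewards of the chosen set are observed. The function names, the toy coverage objective, and the parameter eps are illustrative assumptions.

```python
import math
import random


def stochastic_greedy(ground_set, f, k, eps=0.1, rng=None):
    """Pick k elements approximately maximizing a monotone submodular f.

    Illustrative sketch: each step scans only a random sample of
    about (n / k) * ln(1 / eps) remaining elements instead of all of them.
    """
    if rng is None:
        rng = random.Random(0)
    n = len(ground_set)
    sample_size = min(n, math.ceil((n / k) * math.log(1 / eps)))
    selected = []
    for _ in range(k):
        remaining = [e for e in ground_set if e not in selected]
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        base = f(selected)
        # Add the sampled element with the largest marginal gain.
        best = max(candidates, key=lambda e: f(selected + [e]) - base)
        selected.append(best)
    return selected


# Hypothetical toy objective: f(S) = number of items covered by the sets in S.
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {5, 6, 7, 8},
    "d": {1, 8},
    "e": {9},
}


def covered(S):
    return len(set().union(*(coverage[e] for e in S)))


picked = stochastic_greedy(list(coverage), covered, k=2, eps=0.1, rng=random.Random(0))
print(picked, covered(picked))  # → ['c', 'a'] 7
```

Scanning roughly (n/k)·ln(1/eps) random candidates per step, rather than all n, keeps the total number of evaluations near n·ln(1/eps) instead of n·k. In this tiny example the sample covers the whole ground set, so the result coincides with plain greedy.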

We then address global optimization of black-box reward functions with expensive evaluations and unknown smoothness. We develop adaptive Lipschitz optimization methods that eliminate provably suboptimal regions via consistency-based filtering, significantly improving query efficiency without explicit smoothness estimation.
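The consistency-based filtering idea can be illustrated with the classical LIPO-style acceptance rule (Malherbe and Vayatis, 2017): a candidate is evaluated only if the Lipschitz upper bound implied by past evaluations could still exceed the best value found so far. Unlike the adaptive methods described above, this sketch assumes a known Lipschitz constant k. The one-dimensional domain, the function names, and the test objective are illustrative assumptions.

```python
import random


def lipschitz_upper_bound(x, xs, ys, k):
    """Tightest upper bound on f(x) implied by k-Lipschitz continuity and the data."""
    return min(y + k * abs(x - xi) for xi, y in zip(xs, ys))


def filtered_search(f, lo, hi, k, budget, rng=None):
    """Maximize f on [lo, hi], evaluating only points that could still beat the incumbent.

    Illustrative sketch with a fixed, assumed Lipschitz constant k.
    """
    if rng is None:
        rng = random.Random(0)
    x0 = rng.uniform(lo, hi)
    xs, ys = [x0], [f(x0)]
    proposals = 0
    while len(xs) < budget and proposals < 100 * budget:
        proposals += 1
        x = rng.uniform(lo, hi)
        # Skip x if even its optimistic bound cannot exceed the best value so far.
        if lipschitz_upper_bound(x, xs, ys, k) >= max(ys):
            xs.append(x)
            ys.append(f(x))
    y_best, x_best = max(zip(ys, xs))
    return x_best, y_best


# Hypothetical objective with maximum at x = 0.3; |f'| <= 1.4 on [0, 1], so k = 2 is valid.
x_best, y_best = filtered_search(lambda x: -(x - 0.3) ** 2, 0.0, 1.0, k=2.0, budget=30,
                                 rng=random.Random(1))
print(x_best, y_best)
```

Every rejected candidate is one that provably cannot improve on the incumbent when k is a valid Lipschitz constant, so the evaluation budget is spent only on regions that remain plausible.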

Finally, we consider value-based reinforcement learning with large discrete action spaces. We propose stochastic Q-learning methods that replace exhaustive maximization with randomized sampling and memory mechanisms, reducing computational complexity from linear to logarithmic in the action space while preserving convergence guarantees.
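A minimal sketch of a stochastic maximization step inside a tabular Q-learning update, assuming Q-values are stored as per-state lists. Instead of scanning all n actions, it takes the maximum over ⌈log2 n⌉ uniformly sampled actions plus a small memory set of previously promising actions. The sample size, the memory contents, and the function names are assumptions for illustration, not the dissertation's exact procedure.

```python
import math
import random


def stoch_argmax(q, memory, rng):
    """Approximate argmax_a q[a] using O(log n) sampled actions plus a memory set.

    Illustrative sketch: candidates are ceil(log2 n) uniform samples together with
    the (hypothetical) memory of previously promising actions.
    """
    n = len(q)
    sample_size = max(1, math.ceil(math.log2(n)))
    candidates = set(rng.sample(range(n), sample_size)) | set(memory)
    return max(candidates, key=lambda a: q[a])


def stoch_q_update(Q, s, a, r, s_next, memory, rng, alpha=0.1, gamma=0.99):
    """One Q-learning step whose target uses stoch_argmax instead of a full max."""
    a_star = stoch_argmax(Q[s_next], memory, rng)
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_star] - Q[s][a])
    return a_star


# Hypothetical example: 1024 actions, next-state values peak at action 700.
n = 1024
Q = {0: [0.0] * n, 1: [-float(abs(a - 700)) for a in range(n)]}
a_star = stoch_q_update(Q, s=0, a=5, r=1.0, s_next=1, memory=[700], rng=random.Random(0))
print(a_star, Q[0][5])  # → 700 0.1
```

Each update reads O(log n + |memory|) entries of Q[s_next] rather than all n. How the memory is populated (for example, with recent greedy actions) is a design choice left open in this sketch.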

Across these settings, a unifying principle emerges: scalability in large decision spaces arises from combining structural assumptions, randomized approximation, and uncertainty-aware exploration.

Presenters

Fares Fourati, Ph.D. Student, Electrical and Computer Engineering

Brief Biography

Fares Fourati is a Ph.D. candidate in Electrical and Computer Engineering at King Abdullah University of Science and Technology (KAUST). His work has been published in leading AI and machine learning venues, including ICML, AAAI, AISTATS, EMNLP, and ECAI.

He has received several distinctions, including consecutive CEMSE Dean’s List Awards, first place in the ACM-SIAM Student Competition (2024), and first place at the Marconi Society Connectivity Summit (2021). Fares earned his Diplôme d’Ingénieur from École Polytechnique de Tunisie in 2020 and an M.S. in Electrical and Computer Engineering from KAUST in 2022.

In addition to his research, he has been actively involved in teaching and mentoring in AI and machine learning at KAUST and across Saudi Arabia. He is also the author of the book "To What End? Meditations on AI and Humanity," which reflects his broader interest in the philosophical implications of AI.
