B2 R5209
Reinforcement Learning, machine learning, combinatorial multi-armed bandits, large action spaces, limited feedback, efficient exploration, submodular optimization, black-box optimization, global optimization