Title, authors, venue | Cited by | Year |
Kullback–Leibler upper confidence bounds for optimal sequential allocation O Cappé, A Garivier, OA Maillard, R Munos, G Stoltz Annals of Statistics 41 (3), 1516-1541, 2013 | 287 | 2013 |
A finite-time analysis of multi-armed bandits problems with Kullback–Leibler divergences OA Maillard, R Munos, G Stoltz Proceedings of the 24th annual Conference On Learning Theory, 497-514, 2011 | 117 | 2011 |
Compressed least-squares regression OA Maillard, R Munos | 117 | 2009 |
Concentration inequalities for sampling without replacement R Bardenet, OA Maillard Bernoulli 21 (3), 1361-1385, 2015 | 106 | 2015 |
LSTD with random projections M Ghavamzadeh, A Lazaric, OA Maillard, R Munos | 62 | 2010 |
Latent Bandits. OA Maillard, S Mannor International Conference on Machine Learning, 136-144, 2014 | 57 | 2014 |
Linear regression with random projections O Maillard, R Munos Journal of Machine Learning Research 13, 2735-2772, 2012 | 44 | 2012 |
Finite-sample analysis of Bellman residual minimization OA Maillard, R Munos, A Lazaric, M Ghavamzadeh Proceedings of 2nd Asian Conference on Machine Learning, 299-314, 2010 | 40 | 2010 |
Robust risk-averse stochastic multi-armed bandits OA Maillard International Conference on Algorithmic Learning Theory, 218-233, 2013 | 38 | 2013 |
Selecting the state-representation in reinforcement learning OA Maillard, R Munos, D Ryabko arXiv preprint arXiv:1302.2552, 2013 | 36 | 2013 |
Sub-sampling for multi-armed bandits A Baransi, OA Maillard, S Mannor Joint European Conference on Machine Learning and Knowledge Discovery in …, 2014 | 34 | 2014 |
The non-stationary stochastic multi-armed bandit problem R Allesiardo, R Féraud, OA Maillard International Journal of Data Science and Analytics 3 (4), 267-283, 2017 | 33 | 2017 |
"How hard is my MDP?" The distribution-norm to the rescue OA Maillard, TA Mann, S Mannor Advances in Neural Information Processing Systems 27, 1835-1843, 2014 | 33 | 2014 |
Online learning in adversarial Lipschitz environments OA Maillard, R Munos Joint European Conference on Machine Learning and Knowledge Discovery in …, 2010 | 29 | 2010 |
Adaptive Bandits: Towards the best history-dependent strategy OA Maillard, R Munos | 28* | 2011 |
Hybrid collaborative filtering with autoencoders F Strub, J Mary, R Gaudel arXiv preprint arXiv:1603.00806, 2016 | 27 | 2016 |
Optimal regret bounds for selecting the state representation in reinforcement learning OA Maillard, P Nguyen, R Ortner, D Ryabko International Conference on Machine Learning, 543-551, 2013 | 26 | 2013 |
Variance-aware regret bounds for undiscounted reinforcement learning in MDPs MS Talebi, OA Maillard Algorithmic Learning Theory, 770-805, 2018 | 24 | 2018 |
Selecting near-optimal approximate state representations in reinforcement learning R Ortner, OA Maillard, D Ryabko International Conference on Algorithmic Learning Theory, 140-154, 2014 | 22 | 2014 |
Streaming kernel regression with provably adaptive mean, variance, and regularization A Durand, OA Maillard, J Pineau The Journal of Machine Learning Research 19 (1), 650-683, 2018 | 18 | 2018 |