Rémi Munos
DeepMind
Verified email at inria.fr
Title
Cited by
Year
Unifying count-based exploration and intrinsic motivation
MG Bellemare, S Srinivasan, G Ostrovski, T Schaul, D Saxton, R Munos
arXiv preprint arXiv:1606.01868, 2016
Cited by: 717, Year: 2016
A distributional perspective on reinforcement learning
MG Bellemare, W Dabney, R Munos
International Conference on Machine Learning, 449-458, 2017
Cited by: 590, Year: 2017
IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures
L Espeholt, H Soyer, R Munos, K Simonyan, V Mnih, T Ward, Y Doron, ...
International Conference on Machine Learning, 1407-1416, 2018
Cited by: 573, Year: 2018
X-Armed Bandits
S Bubeck, R Munos, G Stoltz, C Szepesvári
Journal of Machine Learning Research 12 (5), 2011
Cited by: 550*, Year: 2011
Best arm identification in multi-armed bandits
JY Audibert, S Bubeck, R Munos
COLT, 41-53, 2010
Cited by: 537, Year: 2010
Exploration–exploitation tradeoff using variance estimates in multi-armed bandits
JY Audibert, R Munos, C Szepesvári
Theoretical Computer Science 410 (19), 1876-1902, 2009
Cited by: 531, Year: 2009
Thompson sampling: An asymptotically optimal finite-time analysis
E Kaufmann, N Korda, R Munos
International conference on algorithmic learning theory, 199-213, 2012
Cited by: 527, Year: 2012
Modification of UCT with patterns in Monte-Carlo Go
S Gelly, Y Wang, R Munos, O Teytaud
INRIA, 2006
Cited by: 494, Year: 2006
Sample efficient actor-critic with experience replay
Z Wang, V Bapst, N Heess, V Mnih, R Munos, K Kavukcuoglu, ...
arXiv preprint arXiv:1611.01224, 2016
Cited by: 483, Year: 2016
Learning to reinforcement learn
JX Wang, Z Kurth-Nelson, D Tirumala, H Soyer, JZ Leibo, R Munos, ...
arXiv preprint arXiv:1611.05763, 2016
Cited by: 434, Year: 2016
Noisy networks for exploration
M Fortunato, MG Azar, B Piot, J Menick, I Osband, A Graves, V Mnih, ...
arXiv preprint arXiv:1706.10295, 2017
Cited by: 409, Year: 2017
Safe and efficient off-policy reinforcement learning
R Munos, T Stepleton, A Harutyunyan, MG Bellemare
arXiv preprint arXiv:1606.02647, 2016
Cited by: 371, Year: 2016
Pure exploration in multi-armed bandits problems
S Bubeck, R Munos, G Stoltz
International conference on Algorithmic learning theory, 23-37, 2009
Cited by: 370, Year: 2009
Variable resolution discretization in optimal control
R Munos, A Moore
Machine learning 49 (2), 291-323, 2002
Cited by: 358, Year: 2002
Finite-Time Bounds for Fitted Value Iteration
R Munos, C Szepesvári
Journal of Machine Learning Research 9 (5), 2008
Cited by: 325, Year: 2008
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
A Antos, C Szepesvári, R Munos
Machine Learning 71 (1), 89-129, 2008
Cited by: 320, Year: 2008
Kullback–Leibler upper confidence bounds for optimal sequential allocation
O Cappé, A Garivier, OA Maillard, R Munos, G Stoltz
Annals of Statistics 41 (3), 1516-1541, 2013
Cited by: 301, Year: 2013
Count-based exploration with neural density models
G Ostrovski, MG Bellemare, A Oord, R Munos
International conference on machine learning, 2721-2730, 2017
Cited by: 289, Year: 2017
Minimax regret bounds for reinforcement learning
MG Azar, I Osband, R Munos
International Conference on Machine Learning, 263-272, 2017
Cited by: 264, Year: 2017
Successor features for transfer in reinforcement learning
A Barreto, W Dabney, R Munos, JJ Hunt, T Schaul, H Van Hasselt, ...
arXiv preprint arXiv:1606.05312, 2016
Cited by: 254, Year: 2016