Thomas William Anthony
Google DeepMind
Verified email at google.com
Title
Cited by
Year
Thinking fast and slow with deep learning and tree search
TW Anthony, Z Tian, D Barber
Advances in Neural Information Processing Systems, 5360-5370, 2017
Cited by: 429 | Year: 2017
OpenSpiel: A framework for reinforcement learning in games
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint arXiv:1908.09453, 2019
Cited by: 280 | Year: 2019
Mastering the game of Stratego with model-free multiagent reinforcement learning
J Perolat, B De Vylder, D Hennes, E Tarassov, F Strub, V de Boer, ...
Science 378 (6623), 990-996, 2022
Cited by: 220 | Year: 2022
From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization
J Perolat, R Munos, JB Lespiau, S Omidshafiei, M Rowland, P Ortega, ...
International Conference on Machine Learning, 8525-8535, 2021
Cited by: 92 | Year: 2021
On the role of planning in model-based deep reinforcement learning
JB Hamrick, AL Friesen, F Behbahani, A Guez, F Viola, S Witherspoon, ...
arXiv preprint arXiv:2011.04021, 2020
Cited by: 88 | Year: 2020
Learning to Play No-Press Diplomacy with Best Response Policy Iteration
T Anthony, T Eccles, A Tacchetti, J Kramár, I Gemp, TC Hudson, N Porcel, ...
arXiv preprint arXiv:2006.04635, 2020
Cited by: 57 | Year: 2020
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
TW Anthony, R Nishihara, P Moritz, T Salimans, J Schulman
arXiv preprint arXiv:1904.03646, 2019
Cited by: 32 | Year: 2019
OpenSpiel: A Framework for Reinforcement Learning in Games. CoRR abs/1908.09453 (2019)
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint cs.LG/1908.09453, 2019
Cited by: 28 | Year: 2019
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games
E Hughes, TW Anthony, T Eccles, JZ Leibo, D Balduzzi, Y Bachrach
arXiv preprint arXiv:2003.00799, 2020
Cited by: 25 | Year: 2020
Iterative Empirical Game Solving via Single Policy Best Response
MO Smith, T Anthony, MP Wellman
Cited by: 20*
Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent
I Gemp, R Savani, M Lanctot, Y Bachrach, T Anthony, R Everett, ...
arXiv preprint arXiv:2106.01285, 2021
Cited by: 19 | Year: 2021
Smooth markets: A basic mechanism for organizing gradient-based learners
D Balduzzi, WM Czarnecki, TW Anthony, IM Gemp, E Hughes, JZ Leibo, ...
arXiv preprint arXiv:2001.04678, 2020
Cited by: 17 | Year: 2020
Learning to play against any mixture of opponents
MO Smith, T Anthony, MP Wellman
Frontiers in Artificial Intelligence 6, 2023
Cited by: 15 | Year: 2023
Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers
L Marris, I Gemp, T Anthony, A Tacchetti, S Liu, K Tuyls
Advances in Neural Information Processing Systems 35, 5586-5600, 2022
Cited by: 15 | Year: 2022
Expert iteration
TW Anthony
UCL (University College London), 2021
Cited by: 8 | Year: 2021
Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas
U Madhushani, KR McKee, JP Agapiou, JZ Leibo, R Everett, T Anthony, ...
arXiv preprint arXiv:2305.00768, 2023
Cited by: 6 | Year: 2023
Designing all-pay auctions using deep learning and multi-agent simulation
I Gemp, T Anthony, J Kramar, T Eccles, A Tacchetti, Y Bachrach
Scientific Reports 12 (1), 16937, 2022
Cited by: 6 | Year: 2022
Developing, evaluating and scaling learning agents in multi-agent environments
I Gemp, T Anthony, Y Bachrach, A Bhoopchand, K Bullard, J Connor, ...
AI Communications 35 (4), 271-284, 2022
Cited by: 5 | Year: 2022
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
M Lanctot, J Schultz, N Burch, MO Smith, D Hennes, T Anthony, J Perolat
arXiv preprint arXiv:2303.03196, 2023
Cited by: 4 | Year: 2023
Strategic Knowledge Transfer
MO Smith, T Anthony, MP Wellman
Journal of Machine Learning Research 24 (233), 1-96, 2023
Cited by: 3 | Year: 2023