Philip Thomas
Title
Cited by
Year
Data-efficient off-policy policy evaluation for reinforcement learning
P Thomas, E Brunskill
International Conference on Machine Learning, 2139-2148, 2016
Cited by 719, 2016
Value function approximation in reinforcement learning using the Fourier basis
G Konidaris, S Osentoski, P Thomas
Proceedings of the AAAI Conference on Artificial Intelligence 25 (1), 380-385, 2011
Cited by 559, 2011
High-confidence off-policy evaluation
P Thomas, G Theocharous, M Ghavamzadeh
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
Cited by 296, 2015
High confidence policy improvement
P Thomas, G Theocharous, M Ghavamzadeh
International Conference on Machine Learning, 2380-2388, 2015
Cited by 208, 2015
Ad recommendation systems for life-time value optimization
G Theocharous, PS Thomas, M Ghavamzadeh
Proceedings of the 24th International Conference on World Wide Web, 1305-1310, 2015
Cited by 188, 2015
Learning action representations for reinforcement learning
Y Chandak, G Theocharous, J Kostas, S Jordan, P Thomas
International Conference on Machine Learning, 941-950, 2019
Cited by 182, 2019
Preventing undesirable behavior of intelligent machines
P Thomas, B Castro da Silva, A Barto, S Giguere, Y Brun, E Brunskill
Science 366 (6468), 999-1004, 2019
Cited by 180, 2019
Increasing the action gap: New operators for reinforcement learning
MG Bellemare, G Ostrovski, A Guez, P Thomas, R Munos
Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016
Cited by 166, 2016
Bias in natural actor-critic algorithms
P Thomas
International Conference on Machine Learning, 441-448, 2014
Cited by 152, 2014
Safe reinforcement learning
PS Thomas
Cited by 114, 2015
Is the policy gradient a gradient?
C Nota, PS Thomas
arXiv preprint arXiv:1906.07073, 2019
Cited by 67, 2019
Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards
KM Jagodnik, PS Thomas, AJ van den Bogert, MS Branicky, RF Kirsch
IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10 …, 2017
Cited by 64, 2017
Predictive off-policy policy evaluation for nonstationary decision problems, with applications to digital marketing
P Thomas, G Theocharous, M Ghavamzadeh, I Durugkar, E Brunskill
Proceedings of the AAAI Conference on Artificial Intelligence 31 (2), 4740-4745, 2017
Cited by 64, 2017
Proximal reinforcement learning: A new theory of sequential decision making in primal-dual spaces
S Mahadevan, B Liu, P Thomas, W Dabney, S Giguere, N Jacek, I Gemp, ...
arXiv preprint arXiv:1405.6757, 2014
Cited by 63, 2014
Optimizing for the future in non-stationary MDPs
Y Chandak, G Theocharous, S Shankar, M White, S Mahadevan, ...
International Conference on Machine Learning, 1414-1425, 2020
Cited by 62, 2020
Evaluating the performance of reinforcement learning algorithms
S Jordan, Y Chandak, D Cohen, M Zhang, P Thomas
International Conference on Machine Learning, 4962-4973, 2020
Cited by 57, 2020
Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines
PS Thomas, E Brunskill
arXiv preprint arXiv:1706.06643, 2017
Cited by 56, 2017
Risk Quantification for Policy Deployment
PS Thomas, G Theocharous, M Ghavamzadeh
US Patent App. 14/552,047, 2016
Cited by 50, 2016
Some recent applications of reinforcement learning
AG Barto, PS Thomas, RS Sutton
Proceedings of the Eighteenth Yale Workshop on Adaptive and Learning Systems, 2017
Cited by 49, 2017
Universal off-policy evaluation
Y Chandak, S Niekum, B da Silva, E Learned-Miller, E Brunskill, ...
Advances in Neural Information Processing Systems 34, 27475-27490, 2021
Cited by 48, 2021