Yunhao Tang
Research Scientist, DeepMind
Verified email at columbia.edu
Title | Cited by | Year
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
Cited by: 592, Year: 2023
Reinforcement learning for integer programming: Learning to cut
Y Tang, S Agrawal, Y Faenza
International conference on machine learning, 9367-9376, 2020
Cited by: 193, Year: 2020
ES-MAML: Simple Hessian-free meta learning
X Song, W Gao, Y Yang, K Choromanski, A Pacchiano, Y Tang
arXiv preprint arXiv:1910.01215, 2019
Cited by: 128, Year: 2019
Discretizing continuous action space for on-policy optimization
Y Tang, S Agrawal
Proceedings of the AAAI Conference on Artificial Intelligence 34 (04), 5981-5988, 2020
Cited by: 117, Year: 2020
Monte-Carlo tree search as regularized policy optimization
JB Grill, F Altché, Y Tang, T Hubert, M Valko, I Antonoglou, R Munos
International Conference on Machine Learning, 3769-3778, 2020
Cited by: 69, Year: 2020
BYOL-Explore: Exploration by bootstrapped prediction
Z Guo, S Thakoor, M Pîslar, B Avila Pires, F Altché, C Tallec, A Saade, ...
Advances in neural information processing systems 35, 31855-31870, 2022
Cited by: 54, Year: 2022
From complexity to simplicity: Adaptive ES-active subspaces for blackbox optimization
KM Choromanski, A Pacchiano, J Parker-Holder, Y Tang, V Sindhwani
Advances in Neural Information Processing Systems 32, 2019
Cited by: 49, Year: 2019
Orthogonal estimation of Wasserstein distances
M Rowland, J Hron, Y Tang, K Choromanski, T Sarlos, A Weller
The 22nd International Conference on Artificial Intelligence and Statistics …, 2019
Cited by: 46, Year: 2019
Provably robust blackbox optimization for reinforcement learning
K Choromanski, A Pacchiano, J Parker-Holder, Y Tang, D Jain, Y Yang, ...
CoRR, abs/1903.02993, 2019
Cited by: 42, Year: 2019
Exploration by distributional reinforcement learning
Y Tang, S Agrawal
arXiv preprint arXiv:1805.01907, 2018
Cited by: 40, Year: 2018
Learning to Score Behaviors for Guided Policy Optimization
A Pacchiano, J Parker-Holder, Y Tang, A Choromanska, K Choromanski, ...
arXiv preprint arXiv:1906.04349, 2019
Cited by: 39, Year: 2019
Boosting trust region policy optimization by normalizing flows policy
Y Tang, S Agrawal
arXiv preprint arXiv:1809.10326, 2018
Cited by: 33, Year: 2018
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ...
arXiv preprint arXiv:2403.05530, 2024
Cited by: 25, Year: 2024
Self-imitation learning via generalized lower bound Q-learning
Y Tang
Advances in neural information processing systems 33, 13964-13975, 2020
Cited by: 23, Year: 2020
Understanding self-predictive learning for reinforcement learning
Y Tang, ZD Guo, PH Richemond, BA Pires, Y Chandak, R Munos, ...
International Conference on Machine Learning, 33632-33656, 2023
Cited by: 21, Year: 2023
Hindsight expectation maximization for goal-conditioned reinforcement learning
Y Tang, A Kucukelbir
International Conference on Artificial Intelligence and Statistics, 2863-2871, 2021
Cited by: 20, Year: 2021
Revisiting Peng’s Q(λ) for Modern Reinforcement Learning
T Kozuno, Y Tang, M Rowland, R Munos, S Kapturowski, W Dabney, ...
International Conference on Machine Learning, 5794-5804, 2021
Cited by: 18, Year: 2021
Taylor expansion policy optimization
Y Tang, M Valko, R Munos
International Conference on Machine Learning, 9397-9406, 2020
Cited by: 17, Year: 2020
Nash learning from human feedback
R Munos, M Valko, D Calandriello, MG Azar, M Rowland, ZD Guo, Y Tang, ...
arXiv preprint arXiv:2312.00886, 2023
Cited by: 16, Year: 2023
An analysis of quantile temporal-difference learning
M Rowland, R Munos, MG Azar, Y Tang, G Ostrovski, A Harutyunyan, ...
arXiv preprint arXiv:2301.04462, 2023
Cited by: 15, Year: 2023