Paul Christiano
Title
Cited by
Year
Training language models to follow instructions with human feedback
L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ...
Advances in neural information processing systems 35, 27730-27744, 2022
Cited by 10738, 2022
Deep reinforcement learning from human preferences
PF Christiano, J Leike, T Brown, M Martic, S Legg, D Amodei
Advances in neural information processing systems 30, 2017
Cited by 3150, 2017
Concrete problems in AI safety
D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané
arXiv preprint arXiv:1606.06565, 2016
Cited by 3017, 2016
Learning to summarize with human feedback
N Stiennon, L Ouyang, J Wu, D Ziegler, R Lowe, C Voss, A Radford, ...
Advances in Neural Information Processing Systems 33, 3008-3021, 2020
Cited by 1727, 2020
Fine-tuning language models from human preferences
DM Ziegler, N Stiennon, J Wu, TB Brown, A Radford, D Amodei, ...
arXiv preprint arXiv:1909.08593, 2019
Cited by 1394, 2019
Theano: A Python framework for fast computation of mathematical expressions
R Al-Rfou, G Alain, A Almahairi, C Angermueller, D Bahdanau, N Ballas, ...
arXiv e-prints, arXiv: 1605.02688, 2016
Cited by 1152*, 2016
A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models
C Finn, P Christiano, P Abbeel, S Levine
arXiv preprint arXiv:1611.03852, 2016
Cited by 433, 2016
Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs
P Christiano, JA Kelner, A Madry, DA Spielman, SH Teng
Proceedings of the forty-third annual ACM symposium on Theory of computing …, 2011
Cited by 424, 2011
Transfer from simulation to real world through learning deep inverse dynamics model
P Christiano, Z Shah, I Mordatch, J Schneider, T Blackwell, J Tobin, ...
arXiv preprint arXiv:1610.03518, 2016
Cited by 270, 2016
Recursively summarizing books with human feedback
J Wu, L Ouyang, DM Ziegler, N Stiennon, R Lowe, J Leike, P Christiano
arXiv preprint arXiv:2109.10862, 2021
Cited by 255, 2021
Quantum money from hidden subspaces
S Aaronson, P Christiano
Proceedings of the forty-fourth annual ACM symposium on Theory of computing …, 2012
Cited by 239, 2012
A cryptographic test of quantumness and certifiable randomness from a single quantum device
Z Brakerski, P Christiano, U Mahadev, U Vazirani, T Vidick
Journal of the ACM (JACM) 68 (5), 1-47, 2021
Cited by 184, 2021
AI safety via debate
G Irving, P Christiano, D Amodei
arXiv preprint arXiv:1805.00899, 2018
Cited by 172, 2018
Model evaluation for extreme risks
T Shevlane, S Farquhar, B Garfinkel, M Phuong, J Whittlestone, J Leung, ...
arXiv preprint arXiv:2305.15324, 2023
Cited by 133, 2023
Unrestricted adversarial examples
TB Brown, N Carlini, C Zhang, C Olsson, P Christiano, I Goodfellow
arXiv preprint arXiv:1809.08352, 2018
Cited by 112, 2018
Supervising strong learners by amplifying weak experts
P Christiano, B Shlegeris, D Amodei
arXiv preprint arXiv:1810.08575, 2018
Cited by 93, 2018
Sleeper agents: Training deceptive llms that persist through safety training
E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ...
arXiv preprint arXiv:2401.05566, 2024
Cited by 52, 2024
Robust cooperation in the prisoner's dilemma: Program equilibrium via provability logic
M Barasz, P Christiano, B Fallenstein, M Herreshoff, P LaVictoire, ...
arXiv preprint arXiv:1401.5577, 2014
Cited by 52*, 2014
Training language models to follow instructions with human feedback, March 2022
L Ouyang, J Wu, X Jiang, D Almeida, CL Wainwright, P Mishkin, C Zhang, ...
URL http://arxiv.org/abs/2203.02155 92, 0
Cited by 51
Evaluating language-model agents on realistic autonomous tasks
M Kinniment, LJK Sato, H Du, B Goodrich, M Hasin, L Chan, LH Miles, ...
arXiv preprint arXiv:2312.11671, 2023
Cited by 31, 2023