Banghua Zhu
Title | Cited by | Year
Bridging offline reinforcement learning and imitation learning: A tale of pessimism
P Rashidinejad, B Zhu, C Ma, J Jiao, S Russell
Advances in Neural Information Processing Systems 34, 11702-11716, 2021
Cited by: 323 | Year: 2021
Chatbot Arena: An open platform for evaluating LLMs by human preference
WL Chiang, L Zheng, Y Sheng, AN Angelopoulos, T Li, D Li, H Zhang, ...
arXiv preprint arXiv:2403.04132, 2024
Cited by: 261 | Year: 2024
Principled reinforcement learning with human feedback from pairwise or k-wise comparisons
B Zhu, M Jordan, J Jiao
International Conference on Machine Learning, 43037-43067, 2023
Cited by: 159 | Year: 2023
Jump-start reinforcement learning
I Uchendu, T Xiao, Y Lu, B Zhu, M Yan, J Simon, M Bennice, C Fu, C Ma, ...
International Conference on Machine Learning, 34556-34583, 2023
Cited by: 117 | Year: 2023
Joint transceiver optimization for wireless communication PHY using neural network
B Zhu, J Wang, L He, J Song
IEEE Journal on Selected Areas in Communications 37 (6), 1364-1373, 2019
Cited by: 114 | Year: 2019
Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF
B Zhu, E Frick, T Wu, H Zhu, J Jiao
https://starling.cs.berkeley.edu/, 2023
Cited by: 75 | Year: 2023
S-LoRA: Serving thousands of concurrent LoRA adapters
Y Sheng, S Cao, D Li, C Hooper, N Lee, S Yang, C Chou, B Zhu, L Zheng, ...
arXiv preprint arXiv:2311.03285, 2023
Cited by: 66 | Year: 2023
Deconstructing Generative Adversarial Networks
B Zhu, J Jiao, D Tse
arXiv preprint arXiv:1901.09465, 2019
Cited by: 58 | Year: 2019
Generalized resilience and robust statistics
B Zhu, J Jiao, J Steinhardt
The Annals of Statistics 50 (4), 2256-2283, 2022
Cited by: 55 | Year: 2022
The sample complexity of online contract design
B Zhu, S Bates, Z Yang, Y Wang, J Jiao, MI Jordan
arXiv preprint arXiv:2211.05732, 2022
Cited by: 52 | Year: 2022
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
T Li, WL Chiang, E Frick, L Dunlap, T Wu, B Zhu, JE Gonzalez, I Stoica
arXiv preprint arXiv:2406.11939, 2024
Cited by: 48 | Year: 2024
Robust estimation via generalized quasi-gradients
B Zhu, J Jiao, J Steinhardt
Information and Inference: A Journal of the IMA 11 (2), 581-636, 2022
Cited by: 47 | Year: 2022
Fine-tuning language models with advantage-induced policy alignment
B Zhu, H Sharma, FV Frujeri, S Dong, C Zhu, MI Jordan, J Jiao
arXiv preprint arXiv:2306.02231, 2023
Cited by: 35 | Year: 2023
Byzantine-robust federated learning with optimal statistical rates
B Zhu, L Wang, Q Pang, S Wang, J Jiao, D Song, MI Jordan
International Conference on Artificial Intelligence and Statistics, 3151-3178, 2023
Cited by: 35* | Year: 2023
Fairness in serving large language models
Y Sheng, S Cao, D Li, B Zhu, Z Li, D Zhuo, JE Gonzalez, I Stoica
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by: 34 | Year: 2024
Pairwise proximal policy optimization: Harnessing relative feedback for LLM alignment
T Wu, B Zhu, R Zhang, Z Wen, K Ramchandran, J Jiao
arXiv preprint arXiv:2310.00212, 2023
Cited by: 33 | Year: 2023
Sparse tensor decomposition for haplotype assembly of diploids and polyploids
A Hashemi, B Zhu, H Vikalo
BMC Genomics 19, 1-15, 2018
Cited by: 27 | Year: 2018
From live data to high-quality benchmarks: The Arena-Hard pipeline
T Li, WL Chiang, E Frick, L Dunlap, B Zhu, JE Gonzalez, I Stoica
April, 2024
Cited by: 21 | Year: 2024
When does the Tukey median work?
B Zhu, J Jiao, J Steinhardt
2020 IEEE International Symposium on Information Theory (ISIT), 1201-1206, 2020
Cited by: 21 | Year: 2020
NexusRaven: A commercially-permissive language model for function calling
VK Srinivasan, Z Dong, B Zhu, B Yu, D Mosk-Aoyama, K Keutzer, J Jiao, ...
NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023
Cited by: 20 | Year: 2023