Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 1042 | 2023 |
QD-RL: Efficient mixing of quality and diversity in reinforcement learning G Cideron, T Pierrot, N Perrin, K Beguir, O Sigaud arXiv preprint arXiv:2006.08505, 28-73, 2020 | 83* | 2020 |
HIGhER: Improving instruction following with hindsight generation for experience replay G Cideron, M Seurin, F Strub, O Pietquin 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 225-232, 2020 | 53* | 2020 |
Factually consistent summarization via reinforcement learning with textual entailment feedback P Roit, J Ferret, L Shani, R Aharoni, G Cideron, R Dadashi, M Geist, ... arXiv preprint arXiv:2306.00186, 2023 | 44 | 2023 |
WARM: On the benefits of weight averaged reward models A Ramé, N Vieillard, L Hussenot, R Dadashi, G Cideron, O Bachem, ... arXiv preprint arXiv:2401.12187, 2024 | 26 | 2024 |
MusicRL: Aligning music generation to human preferences G Cideron, S Girgin, M Verzetti, D Vincent, M Kastelic, Z Borsos, ... arXiv preprint arXiv:2402.04229, 2024 | 6 | 2024 |
Get back here: Robust imitation by return-to-distribution planning G Cideron, B Tabanpour, S Curi, S Girgin, L Hussenot, G Dulac-Arnold, ... arXiv preprint arXiv:2305.01400, 2023 | 4 | 2023 |
vec2text with round-trip translations G Cideron, S Girgin, A Raichuk, O Pietquin, O Bachem, L Hussenot arXiv preprint arXiv:2209.06792, 2022 | 2 | 2022 |
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning K Wang, R Kidambi, R Sullivan, A Agarwal, C Dann, A Michi, M Gelmi, ... arXiv preprint arXiv:2407.15762, 2024 | | 2024 |
BOND: Aligning LLMs with Best-of-N Distillation PG Sessa, R Dadashi, L Hussenot, J Ferret, N Vieillard, A Ramé, ... arXiv preprint arXiv:2407.14622, 2024 | | 2024 |