Xiaozhe Ren
Noah's Ark Lab, Huawei Technologies
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang, X Jiang, ZZ Yang, K Wang, ...
arXiv preprint arXiv:2104.12369, 2021
NEZHA: Neural contextualized representation for Chinese language understanding
J Wei, X Ren, X Li, W Huang, Y Liao, Y Wang, J Lin, X Jiang, X Chen, ...
arXiv preprint arXiv:1909.00204, 2019
SparseBERT: Rethinking the importance analysis in self-attention
H Shi, J Gao, X Ren, H Xu, X Liang, Z Li, JTY Kwok
International Conference on Machine Learning, 9547-9557, 2021
AutoBERT-Zero: Evolving BERT backbone from scratch
J Gao, H Xu, H Shi, X Ren, PLH Yu, X Liang, X Jiang, Z Li
Proceedings of the AAAI Conference on Artificial Intelligence 36 (10), 10663 …, 2022
Large-scale deep learning optimizations: A comprehensive survey
X He, F Xue, X Ren, Y You
arXiv preprint arXiv:2111.00856, 2021
EfficientBERT: Progressively searching multilayer perceptron via warm-up knowledge distillation
C Dong, G Wang, H Xu, J Peng, X Ren, X Liang
arXiv preprint arXiv:2109.07222, 2021
NumGPT: Improving numeracy ability of generative pre-trained models
Z Jin, X Jiang, X Wang, Q Liu, Y Wang, X Ren, H Qu
arXiv preprint arXiv:2109.03137, 2021
Deeper vs wider: A revisit of transformer configuration
F Xue, J Chen, A Sun, X Ren, Z Zheng, X He, X Jiang, Y You
arXiv preprint arXiv:2205.10505, 2022
One Student Knows All Experts Know: From Sparse to Dense
F Xue, X He, X Ren, Y Lou, Y You
arXiv preprint arXiv:2201.10890, 2022
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
X Ren, P Zhou, X Meng, X Huang, Y Wang, W Wang, P Li, X Zhang, ...
arXiv preprint arXiv:2303.10845, 2023