| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models | T Wu, C Tao, J Wang, R Yang, Z Zhao, N Wong | arXiv preprint arXiv:2404.02657, 2024 | 14 | 2024 |
| LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models | R Yang, T Wu, J Wang, P Hu, N Wong, Y Yang | arXiv preprint arXiv:2411.06839, 2024 | 1 | 2024 |
| Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning | Z Li, Y Su, R Yang, Z Xie, N Wong, H Yang | arXiv preprint arXiv:2501.03035, 2025 | | 2025 |
| LoCa: Logit Calibration for Knowledge Distillation | R Yang, T Wu, Y Yang | arXiv preprint arXiv:2409.04778, 2024 | | 2024 |