Control-A-Video: Controllable text-to-video generation with diffusion models W Chen*, Y Ji*, J Wu, H Wu, P Xie, J Li, X Xia, X Xiao, L Lin arXiv preprint arXiv:2305.13840, 2023 | 103 | 2023 |
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model Y Ji*, J Wang*, Y Gong, L Zhang, Y Zhu, H Wang, J Zhang, T Sakai, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 34 | 2023 |
Bridging the gap: A unified video comprehension framework for moment retrieval and highlight detection Y Xiao, Z Luo, Y Liu, Y Ma, H Bian, Y Ji, Y Yang, X Li Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 22 | 2024 |
MIRTT: Learning multimodal interaction representations from trilinear transformers for visual question answering J Wang*, Y Ji*, J Sun, Y Yang, T Sakai Findings of the Association for Computational Linguistics: EMNLP 2021, 2280-2292, 2021 | 18 | 2021 |
Multimodal prototype-enhanced network for few-shot action recognition X Ni, Y Liu, H Wen, Y Ji, J Xiao, Y Yang Proceedings of the 2024 International Conference on Multimedia Retrieval, 1-10, 2024 | 14 | 2024 |
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning Y Ji*, R Tu*, J Jiang, W Kong, C Cai, W Zhao, H Wang, Y Yang, W Liu Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 10 | 2023 |
3D face reconstruction system from a single photo based on regression neural network Y Ji, K Li, H Wu, G Xiong, Z Shen, X Shang, B Xi IFAC-PapersOnLine 53 (5), 71-76, 2020 | 2 | 2020 |
Taming Lookup Tables for Efficient Image Retouching S Yang, B Huang, M Cao, Y Ji, H Guo, N Wong, Y Yang European Conference on Computer Vision, 144-159, 2025 | 1 | 2025 |
IDA-VLM: Towards movie understanding via ID-aware large vision-language model Y Ji, S Zhang, J Wu, P Sun, W Chen, X Xiao, S Yang, Y Yang, P Luo arXiv preprint arXiv:2407.07577, 2024 | 1 | 2024 |
Global and Local Semantic Completion Learning for Vision-Language Pre-training RC Tu*, Y Ji*, J Jiang, W Kong, C Cai, W Zhao, H Wang, Y Yang, W Liu arXiv preprint arXiv:2306.07096, 2023 | 1 | 2023 |
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents J Wang, Y Zhang, Y Ji, Y Zhang, C Jiang, Y Wang, K Zhu, Z Wang, ... arXiv preprint arXiv:2406.13923, 2024 | | 2024 |
Similarity Transitivity Broken-Aware Multi-Modal Hashing RC Tu, XL Mao, J Liu, Y Ji, W Wei, H Huang IEEE Transactions on Knowledge and Data Engineering, 2024 | | 2024 |
Modeling Multimodal Uncertainties via Probability Distribution Encoders Included Vision-Language Models J Wang, Y Ji, Y Zhang, Y Zhu, T Sakai IEEE Access, 2023 | | 2023 |