Dynamic fusion with intra- and inter-modality attention flow for visual question answering P Gao, Z Jiang, H You, P Lu, SCH Hoi, X Wang, H Li Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019 | 252* | 2019 |
Fast convergence of DETR with spatially modulated co-attention P Gao, M Zheng, X Wang, J Dai, H Li Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021 | 74 | 2021 |
End-to-end object detection with adaptive clustering transformer M Zheng, P Gao, R Zhang, K Li, X Wang, H Li, H Dong arXiv preprint arXiv:2011.09315, 2020 | 74 | 2020 |
Question-guided hybrid convolution for visual question answering P Gao, H Li, S Li, P Lu, Y Li, SCH Hoi, X Wang Proceedings of the European Conference on Computer Vision (ECCV), 469-485, 2018 | 73 | 2018 |
Multi-modality latent interaction network for visual question answering P Gao, H You, Z Zhang, X Wang, H Li Proceedings of the IEEE/CVF international conference on computer vision …, 2019 | 64* | 2019 |
CLIP-Adapter: Better vision-language models with feature adapters P Gao, S Geng, R Zhang, T Ma, R Fang, Y Zhang, H Li, Y Qiao arXiv preprint arXiv:2110.04544, 2021 | 33 | 2021 |
UniFormer: Unified transformer for efficient spatiotemporal representation learning K Li, Y Wang, P Gao, G Song, Y Liu, H Li, Y Qiao arXiv preprint arXiv:2201.04676, 2022 | 31* | 2022 |
Container: Context aggregation network P Gao, J Lu, H Li, R Mottaghi, A Kembhavi arXiv preprint arXiv:2106.01401, 2021 | 30 | 2021 |
Video object detection with locally-weighted deformable neighbors Z Jiang, P Gao, C Guo, Q Zhang, S Xiang, C Pan Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 8529-8536, 2019 | 30 | 2019 |
Dynamic graph representation learning for video dialog via multi-modal shuffled transformers S Geng, P Gao, M Chatterjee, C Hori, J Le Roux, Y Zhang, H Li, A Cherian Proceedings of the AAAI Conference on Artificial Intelligence 35 (2), 1415-1423, 2021 | 26* | 2021 |
Learning where to focus for efficient video object detection Z Jiang, Y Liu, C Yang, J Liu, P Gao, Q Zhang, S Xiang, C Pan European conference on computer vision, 18-34, 2020 | 26 | 2020 |
Tip-Adapter: Training-free CLIP-Adapter for better vision-language modeling R Zhang, R Fang, P Gao, W Zhang, K Li, J Dai, Y Qiao, H Li arXiv preprint arXiv:2111.03930, 2021 | 19 | 2021 |
Dual-stream network for visual recognition M Mao, R Zhang, H Zheng, T Ma, Y Peng, E Ding, B Zhang, S Han Advances in Neural Information Processing Systems 34, 25346-25358, 2021 | 18 | 2021 |
Contrastive visual-linguistic pretraining L Shi, K Shuang, S Geng, P Su, Z Jiang, P Gao, Z Fu, G de Melo, S Su arXiv preprint arXiv:2007.13135, 2020 | 18 | 2020 |
Character matters: Video story understanding with character-aware relations S Geng, J Zhang, Z Fu, P Gao, H Zhang, G de Melo arXiv preprint arXiv:2005.08646, 2020 | 10 | 2020 |
MonoDETR: Depth-aware transformer for monocular 3D object detection R Zhang, H Qiu, T Wang, X Xu, Z Guo, Y Qiao, P Gao, H Li arXiv preprint arXiv:2203.13310, 2022 | 9 | 2022 |
PointCLIP: Point cloud understanding by CLIP R Zhang, Z Guo, W Zhang, K Li, X Miao, B Cui, Y Qiao, P Gao, H Li Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 9 | 2022 |
Oriented object detection with transformer T Ma, M Mao, H Zheng, P Gao, X Wang, S Han, E Ding, B Zhang, ... arXiv preprint arXiv:2106.03146, 2021 | 7 | 2021 |
RomeBERT: Robust training of multi-exit BERT S Geng, P Gao, Z Fu, Y Zhang arXiv preprint arXiv:2101.09755, 2021 | 6 | 2021 |
Multi-layer content interaction through quaternion product for visual question answering L Shi, S Geng, K Shuang, C Hori, S Liu, P Gao, S Su ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 6 | 2020 |