Luowei Zhou

Cited by

	All	Since 2019
Citations	6876	6785
h-index	26	26
i10-index	33	33

Citations per year: 2018: 66, 2019: 160, 2020: 340, 2021: 624, 2022: 1207, 2023: 2093, 2024: 2356

Public access

16 articles available, 0 articles not available (based on funding mandates)

Co-authors

Jason Corso: Professor of Robotics, Electrical Engineering and Computer Science, University of Michigan. Verified email at umich.edu
Chenliang Xu: Associate Professor, University of Rochester. Verified email at rochester.edu
Jianwei Yang: Principal Researcher, Microsoft Research, Redmond. Verified email at microsoft.com
Zhe Gan: Research Scientist, Apple. Verified email at apple.com
Jianfeng Gao: Microsoft Research, Redmond. Verified email at microsoft.com
Bin Xiao: Principal Research Manager, Microsoft GenAI. Verified email at microsoft.com
Linjie (Lindsey) Li: Senior Researcher, Microsoft. Verified email at microsoft.com
Dongdong Chen: Principal Research Manager, GenAI, Microsoft. Verified email at mail.ustc.edu.cn
Yu Cheng: The Chinese University of Hong Kong. Verified email at cse.cuhk.edu.hk
Jie Lei 雷杰: Research Scientist, Meta AI. Verified email at fb.com
Caiming Xiong: Salesforce Research. Verified email at salesforce.com
Richard Socher: you.com. Verified email at stanford.edu
Yingbo Zhou: Senior Research Director, Salesforce Research. Verified email at salesforce.com
Lei Zhang: International Digital Economy Academy (IDEA). Verified email at idea.edu.cn
Mike Z. SHOU: National U. of Singapore; Facebook AI; Columbia University. Verified email at columbia.edu
Hamid Palangi: Google and University of Washington. Verified email at google.com
Xinlei Chen: FAIR, Meta. Verified email at meta.com
Marcus Rohrbach: Professor for Multimodal Reliable AI, TU Darmstadt, Germany. Verified email at tu-darmstadt.de
Yannis Kalantidis: NAVER LABS Europe. Verified email at naverlabs.com
Chunlin Chen: Nanjing University. Verified email at nju.edu.cn

Luowei Zhou

Research Scientist, Google DeepMind

Verified email at google.com

Vision and Language, Multimodal Language Models, Video Analysis, Generative Models


Title	Cited by	Year
Gemini: a family of highly capable multimodal models. G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023	1042	2023
Unified vision-language pre-training for image captioning and vqa. L Zhou, H Palangi, L Zhang, H Hu, J Corso, J Gao. Proceedings of the AAAI conference on artificial intelligence 34 (07), 13041 …, 2020	903	2020
Florence: A new foundation model for computer vision. L Yuan, D Chen, YL Chen, N Codella, X Dai, J Gao, H Hu, X Huang, B Li, ... arXiv preprint arXiv:2111.11432, 2021	764	2021
Towards automatic learning of procedures from web instructional videos. L Zhou, C Xu, J Corso. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018	748	2018
Less is more: Clipbert for video-and-language learning via sparse sampling. J Lei, L Li, L Zhou, Z Gan, TL Berg, M Bansal, J Liu. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021	641	2021
End-to-end dense video captioning with masked transformer. L Zhou, Y Zhou, JJ Corso, R Socher, C Xiong. Proceedings of the IEEE conference on computer vision and pattern …, 2018	634	2018
Regionclip: Region-based language-image pretraining. Y Zhong, J Yang, P Zhang, C Li, N Codella, LH Li, L Zhou, X Dai, L Yuan, ... Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	416	2022
Grounded video description. L Zhou, Y Kalantidis, X Chen, JJ Corso, M Rohrbach. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019	214	2019
Bevt: Bert pretraining of video transformers. R Wang, D Chen, Z Wu, Y Chen, X Dai, M Liu, YG Jiang, L Zhou, L Yuan. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	211	2022
Omnivl: One foundation model for image-language and video-language tasks. J Wang, D Chen, Z Wu, C Luo, L Zhou, Y Zhao, Y Xie, C Liu, YG Jiang, ... Advances in neural information processing systems 35, 5696-5710, 2022	119	2022
Clip-event: Connecting text and images with event structures. M Li, R Xu, S Wang, L Zhou, X Lin, C Zhu, M Zeng, H Ji, SF Chang. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	116	2022
Value: A multi-task benchmark for video-and-language understanding evaluation. L Li, J Lei, Z Gan, L Yu, YC Chen, R Pillai, Y Cheng, L Zhou, XE Wang, ... arXiv preprint arXiv:2106.04632, 2021	103	2021
Dense video captioning. Y Zhou, L Zhou, C Xiong, R Socher. US Patent 10,542,270, 2020	99	2020
Language models with image descriptors are strong few-shot video-language learners. Z Wang, M Li, R Xu, L Zhou, J Lei, X Lin, S Wang, Z Yang, C Zhu, ... Advances in Neural Information Processing Systems 35, 8483-8497, 2022	97	2022
Watch what you just said: Image captioning with text-conditional attention. L Zhou, C Xu, P Koch, JJ Corso. Proceedings of the on Thematic Workshops of ACM Multimedia 2017, 305-313, 2017	94	2017
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction. L Zhou, N Louis, JJ Corso. British Machine Vision Conference, 2018	85	2018
Uc2: Universal cross-lingual cross-modal vision-and-language pre-training. M Zhou, L Zhou, S Wang, Y Cheng, L Li, Z Yu, J Liu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	79	2021
Multiagent reinforcement learning with sparse interactions by negotiation and knowledge transfer. L Zhou, P Yang, C Chen, Y Gao. IEEE transactions on cybernetics 47 (5), 1238-1250, 2016	62	2016
Mist: Multi-modal iterative spatial-temporal transformer for long-form video question answering. D Gao, L Zhou, L Ji, L Zhu, Y Yang, MZ Shou. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2023	59	2023
Image caption generation with text-conditional semantic attention. L Zhou, C Xu, P Koch, JJ Corso. arXiv preprint arXiv:1606.04621 2, 2016	47	2016
