Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 488 | 2023 |
Scaling vision transformers to 22 billion parameters M Dehghani, J Djolonga, B Mustafa, P Padlewski, J Heek, J Gilmer, ... International Conference on Machine Learning, 7480-7512, 2023 | 262 | 2023 |
PaLI-X: On scaling up a multilingual vision and language model X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ... arXiv preprint arXiv:2305.18565, 2023 | 83 | 2023 |
Object scene representation transformer MSM Sajjadi, D Duckworth, A Mahendran, S Van Steenkiste, F Pavetic, ... Advances in Neural Information Processing Systems 35, 9512-9524, 2022 | 74 | 2022 |
FlexiViT: One model for all patch sizes L Beyer, P Izmailov, A Kolesnikov, M Caron, S Kornblith, X Zhai, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 54 | 2023 |
PaLI-3 vision language models: Smaller, faster, stronger X Chen, X Wang, L Beyer, A Kolesnikov, J Wu, P Voigtlaender, B Mustafa, ... arXiv preprint arXiv:2310.09199, 2023 | 23 | 2023 |
The auto arborist dataset: a large-scale benchmark for multiview urban forest monitoring under domain shift S Beery, G Wu, T Edwards, F Pavetic, B Majewski, S Mukherjee, S Chan, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 22 | 2022 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 11 | 2024 |
Methods, systems, and media for detecting two-dimensional videos placed on a sphere in abusive spherical video content by tiling the sphere F Pavetic, M Konrad, R Vorushin US Patent 10,509,965, 2019 | 10 | 2019 |
LCSk++: Practical similarity metric for long strings F Pavetić, G Žužić, M Šikić arXiv preprint arXiv:1407.2407, 2014 | 8 | 2014 |
Methods, systems, and media for detecting abusive stereoscopic videos by generating fingerprints for multiple portions of a video frame V Zamaraiev, F Pavetic US Patent 9,872,056, 2018 | 6 | 2018 |
Fast and simple algorithms for computing both LCSk and LCSk+ F Pavetić, I Katanić, G Matula, G Žužić, M Šikić arXiv preprint arXiv:1705.07279, 2017 | 5 | 2017 |
Detecting multiple parts of a screen to fingerprint to detect abusive uploading videos F Pavetic, MR Konrad, H Pasula US Patent 10,614,539, 2020 | 4 | 2020 |
A study of autoregressive decoders for multi-tasking in computer vision L Beyer, B Wan, G Madan, F Pavetic, A Steiner, A Kolesnikov, AS Pinto, ... arXiv preprint arXiv:2303.17376, 2023 | 3 | 2023 |
Detecting multiple parts of a screen to fingerprint to detect abusive uploading videos F Pavetic, MR Konrad, H Pasula US Patent 9,972,060, 2018 | 3 | 2018 |
Methods, systems, and media for detecting abusive stereoscopic videos by generating fingerprints for multiple portions of a video frame V Zamaraiev, F Pavetic US Patent 10,499,097, 2019 | 1 | 2019 |
Using machine learning to detect which part of the screen includes embedded frames of an uploaded video F Pavetic, KHT Leung, D Tochilkin US Patent App. 18/520,532, 2024 | | 2024 |
LocCa: Visual Pretraining with Location-aware Captioners B Wan, M Tschannen, Y Xian, F Pavetic, I Alabdulmohsin, X Wang, ... arXiv preprint arXiv:2403.19596, 2024 | | 2024 |
Scalable and Cost-Efficient Information Retrieval Architecture for Massive Datasets F Pavetic, D Simcha, AT Voicu, F Chern, PW Sun, R Guo, HM Pasula, ... US Patent App. 17/886,860, 2024 | | 2024 |
Using machine learning to detect which part of the screen includes embedded frames of an uploaded video F Pavetic, KHT Leung, D Tochilkin US Patent 11,829,854, 2023 | | 2023 |