VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. H Akbari, L Yuan, R Qian, WH Chuang, SF Chang, Y Cui, B Gong. Advances in Neural Information Processing Systems, 2021 | 525 | 2021 |
PaLI: A jointly-scaled multilingual language-image model. X Chen, X Wang, S Changpinyo, AJ Piergiovanni, P Padlewski, D Salz, ... arXiv preprint arXiv:2209.06794, 2022 | 402 | 2022 |
Towards reconstructing intelligible speech from the human auditory cortex. H Akbari, B Khalighinejad, JL Herrero, AD Mehta, N Mesgarani. Scientific Reports 9 (1), 874, 2019 | 230 | 2019 |
Lip2AudSpec: Speech reconstruction from silent lip movements video. H Akbari, H Arora, L Cao, N Mesgarani. 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 107 | 2018 |
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding. H Akbari, S Karaman, S Bhargava, B Chen, C Vondrick, SF Chang. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019 | 83 | 2019 |
Estimating and interpreting nonlinear receptive field of sensory neural responses with deep neural network models. H Akbari, M Keshishian, B Khalighinejad, JL Herrero, AD Mehta, ... eLife 9, e53445, 2020 | 56 | 2020 |
VideoPoet: A large language model for zero-shot video generation. D Kondratyuk, L Yu, X Gu, J Lezama, J Huang, R Hornung, H Adam, ... arXiv preprint arXiv:2312.14125, 2023 | 31 | 2023 |
Fetal ECG extraction using πTucker decomposition. H Akbari, MB Shamsollahi, R Phlypo. 2015 International Conference on Systems, Signals and Image Processing …, 2015 | 20 | 2015 |
GAIA - A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System. T Zhang, A Subburathinam, G Shi, L Huang, D Lu, X Pan, M Li, B Zhang, ... TAC, 2018 | 14 | 2018 |
GAIA at SM-KBP 2019 - A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System. M Li, Y Lin, A Subburathinam, S Whitehead, X Pan, D Lu, Q Wang, ... | 6 | 2019 |
A robust FCM algorithm for image segmentation based on spatial information and Total Variation. H Akbari, HM Kalkhoran, E Fatemizadeh. 2015 9th Iranian Conference on Machine Vision and Image Processing (MVIP …, 2015 | 6 | 2015 |
Alternating gradient descent and mixture-of-experts for integrated multimodal perception. H Akbari, D Kondratyuk, Y Cui, R Hornung, H Wang, H Adam. Advances in Neural Information Processing Systems 36, 2024 | 4 | 2024 |
Scaling multimodal pre-training via cross-modality gradient harmonization. J Wu, Y Liang, H Akbari, Z Wang, C Yu. Advances in Neural Information Processing Systems 35, 36161-36173, 2022 | 4 | 2022 |
Face-speech bridging by cycle video/audio reconstruction. HV Joze, H Akbari. US Patent 10,931,976, 2021 | 4 | 2021 |
Neuro-symbolic representations for video captioning: A case for leveraging inductive biases for vision and language. H Akbari, H Palangi, J Yang, S Rao, A Celikyilmaz, R Fernandez, ... arXiv preprint arXiv:2011.09530, 2020 | 3 | 2020 |
Modality Bridging and Unified Multimodal Understanding. H Akbari. Columbia University, 2022 | 1 | 2022 |
Time marking chapters in media items at a platform using machine-learning. C Gu, WH Chuang, MH Tsai, J Yang, J Zhang, H Zhou, H Akbari. US Patent App. 18/244,625, 2023 | | 2023 |
Time marking chapters in media items at a platform using machine-learning. C Gu, WH Chuang, MH Tsai, J Yang, J Zhang, H Zhou, H Akbari. US Patent 11,758,233, 2023 | | 2023 |