Jie Lei

Old Town, San Diego, May 2019 (Courtesy of Qin)

I am a research scientist at Meta AI, Seattle. My primary research interests are vision-and-language and video modeling. I received my PhD in Computer Science from UNC Chapel Hill in 2022, advised by Tamara L. Berg and Mohit Bansal. I received my bachelor's degree in Computer Science from Yingcai Honors College, University of Electronic Science and Technology of China (UESTC) in 2017. I am a recipient of the Adobe Research Fellowship and the CVPR 2021 Best Student Paper Honorable Mention award.

Google Scholar | GitHub | Twitter | CV

Email: jielei [at] meta.com

Our team at Meta is hiring 2023 research scientist interns in image/video/text generation, vision-and-language, video/image understanding, and related areas. See the job descriptions: [1] Computer Vision, [2] Natural Language Processing.


Publications & Preprints

VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius
CVPR 2023 [PDF] [Code]
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius
CVPR 2023 [PDF] [Code]
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei, Tamara L. Berg, Mohit Bansal
arXiv 2022 [PDF] [Data & Code]
PERCEIVER-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang*, Jaemin Cho*, Jie Lei, Mohit Bansal
WACV 2023 [PDF] [Code]
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
NeurIPS 2022 [PDF] [Code]
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
ECCV 2022 Oral [PDF] [Project Page] [Code]
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios
Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji
NAACL 2022 System Demo [PDF]
LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval
Jie Lei, Xinlei Chen, Ning Zhang, Mengjiao Wang, Mohit Bansal, Tamara L. Berg, Licheng Yu
arXiv 2022 [PDF]
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan*, Jie Lei*, Thomas Wolf, Mohit Bansal
CVPRW 2022 [PDF] [Code]
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei, Tamara L. Berg, Mohit Bansal
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li*, Jie Lei*, Zhe Gan, Licheng Yu, Yen-Chun Chen, Rohit Pillai, Yu Cheng, Luowei Zhou, Xin Eric Wang, William Yang Wang, Tamara L. Berg, Mohit Bansal, Jingjing Liu, Lijuan Wang, Zicheng Liu
NeurIPS 2021 - Datasets and Benchmarks Track [PDF] [Code] [Leaderboard & Challenge]
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li, Jie Lei, Zhe Gan, Jingjing Liu
ICCV 2021 Oral (top 3%) [PDF] [Dataset]
mTVR: Multilingual Moment Retrieval in Videos
Jie Lei, Tamara L. Berg, Mohit Bansal
ACL 2021 [PDF] [Code]
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho, Jie Lei, Hao Tan, Mohit Bansal
ICML 2021 [PDF] [Code]
Improved Pre-Training from Noisy Instructional Videos via Dense Captions and Entropy Minimization
Zineng Tang*, Jie Lei*, Mohit Bansal
NAACL 2021 [PDF] [Code]
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei*, Linjie Li*, Luowei Zhou, Zhe Gan, Tamara L. Berg, Mohit Bansal, Jingjing Liu
CVPR 2021 Oral, Best Student Paper Honorable Mention (top 0.1%) [PDF] [Code]
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
EMNLP 2020 [PDF] [VLEP Dataset]
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal
TVQA: Localized, Compositional Video Question Answering
Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg
EMNLP 2018 Oral [PDF] [Slides] [Dataset] [Code]
Weakly Supervised Image Classification with Coarse and Fine Labels
Jie Lei, Zhenyu Guo, Yang Wang
CRV 2017 [PDF] [Code]


AnimeGAN: Create Anime Faces using Generative Adversarial Networks
Jie Lei
A simple GAN model that automatically generates anime-style girl faces.