Rogerio Schmidt Feris
Principal Scientist and Manager
MIT-IBM Watson AI Lab
IBM Research
I am a principal scientist and manager at the MIT-IBM Watson AI Lab. My current work focuses on deep learning methods that are label-efficient (learning with limited labels), sample-efficient (learning with less data), and computationally efficient. I am also interested in multimodal perception methods that combine vision, sound/speech, and language.
I am passionate about doing fundamental research as well as developing systems that make a real-world impact. My work has been published in top AI conferences, integrated into multiple products, and covered by media outlets such as the New York Times, ABC News, and CBS 60 Minutes. See my bio for more information about me.
Six papers accepted at CVPR 2022
Two papers accepted at NeurIPS 2021, five papers at ICCV 2021, six papers at CVPR 2021, two papers at ICLR 2021, and two papers at AAAI 2021
I'm giving three invited talks at the following CVPR 2021 workshops: MULA, L2ID, and LatinX
I'm an Area Chair of ICLR 2021, CVPR 2021, ICML 2021, and NeurIPS 2021
Five papers accepted at ECCV 2020, two papers at CVPR 2020, and one paper at NeurIPS 2020
I'm giving invited talks at the ICML 2020 LatinX in AI Workshop, CVPR 2020 DIRA Workshop, and the What's Next in AI event
Check out our CVPR 2020 VL3 Workshop and ICCV 2019 Tutorial on Visual Learning with Limited Labeled Data
I'm an Area Chair of NeurIPS 2020, ECCV 2020, and CVPR 2020
I'm giving three invited talks at CVPR 2019 Workshops (EMC^2, FFSS-USAD, and Weakly SL)
Check out my NeurIPS 2019 invited talk (EMC^2 Workshop) on Dynamic Neural Networks
Pre-training and Transfer from Synthetic Data
Procedural Image Programs for Representation Learning
M. Baradad, R. Chen, J. Wulff, T. Wang, R. Feris, A. Torralba, and P. Isola
NeurIPS 2022
[Paper] [Project Page] [Code]
How Transferable are Video Representations Based on Synthetic Data?
Y. Kim, S. Mishra, S. Jin, R. Panda, H. Kuehne, L. Karlinsky, V. Saligrama, K. Saenko, A. Oliva, and R. Feris
NeurIPS 2022, Dataset Track
Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data
S. Mishra, R. Panda, C. Phoo, C. Chen, L. Karlinsky, K. Saenko, V. Saligrama, and R. Feris
CVPR 2022
[Paper] [Project Page] [Code]
SimVQA: Exploring Simulated Environments for Visual Question Answering
P. Bonilla, H. Wu, L. Wang, R. Feris, and V. Ordonez
CVPR 2022
[Paper] [Code]
Dynamic Neural Networks for Efficient AI
Instead of relying on one-size-fits-all models, we are investigating dynamic neural networks that adaptively change computation depending on the input.
IA-RED^2: Interpretability-Aware Redundancy Reduction for Vision Transformers
B. Pan, R. Panda, Y. Jiang, Z. Wang, R. Feris, and A. Oliva
NeurIPS 2021
[Paper] [Project Page] [Code]
Dynamic Network Quantization for Efficient Video Inference
X. Sun, R. Panda, C. Chen, A. Oliva, R. Feris, and K. Saenko
ICCV 2021
[Paper] [Project Page] [Code]
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
R. Panda, C. Chen, Q. Fan, X. Sun, K. Saenko, A. Oliva, and R. Feris
ICCV 2021
[Paper] [Project Page] [Code]
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
Y. Meng, R. Panda, C. Lin, P. Sattigeri, L. Karlinsky, K. Saenko, A. Oliva, and R. Feris
ICLR 2021
[Paper] [Project Page] [Code]
VA-RED^2: Video Adaptive Redundancy Reduction
B. Pan, R. Panda, C. Fosco, C. Lin, A. Andonian, Y. Meng, K. Saenko, A. Oliva, and R. Feris
ICLR 2021
[Paper] [Project Page] [Code]
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
Y. Meng, C. Lin, R. Panda, P. Sattigeri, L. Karlinsky, A. Oliva, K. Saenko, and R. Feris
ECCV 2020
[Paper] [Project Page] [Code] [MIT News]
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
X. Sun, R. Panda, R. Feris, and K. Saenko
NeurIPS 2020
See also: Fully-adaptive Feature Sharing in Multi-Task Networks (CVPR 2017)
[Paper] [Project Page] [Code]
SpotTune: Transfer Learning through Adaptive Fine-tuning
Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris
CVPR 2019
Top results on the Visual Decathlon challenge (2019)
BlockDrop: Dynamic Inference Paths in Residual Networks
Z. Wu*, T. Nagarajan*, A. Kumar, S. Rennie, L. Davis, K. Grauman, and R. Feris (* equal contribution)
CVPR 2018, Spotlight
Deep Learning with Limited Labeled Data
I'm currently leading the IBM-MIT team in the DARPA Learning with Less Labels (LwLL) program, together with Prof. Josh Tenenbaum
A Broader Study of Cross-Domain Few-Shot Learning
Y. Guo, N. Codella, L. Karlinsky, J. Codella, J. Smith, K. Saenko, T. Rosing, and R. Feris
ECCV 2020
See also: CVPR VL3 Workshop and the challenge associated with our benchmark
[Paper] [Code and Data]
Few-shot Learning
Fine-grained Angular Contrastive Learning with Coarse Labels
G. Bukchin, E. Schwartz, K. Saenko, O. Shahar, R. Feris, R. Giryes, and L. Karlinsky
CVPR 2021, Oral
TAFSSL: Task-Adaptive Feature Sub-Space Learning for Few-shot Classification
M. Lichtenstein, P. Sattigeri, R. Feris, R. Giryes, and L. Karlinsky
ECCV 2020
RepMet: Representative-based Metric Learning for Classification and One-shot Object Detection
L. Karlinsky, J. Shtok, S. Harary, E. Schwartz, A. Aides, R. Feris, R. Giryes, and A. Bronstein
CVPR 2019
See also: Our StarNet paper (AAAI 2021)
Transfer Learning and Adaptation
A Broad Study on the Transferability of Visual Representations with Contrastive Learning
A. Islam, C. Chen, R. Panda, L. Karlinsky, R. Radke, and R. Feris
ICCV 2021
Z. Wang, M. Yu, Y. Wei, R. Feris, J. Xiong, W. Hwu, T. Huang, and H. Shi
CVPR 2020
Co-regularized Alignment for Unsupervised Domain Adaptation
A. Kumar, P. Sattigeri, K. Wadhawan, L. Karlinsky, R. Feris, W. T. Freeman, and G. Wornell
NeurIPS 2018
Data Augmentation
OnlineAugment: Online Data Augmentation with Less Domain Knowledge
Z. Tang, Y. Gao, P. Sattigeri, L. Karlinsky, R. Feris, and D. Metaxas
ECCV 2020
LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning
A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, and A. Bronstein
CVPR 2019, Oral
Delta-Encoder: an Effective Sample Synthesis Method for Few-shot Object Recognition
E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, A. Kumar, R. Feris, R. Giryes, and A. Bronstein
NeurIPS 2018, Spotlight
X. Peng, Z. Tang, F. Yang, R. Feris, and D. Metaxas
CVPR 2018
S3Pool: Pooling with Stochastic Spatial Sampling
S. Zhai, H. Wu, A. Kumar, Y. Cheng, Y. Lu, Z. Zhang, and R. Feris
CVPR 2017
Multimodal Learning (Vision, Audio, Speech, Language) and Applications
Audio-Visual Learning
Everything at Once – Multi-modal Fusion Transformer for Video Retrieval
N. Shvetsova, B. Chen, A. Rouditchenko, S. Thomas, B. Kingsbury, R. Feris, D. Harwath, J. Glass, and H. Kuehne
CVPR 2022
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
B. Chen, A. Rouditchenko, K. Duarte, H. Kuehne, S. Thomas, A. Boggust, R. Panda, B. Kingsbury, R. Feris, D. Harwath, J. Glass, M. Picheny, and S. F. Chang
ICCV 2021
Cascaded Multilingual Audio-Visual Learning from Videos
A. Rouditchenko, A. Boggust, D. Harwath, S. Thomas, H. Kuehne, B. Chen, R. Panda, R. Feris, B. Kingsbury, M. Picheny, and J. Glass
Interspeech 2021
[Paper] [Project Page] [Code]
See our AdaMML paper in the dynamic neural networks section
Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
M. Monfort, S. Jin, A. Liu, D. Harwath, R. Feris, J. Glass, and A. Oliva
CVPR 2021
[Project Page] [Paper] [Data]
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
A. Rouditchenko, A. Boggust, D. Harwath, D. Joshi, S. Thomas, K. Audhkhasi, R. Feris, B. Kingsbury, M. Picheny, A. Torralba, and J. Glass
Interspeech 2021
[Paper] [Project Page] [Video Demo] [Code]
Automatic Curation of Sports Highlights using Multimodal Excitement Features
M. Merler, D. Joshi, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris
IEEE Transactions on Multimedia (TMM), 2019
Our system was used to produce the official highlights of the US Open, Wimbledon, and Masters tournaments, watched by millions of fans worldwide
[Paper] [Blog] [Video Demo 1] [Video Demo 2] [New York Times] [Fortune] [Newsweek] [Engadget] [NBC News] [Behind the Code]
The Excitement of Sports: Automatic Highlights using Audio-Visual Cues
M. Merler, D. Joshi, K. Mac, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris
CVPR Workshop on Sight and Sound, 2018
[Paper] [Slides] [Video Demo 1] [Video Demo 2] [Blog] [Venturebeat] [ZDNet]
Learning to Separate Object Sounds by Watching Unlabeled Video
R. Gao, R. Feris, and K. Grauman
ECCV 2018, Oral
[Paper] [Project Page] [Code]
Vision and Language for Fashion
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
H. Wu, Y. Gao, X. Guo, Z. Al-Halah, S. Rennie, K. Grauman, and R. Feris
CVPR 2021
I co-founded the Workshop on Computer Vision for Fashion, Art, and Design
Dialog-based Interactive Image Retrieval
X. Guo, H. Wu, Y. Cheng, S. Rennie, G. Tesauro, and R. Feris
NeurIPS 2018
Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network
J. Huang, R. Feris, Q. Chen, and S. Yan
ICCV 2015
Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes
Q. Chen, J. Huang, R. Feris, L. Brown, J. Dong, and S. Yan
CVPR 2015
Visual-Question Answering
Separating Skills and Concepts for Novel Visual Question Answering
S. Whitehead, H. Wu, H. Ji, R. Feris, and K. Saenko
CVPR 2021
Learning from Lexical Perturbations for Consistent Visual Question Answering
S. Whitehead, H. Wu, Y. Fung, H. Ji, R. Feris, and K. Saenko
arXiv 2020
Egocentric Video + Geo-location + Weather
Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data
J. Wang, Y. Cheng, and R. Feris
CVPR 2016, Oral
Other Projects
Model Compression and Acceleration
Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
C. Chen, Q. Fan, N. Mallinar, T. Sercu, and R. Feris
ICLR 2019
An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections
Y. Cheng, F. Yu, R. Feris, S. Kumar, A. Choudhary, and S. F. Chang
ICCV 2015
More on Video: Action Recognition and Tracking
Semi-Supervised Action Recognition with Temporal Contrastive Learning
A. Singh, O. Chakraborty, A. Varshney, R. Panda, R. Feris, K. Saenko, and A. Das
CVPR 2021
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
C. Chen, R. Panda, K. Ramakrishnan, R. Feris, J. Cohn, A. Oliva, and Q. Fan
CVPR 2021
We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos
A. Andonian, C. Fosco, M. Monfort, A. Lee, R. Feris, C. Vondrick, and A. Oliva
ECCV 2020
[Paper] [Code] [Project Page] [MIT News]
M. Khoi-Nguyen, D. Joshi, R. Yeh, J. Xiong, R. Feris, and M. Do
ICCV 2019, Oral
[Paper] [Code] [Project Page]
Object Detection and Matching
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
B. Cheng, Y. Wei, H. She, R. Feris, J. Xiong, and T. Huang
ECCV 2018
DCR achieved state-of-the-art results on Pascal VOC and MS-COCO
A Unified Multi Scale Deep Convolutional Neural Network for Fast Object Detection
Z. Cai, Q. Fan, R. Feris, and N. Vasconcelos
ECCV 2016
MS-CNN achieved state-of-the-art results on the popular KITTI dataset
[Paper] [Code] [Demo] [KITTI results] [Project Page]
ICCV 2015 Tutorial on Tools for Efficient Object Detection (with Piotr Dollar, Xiaoyu Wang, Kaiming He, Ross Girshick, Rodrigo Benenson, and Jan Hosang).
Efficient Maximum Appearance Search for Large-Scale Object Detection
Q. Chen, Z. Song, R. Feris, A. Datta, L. Cao, Z. Huang, and S. Yan
CVPR 2013
Shape Classification Through Structured Learning of Matching Measures
L. Chen, J. McAuley, R. Feris, T. Caetano, and M. Turk
CVPR 2009
Visual Attributes
R. Feris, C. Lampert, and D. Parikh
Advances in Computer Vision and Pattern Recognition, Springer, 2016
Check out the Vision and Language for Fashion section for more papers on Visual Attributes: Fashion IQ (CVPR 2021), DARN (ICCV 2015), and DDAN (CVPR 2015)
Designing Category-level Attributes for Discriminative Visual Recognition
F. Yu, L. Cao, R. Feris, J. Smith, and S. F. Chang
CVPR 2013
I co-founded and organized the first, second, and third Workshop on Parts and Attributes
Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos
R. Feris, B. Siddiquie, J. Petterson, Y. Zhai, A. Datta, L. Brown, and S. Pankanti
IEEE Transactions on Multimedia, 2012
See also: Feris et al, Attribute-based Vehicle Search in Crowded Surveillance Videos, ICMR 2011
[Paper] [Video Demos]
Image Ranking and Retrieval Based on Multi-Attribute Queries
B. Siddiquie, R. Feris, and L. Davis
CVPR 2011, Oral
R. S. Feris, R. Bobbit, L. Brown, and S. Pankanti
ICMR 2014
See also: Walk and Learn (CVPR 2016, Oral)
[Paper] [Video Demo]
Computational Photography
A Projector-Camera Setup for Geometry-Invariant Frequency Demultiplexing
D. Vaquero, R. Raskar, R. Feris, and M. Turk
CVPR 2009
Characterizing the Shadow Space of Camera-Light Pairs
D. Vaquero, R. Feris, M. Turk, and R. Raskar
CVPR 2008
Discontinuity Preserving Stereo with Small Baseline Multi-Flash Illumination
R. Feris, L. Chen, M. Turk, R. Raskar, and K. Tan
ICCV 2005, Oral
See also: Feris et al, TPAMI 2007
[Paper] [Project Page] [Code] [Data]
Automatic Human Facial Illustrations with Variable Illumination
R. Feris and A. Olwal
SIGGRAPH Emerging Technologies, 2005 (Interactive Fogscreen)
[Project Page] [Code]
Non-photorealistic Camera: Depth Edge Detection and Stylized Rendering using Multi-Flash Imaging
R. Raskar, K. Tan, R. Feris, J. Yu, and M. Turk
[Paper] [Code] [Project Page] [Video Demo]
Specular Reflection Reduction with Multi-Flash Imaging
R. Feris, R. Raskar, K. Tan, and M. Turk
SIGGRAPH 2004 Poster
Exploiting Depth Discontinuities for Vision-based Fingerspelling Recognition
R. Feris, M. Turk, R. Raskar, K. Tan, and G. Ohashi
CVPR RTV4HCI Workshop 2004
Shape Enhanced Surgical Visualizations and Medical Illustrations with Multi-flash Imaging
K. Tan, J. Kobler, R. Feris, P. Dietz, and R. Raskar
Human Sensing
A Recurrent Encoder-Decoder Network for Sequential Face Alignment
X. Peng, R. Feris, X. Wang, and D. Metaxas
ECCV 2016
[Paper] [Code] [Project Page] [Video Demo]
Fast Face Detector Training Using Tailored Views
K. Scherbaum, R. Feris, J. Petterson, V. Blanz, and H. Seidel
ICCV 2013
Manifold-based Analysis of Facial Expression
Y. Chang, C. Hu, R. Feris, and M. Turk
Image and Vision Computing, 2006
[Paper] [Video Demo]
Hierarchical Wavelet Networks for Facial Feature Localization
R. Feris, J. Gemmell, K. Toyama, and V. Krueger
Face and Gesture Recognition 2002
Developed as part of the GazeMaster project for videoconferencing. I did this work during my internship at Microsoft Research in 2001.
Efficient Real-Time Face Tracking in Wavelet Subspace
R. Feris, V. Krueger, and R. M. Cesar Jr.
ICCV RATFG-RTS Workshop, 2001
[Paper] [Video Demo 1] [Video Demo 2]
Recent Talks
ICML 2020 LatinX in AI Workshop. "Dynamic Neural Networks for Efficient Image and Video Classification" [SlidesLive Talk] [pdf]
NeurIPS 2019 EMC^2 Workshop. "Dynamic Neural Networks for Efficient Inference" [SlidesLive Talk] [pdf]
CVPR 2019 FFSS-USAD Workshop. "Is it All Relative? Interactive Fashion Search based on Relative Natural Language Feedback" [pdf]
CVPR 2019 EMC^2 Workshop. "Speeding Up Deep Neural Networks with Adaptive Computation and Efficient Multi-Scale Architectures" [pdf]
CVPR 2019 Workshop on Learning from Imperfect Data. "Learning More from Less: Weak Supervision and Beyond" [pdf]
Media Press
A simpler path to better computer vision. MIT News, 2022
A safer, lower-cost alternative to real data for pretraining computer vision models. IBM Research blog, 2022.
Hallucinating to Better Text Translation. Communications of the ACM / MIT News, 2022.
IBM’s StarNet brings explainable AI to image classification. VentureBeat, 2020.
Shrinking deep learning’s carbon footprint. MIT News, 2020.
IBM’s AI creates new labeled image sets using semantic content. VentureBeat, 2019.
IBM researchers develop a pair of low-power, high-performance computer vision systems. VentureBeat, 2018.
Coffee delivery drone patented by IBM. BBC News, 2018.
Enjoy Those U.S. Open Highlights. A Computer Picked Them for You. New York Times, 2017.
How an IBM Computer Picks U.S. Open Highlights. Fortune, 2017.
IBM’s Watson Serves Up This Year’s U.S. Open Highlights. NBC News, 2017.
IBM's Watson is creating US Open tennis highlight videos. Engadget, 2017.
IBM uses AI to serve up Wimbledon highlights. CNet, 2017.
IBM To Provide Wimbledon Highlights Using Artificial Intelligence. SportTechie, 2017.
IBM Watson is creating highlight reels at the Masters. ZDNet, 2017.
Fighting terrorism in New York City. CBS 60 Minutes, 2011. (starting at minute 7:20)
ABC7 puts video analytics to the test. ABC News, 2010.
Cameras help confirm Scott suicide ruling. ABC News, 2009.
Intelligent Iris. ABC News, 2008.
The Nonphotorealistic camera. SlashDot, 2004.
The postings on this site are my own and don't necessarily represent IBM's positions.