Rogerio Schmidt Feris

Principal Scientist and Manager

MIT-IBM Watson AI Lab

IBM Research


I am a principal scientist and manager at the MIT-IBM Watson AI Lab. My current work is particularly focused on deep learning methods that are label-efficient (learning with limited labels), sample-efficient (learning with less data), and computationally efficient. I am also interested in multimodal perception methods that combine vision, sound/speech, and language.


I am passionate about doing fundamental research as well as developing systems that make a real-world impact. My work has been published in top AI conferences, integrated into multiple products, and covered by media outlets such as the New York Times, ABC News, and CBS 60 Minutes. See my bio for more information.



Pre-training and Transfer from Synthetic Data


Procedural Image Programs for Representation Learning


M. Baradad, R. Chen, J. Wulff, T. Wang, R. Feris, A. Torralba, and P. Isola

NeurIPS 2022


[Paper] [Project Page] [Code]


How Transferable are Video Representations Based on Synthetic Data?


Y. Kim, S. Mishra, S. Jin, R. Panda, H. Kuehne, L. Karlinsky, V. Saligrama, K. Saenko, A. Oliva, and R. Feris

NeurIPS 2022, Dataset Track


[Paper] [Dataset]


Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data


S. Mishra, R. Panda, C. Phoo, C. Chen, L. Karlinsky, K. Saenko, V. Saligrama, and R. Feris

CVPR 2022


[Paper] [Project Page] [Code]


SimVQA: Exploring Simulated Environments for Visual Question Answering


P. Bonilla, H. Wu, L. Wang, R. Feris, and V. Ordonez

CVPR 2022


[Paper] [Code]


Dynamic Neural Networks for Efficient AI

Instead of relying on one-size-fits-all models, we are investigating dynamic neural networks that adaptively change computation depending on the input.


IA-RED^2: Interpretability-Aware Redundancy Reduction for Vision Transformers


B. Pan, R. Panda, Y. Jiang, Z. Wang, R. Feris, and A. Oliva

NeurIPS 2021


[Paper] [Project Page] [Code]


Dynamic Network Quantization for Efficient Video Inference


X. Sun, R. Panda, C. Chen, A. Oliva, R. Feris, and K. Saenko

ICCV 2021


[Paper] [Project Page] [Code]


AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition


R. Panda, C. Chen, Q. Fan, X. Sun, K. Saenko, A. Oliva, and R. Feris

ICCV 2021


[Paper] [Project Page] [Code]


AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition


Y. Meng, R. Panda, C. Lin, P. Sattigeri, L. Karlinsky, K. Saenko, A. Oliva, and R. Feris

ICLR 2021


[Paper] [Project Page] [Code]


VA-RED^2: Video Adaptive Redundancy Reduction


B. Pan, R. Panda, C. Fosco, C. Lin, A. Andonian, Y. Meng, K. Saenko, A. Oliva, and R. Feris

ICLR 2021


[Paper] [Project Page] [Code]


AR-Net: Adaptive Frame Resolution for Efficient Action Recognition


Y. Meng, C. Lin, R. Panda, P. Sattigeri, L. Karlinsky, A. Oliva, K. Saenko, and R. Feris 

ECCV 2020


[Paper] [Project Page] [Code] [MIT News]


AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning 


X. Sun, R. Panda, R. Feris, and K. Saenko 

NeurIPS 2020


See also: Fully-adaptive Feature Sharing in Multi-Task Networks (CVPR 2017)

[Paper] [Project Page] [Code]


SpotTune: Transfer Learning through Adaptive Fine-tuning

Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris

CVPR 2019

Top results on the Visual Decathlon challenge (2019)

[Paper] [Code]


BlockDrop: Dynamic Inference Paths in Residual Networks


Z. Wu*, T. Nagarajan*, A. Kumar, S. Rennie, L. Davis, K. Grauman, and R. Feris (* equal contribution)

CVPR 2018, Spotlight


[Paper] [Code]

Deep Learning with Limited Labeled Data


I'm currently leading the IBM-MIT team in the DARPA Learning with Less Labels (LwLL) program, together with Prof. Josh Tenenbaum.



A Broader Study of Cross-Domain Few-Shot Learning


Y. Guo, N. Codella, L. Karlinsky, J. Codella, J. Smith, K. Saenko, T. Rosing, and R. Feris

ECCV 2020


See also: CVPR VL3 Workshop and the challenge associated with our benchmark

[Paper] [Code and Data]

Few-shot Learning


Fine-grained Angular Contrastive Learning with Coarse Labels


G. Bukchin, E. Schwartz, K. Saenko, O. Shahar, R. Feris, R. Giryes, and L. Karlinsky

CVPR 2021, Oral




TAFSSL: Task-Adaptive Feature Sub-Space Learning for Few-shot Classification


M. Lichtenstein, P. Sattigeri, R. Feris, R. Giryes, and L. Karlinsky

ECCV 2020


[Paper] [Code]


RepMet: Representative-based Metric Learning for Classification and One-shot Object Detection

L. Karlinsky, J. Shtok, S. Harary, E. Schwartz, A. Aides, R. Feris, R. Giryes, and A. Bronstein

CVPR 2019


See also: Our StarNet paper (AAAI 2021)

[Paper] [Code]

Transfer Learning and Adaptation


A Broad Study on the Transferability of Visual Representations with Contrastive Learning

A. Islam, C. Chen, R. Panda, L. Karlinsky, R. Radke, and R. Feris

ICCV 2021


[Paper] [Code]


See our SpotTune (CVPR 2019) and AdaShare (NeurIPS 2020) papers in the dynamic neural networks section


Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

Z. Wang, M. Yu, Y. Wei, R. Feris, J. Xiong, W. Hwu, T. Huang, and H. Shi

CVPR 2020


[Paper] [Code]


Co-regularized Alignment for Unsupervised Domain Adaptation

A. Kumar, P. Sattigeri, K. Wadhawan, L. Karlinsky, R. Feris, W. T. Freeman, and G. Wornell

NeurIPS 2018



Data Augmentation


OnlineAugment: Online Data Augmentation with Less Domain Knowledge

Z. Tang, Y. Gao, P. Sattigeri, L. Karlinsky, R. Feris, and D. Metaxas

ECCV 2020


[Paper] [Code]


LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning

A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, and A. Bronstein 

CVPR 2019, Oral


[Paper] [Code]


Delta-Encoder: an Effective Sample Synthesis Method for Few-shot Object Recognition

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, A. Kumar, R. Feris, R. Giryes, and A. Bronstein 

NeurIPS 2018, Spotlight


[Paper] [Code]


S3Pool: Pooling with Stochastic Spatial Sampling

S. Zhai, H. Wu, A. Kumar, Y. Cheng, Y. Lu, Z. Zhang, and R. Feris 

CVPR 2017


[Paper] [Code]

Multimodal Learning (Vision, Audio, Speech, Language) and Applications


Audio-Visual Learning


Everything at Once – Multi-modal Fusion Transformer for Video Retrieval

N. Shvetsova, B. Chen, A. Rouditchenko, S. Thomas, B. Kingsbury, R. Feris, D. Harwath, J. Glass, and H. Kuehne

CVPR 2022




Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

B. Chen, A. Rouditchenko, K. Duarte, H. Kuehne, S. Thomas, A. Boggust, R. Panda, B. Kingsbury, R. Feris, D. Harwath, J. Glass, M. Picheny, and S. F. Chang

ICCV 2021



Cascaded Multilingual Audio-Visual Learning from Videos

A. Rouditchenko, A. Boggust, D. Harwath, S. Thomas, H. Kuehne, B. Chen, R. Panda, R. Feris, B. Kingsbury, M. Picheny, and J. Glass

Interspeech 2021

[Paper] [Project Page] [Code]


See our AdaMML paper in the dynamic neural networks section


Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

M. Monfort, S. Jin, A. Liu, D. Harwath, R. Feris, J. Glass, and A. Oliva

CVPR 2021


[Project Page] [Paper] [Data]


AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

A. Rouditchenko, A. Boggust, D. Harwath, D. Joshi, S. Thomas, K. Audhkhasi, R. Feris, B. Kingsbury, M. Picheny, A. Torralba, and J. Glass

Interspeech 2021


[Paper] [Project Page] [Video Demo] [Code]


Automatic Curation of Sports Highlights using Multimodal Excitement Features

M. Merler, D. Joshi, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris 

IEEE Transactions on Multimedia (TMM), 2019

Our system was used to produce the official highlights of the US Open, Wimbledon, and Masters tournaments, which were watched by millions of fans worldwide


[Paper] [Blog] [Video Demo 1] [Video Demo 2] [New York Times] [Fortune] [Newsweek] [Engadget] [NBC News] [Behind the Code]


The Excitement of Sports: Automatic Highlights using Audio-Visual Cues

M. Merler, D. Joshi, K. Mac, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris

CVPR Workshop on Sight and Sound, 2018


[Paper] [Slides] [Video Demo 1] [Video Demo 2] [Blog] [Venturebeat] [ZDNet]


Learning to Separate Object Sounds by Watching Unlabeled Video

R. Gao, R. Feris, and K. Grauman

ECCV 2018, Oral


[Paper] [Project Page] [Code]

Vision and Language for Fashion


Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

H. Wu, Y. Gao, X. Guo, Z. Al-Halah, S. Rennie, K. Grauman, and R. Feris

CVPR 2021


[Paper] [Data]


Dialog-based Interactive Image Retrieval 

X. Guo, H. Wu, Y. Cheng, S. Rennie, G. Tesauro, and R. Feris

NeurIPS 2018 


[Paper] [Code] [Video Demo]


Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network

J. Huang, R. Feris, Q. Chen, and S. Yan

ICCV 2015


[Paper] [Data]

Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes

Q. Chen, J. Huang, R. Feris, L. Brown, J. Dong, and S. Yan 

CVPR 2015




Visual-Question Answering


Separating Skills and Concepts for Novel Visual Question Answering

S. Whitehead, H. Wu, H. Ji, R. Feris, and K. Saenko

CVPR 2021




Learning from Lexical Perturbations for Consistent Visual Question Answering

S. Whitehead, H. Wu, Y. Fung, H. Ji, R. Feris, and K. Saenko

arXiv 2020



Egocentric Video + Geo-location + Weather

Other Projects

Model Compression and Acceleration


Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

C. Chen, Q. Fan, N. Mallinar, T. Sercu, and R. Feris

ICLR 2019


[Paper] [Code]


Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification

Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi, and R. Feris 

CVPR 2017, Spotlight




An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections

Y. Cheng, F. Yu, R. Feris, S. Kumar, A. Choudhary, and S. F. Chang 

ICCV 2015



More on Video: Action Recognition and Tracking


Semi-Supervised Action Recognition with Temporal Contrastive Learning

A. Singh, O. Chakraborty, A. Varshney, R. Panda, R. Feris, K. Saenko, and A. Das

CVPR 2021




See our efficient action recognition papers in the dynamic neural networks section: AdaFuse (ICLR 2021), VA-RED^2 (ICLR 2021), and AR-Net (ECCV 2020)


Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

C. Chen, R. Panda, K. Ramakrishnan, R. Feris, J. Cohn, A. Oliva, and Q. Fan

CVPR 2021


[Paper] [Code]


We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

A. Andonian, C. Fosco, M. Monfort, A. Lee, R. Feris, C. Vondrick, and A. Oliva 

ECCV 2020


[Paper] [Code] [Project Page] [MIT News]


Video Instance Segmentation Tracking

C. Lin, Y. Hung, R. Feris, and L. He 

CVPR 2020




Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

K. Mac, D. Joshi, R. Yeh, J. Xiong, R. Feris, and M. Do

ICCV 2019, Oral


[Paper] [Code] [Project Page]

Object Detection and Matching


Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

B. Cheng, Y. Wei, H. Shi, R. Feris, J. Xiong, and T. Huang

ECCV 2018

DCR achieved state-of-the-art results on Pascal VOC and MS-COCO


[Paper] [Code]


A Unified Multi-Scale Deep Convolutional Neural Network for Fast Object Detection

Z. Cai, Q. Fan, R. Feris, and N. Vasconcelos

ECCV 2016


MS-CNN achieved state-of-the-art results on the popular KITTI dataset

[Paper] [Code] [Demo] [KITTI results] [Project Page]


ICCV 2015 Tutorial on Tools for Efficient Object Detection (with Piotr Dollar, Xiaoyu Wang, Kaiming He, Ross Girshick, Rodrigo Benenson, and Jan Hosang).


Efficient Maximum Appearance Search for Large-Scale Object Detection

Q. Chen, Z. Song, R. Feris, A. Datta, L. Cao, Z. Huang, and S. Yan

CVPR 2013



Shape Classification Through Structured Learning of Matching Measures

L. Chen, J. McAuley, R. Feris, T. Caetano, and M. Turk

CVPR 2009

[Paper] [Code]

Visual Attributes


Visual Attributes

R. Feris, C. Lampert, and D. Parikh

Advances in Computer Vision and Pattern Recognition, Springer, 2016

[Book Link]


Check out the Vision and Language for Fashion section for more papers on Visual Attributes: Fashion IQ (CVPR 2021), DARN (ICCV 2015), and DDAN (CVPR 2015)


Designing Category-level Attributes for Discriminative Visual Recognition

F. Yu, L. Cao, R. Feris, J. Smith, and S. F. Chang

CVPR 2013



I co-founded and organized the first, second, and third Workshop on Parts and Attributes


Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos

R. Feris, B. Siddiquie, J. Petterson, Y. Zhai, A. Datta, L. Brown, and S. Pankanti

IEEE Transactions on Multimedia, 2012

See also: Feris et al., Attribute-based Vehicle Search in Crowded Surveillance Videos, ICMR 2011

[Paper] [Video Demos]


Image Ranking and Retrieval Based on Multi-Attribute Queries

B. Siddiquie, R. Feris, and L. Davis

CVPR 2011, Oral



Attribute-based People Search

R. S. Feris, R. Bobbitt, L. Brown, and S. Pankanti

ICMR 2014

See also:

[Paper] [Video Demo]

Computational Photography


A Projector-Camera Setup for Geometry-Invariant Frequency Demultiplexing

D. Vaquero, R. Raskar, R. Feris, and M. Turk

CVPR 2009



Characterizing the Shadow Space of Camera-Light Pairs

D. Vaquero, R. Feris, M. Turk, and R. Raskar

CVPR 2008



Discontinuity Preserving Stereo with Small Baseline Multi-Flash Illumination

R. Feris, L. Chen, M. Turk, R. Raskar, and K. Tan

ICCV 2005, Oral 

See also: Feris et al., TPAMI 2007

[Paper] [Project Page] [Code] [Data]


Automatic Human Facial Illustrations with Variable Illumination

R. Feris and A. Olwal

SIGGRAPH Emerging Technologies, 2005 (Interactive Fogscreen)

[Project Page] [Code]


Specular Reflection Reduction with Multi-Flash Imaging

R. Feris, R. Raskar, K. Tan, and M. Turk

SIGGRAPH 2004 Poster



Exploiting Depth Discontinuities for Vision-based Fingerspelling Recognition

R. Feris, M. Turk, R. Raskar, K. Tan, and G. Ohashi

CVPR RTV4HCI Workshop 2004



Shape Enhanced Surgical Visualizations and Medical Illustrations with Multi-flash Imaging

K. Tan, J. Kobler, R. Feris, P. Dietz, and R. Raskar



Human Sensing


A Recurrent Encoder-Decoder Network for Sequential Face Alignment

X. Peng, R. Feris, X. Wang, and D. Metaxas

ECCV 2016

[Paper] [Code] [Project Page] [Video Demo]


Fast Face Detector Training Using Tailored Views

K. Scherbaum, R. Feris, J. Petterson, V. Blanz, and H. Seidel

ICCV 2013



Manifold-based Analysis of Facial Expression

Y. Chang, C. Hu, R. Feris, and M. Turk

Image and Vision Computing, 2006

[Paper] [Video Demo]


Hierarchical Wavelet Networks for Facial Feature Localization

R. Feris, J. Gemmell, K. Toyama, and V. Krueger

Face and Gesture Recognition 2002

Developed as part of the GazeMaster project for videoconferencing. I did this work during my internship at Microsoft Research in 2001.

[Paper] [IFA Head Pose Tracking Demo]


Efficient Real-Time Face Tracking in Wavelet Subspace

R. Feris, V. Krueger and R. M. Cesar Jr.

ICCV RATFG-RTS Workshop, 2001

[Paper] [Video Demo 1] [Video Demo 2]

Recent Talks

  • ICML 2020 LatinX in AI Workshop. "Dynamic Neural Networks for Efficient Image and Video Classification" [SlidesLive Talk] [pdf]

  • CVPR 2020 DIRA Workshop. "Visual Learning Beyond Natural Images" [Talk] [pdf]

  • NeurIPS 2019 EMC^2 Workshop. "Dynamic Neural Networks for Efficient Inference" [SlidesLive Talk] [pdf]

  • CVPR 2019 FFSS-USAD Workshop. "Is it All Relative? Interactive Fashion Search based on Relative Natural Language Feedback" [pdf]

  • CVPR 2019 EMC^2 Workshop. "Speeding Up Deep Neural Networks with Adaptive Computation and Efficient Multi-Scale Architectures" [pdf]

  • CVPR 2019 Workshop on Learning from Imperfect Data. "Learning More from Less: Weak Supervision and Beyond" [pdf]

Media Press


The postings on this site are my own and don't necessarily represent IBM's positions.
