Rogerio Schmidt Feris

Principal Scientist and Manager

MIT-IBM Watson AI Lab

IBM Research

email: rsferis-at-us.ibm.com

I am a principal scientist and manager at the MIT-IBM Watson AI Lab. My current work focuses on deep learning methods that are label-efficient (learning with limited labels), sample-efficient (learning with less data), and computationally efficient. I am also interested in multimodal perception methods that combine vision, sound/speech, and language.

I am passionate about doing fundamental research as well as developing systems that make a real-world impact. My work has not only been published in top AI conferences, but has also been integrated into multiple products and covered by media outlets such as the New York Times, ABC News, and CBS 60 Minutes. See my bio for more information about me.

News

Six papers accepted at CVPR 2022
Two papers accepted at NeurIPS 2021, five papers at ICCV 2021, six papers at CVPR 2021, two papers at ICLR 2021, and two papers at AAAI 2021
I'm giving three invited talks at the following CVPR 2021 workshops: MULA, L2ID, and LatinX
I'm an Area Chair of ICLR 2021, CVPR 2021, ICML 2021, and NeurIPS 2021
Five papers accepted at ECCV 2020, two papers at CVPR 2020, and one paper at NeurIPS 2020
I'm giving invited talks at the ICML 2020 LatinX in AI Workshop, CVPR 2020 DIRA Workshop, and the What's Next in AI event
Check out our CVPR 2020 VL3 Workshop and ICCV 2019 Tutorial on Visual Learning with Limited Labeled Data
I'm an Area Chair of NeurIPS 2020, ECCV 2020, and CVPR 2020
I'm giving three invited talks at CVPR 2019 workshops (EMC^2, FFSS-USAD, and Weakly SL)
Check out my NeurIPS 2019 invited talk (EMC^2 Workshop) on Dynamic Neural Networks
Older News

Research Highlights

Synthetic Data Pretraining

Dynamic Neural Networks

Multimodal Perception

Learning with Limited Labels

See my full list of publications

Research

Pre-training and Transfer from Synthetic Data

Procedural Image Programs for Representation Learning

M. Baradad, R. Chen, J. Wulff, T. Wang, R. Feris, A. Torralba, and P. Isola

NeurIPS 2022

[Paper] [Project Page] [Code]

How Transferable are Video Representations Based on Synthetic Data?

Y. Kim, S. Mishra, S. Jin, R. Panda, H. Kuehne, L. Karlinsky, V. Saligrama, K. Saenko, A. Oliva, and R. Feris

NeurIPS 2022, Dataset Track

[Paper] [Dataset]

Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data

S. Mishra, R. Panda, C. Phoo, C. Chen, L. Karlinsky, K. Saenko, V. Saligrama, and R. Feris

CVPR 2022

[Paper] [Project Page] [Code]

SimVQA: Exploring Simulated Environments for Visual Question Answering

P. Bonilla, H. Wu, L. Wang, R. Feris, and V. Ordonez

CVPR 2022

[Paper] [Code]

Dynamic Neural Networks for Efficient AI

Instead of relying on one-size-fits-all models, we are investigating dynamic neural networks that adaptively change computation depending on the input.
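As a rough illustration of what "adaptively change computation depending on the input" can mean, here is a toy sketch in plain Python. It is not the method of any paper listed below: the names `dynamic_forward`, `magnitude_gate`, and `BLOCKS`, the scalar "features", and the fixed threshold are all hypothetical, and in practice the gate would typically be a small learned policy rather than a hand-written rule.

```python
from typing import Callable, List, Tuple

# A "block" maps a scalar feature to a residual update (toy stand-in for a layer).
Block = Callable[[float], float]
# A "gate" decides, per input and per block index, whether to run that block.
Gate = Callable[[float, int], bool]


def dynamic_forward(x: float, blocks: List[Block], gate: Gate) -> Tuple[float, int]:
    """Apply residual blocks, skipping those the gate turns off for this input.

    Returns the output feature and the number of blocks actually executed.
    """
    executed = 0
    for index, block in enumerate(blocks):
        if gate(x, index):
            x = x + block(x)  # residual update: identity plus block output
            executed += 1
        # Otherwise the block is skipped and x passes through unchanged.
    return x, executed


def magnitude_gate(x: float, index: int) -> bool:
    """Hypothetical hand-written gate: only spend compute on 'hard' inputs."""
    return abs(x) > 1.0


# Three identical toy blocks (assumption: each halves the feature as its update).
BLOCKS: List[Block] = [lambda v: 0.5 * v for _ in range(3)]
```

With these toy definitions, `dynamic_forward(0.5, BLOCKS, magnitude_gate)` skips every block, while `dynamic_forward(2.0, BLOCKS, magnitude_gate)` runs all three, so the amount of computation varies from one input to the next.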

IA-RED^2: Interpretability-Aware Redundancy Reduction for Vision Transformers

B. Pan, R. Panda, Y. Jiang, Z. Wang, R. Feris, and A. Oliva

NeurIPS 2021

[Paper] [Project Page] [Code]

Dynamic Network Quantization for Efficient Video Inference

X. Sun, R. Panda, C. Chen, A. Oliva, R. Feris, and K. Saenko

ICCV 2021

[Paper] [Project Page] [Code]

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

R. Panda, C. Chen, Q. Fan, X. Sun, K. Saenko, A. Oliva, and R. Feris

ICCV 2021

[Paper] [Project Page] [Code]

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Y. Meng, R. Panda, C. Lin, P. Sattigeri, L. Karlinsky, K. Saenko, A. Oliva, and R. Feris

ICLR 2021

[Paper] [Project Page] [Code]

VA-RED^2: Video Adaptive Redundancy Reduction

B. Pan, R. Panda, C. Fosco, C. Lin, A. Andonian, Y. Meng, K. Saenko, A. Oliva, and R. Feris

ICLR 2021

[Paper] [Project Page] [Code]

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

Y. Meng, C. Lin, R. Panda, P. Sattigeri, L. Karlinsky, A. Oliva, K. Saenko, and R. Feris

ECCV 2020

[Paper] [Project Page] [Code] [MIT News]

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

X. Sun, R. Panda, R. Feris, and K. Saenko

NeurIPS 2020

[Paper] [Project Page] [Code]

SpotTune: Transfer Learning through Adaptive Fine-tuning

Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris

CVPR 2019

Top results on the Visual Decathlon challenge (2019)

[Paper] [Code]

BlockDrop: Dynamic Inference Paths in Residual Networks

Z. Wu*, T. Nagarajan*, A. Kumar, S. Rennie, L. Davis, K. Grauman, and R. Feris (* equal contribution)

CVPR 2018, Spotlight

[Paper] [Code]

Deep Learning with Limited Labeled Data

I'm currently leading the IBM-MIT team in the DARPA Learning with Less Labels (LwLL) program, together with Prof. Josh Tenenbaum.

Highlight

A Broader Study of Cross-Domain Few-Shot Learning

Y. Guo, N. Codella, L. Karlinsky, J. Codella, J. Smith, K. Saenko, T. Rosing, and R. Feris

ECCV 2020

See also: CVPR VL3 Workshop and the challenge associated with our benchmark

[Paper] [Code and Data]

Few-shot Learning

Fine-grained Angular Contrastive Learning with Coarse Labels

G. Bukchin, E. Schwartz, K. Saenko, O. Shahar, R. Feris, R. Giryes, and L. Karlinsky

CVPR 2021, Oral

[Paper]

TAFSSL: Task-Adaptive Feature Sub-Space Learning for Few-shot Classification

M. Lichtenstein, P. Sattigeri, R. Feris, R. Giryes, and L. Karlinsky

ECCV 2020

[Paper] [Code]

RepMet: Representative-based Metric Learning for Classification and One-shot Object Detection

L. Karlinsky, J. Shtok, S. Harary, E. Schwartz, A. Aides, R. Feris, R. Giryes, and A. Bronstein

CVPR 2019

See also: Our StarNet paper (AAAI 2021)

[Paper] [Code]

Transfer Learning and Adaptation

A Broad Study on the Transferability of Visual Representations with Contrastive Learning

A. Islam, C. Chen, R. Panda, L. Karlinsky, R. Radke, and R. Feris

ICCV 2021

[Paper] [Code]

See our SpotTune (CVPR 2019) and our AdaShare paper (NeurIPS 2020) in the dynamic neural networks section

Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

Z. Wang, M. Yu, Y. Wei, R. Feris, J. Xiong, W. Hwu, T. Huang, and H. Shi

CVPR 2020

[Paper] [Code]

Co-regularized Alignment for Unsupervised Domain Adaptation

A. Kumar, P. Sattigeri, K. Wadhawan, L. Karlinsky, R. Feris, W. T. Freeman, and G. Wornell

NeurIPS 2018

[Paper]

Data Augmentation

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

Z. Tang, Y. Gao, P. Sattigeri, L. Karlinsky, R. Feris, and D. Metaxas

ECCV 2020

[Paper] [Code]

LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning

A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, and A. Bronstein

CVPR 2019, Oral

[Paper] [Code]

Delta-Encoder: an Effective Sample Synthesis Method for Few-shot Object Recognition

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, A. Kumar, R. Feris, R. Giryes, and A. Bronstein

NeurIPS 2018, Spotlight

[Paper] [Code]

Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation

X. Peng, Z. Tang, F. Yang, R. Feris, and D. Metaxas

CVPR 2018

[Paper] [Code]

S3Pool: Pooling with Stochastic Spatial Sampling

S. Zhai, H. Wu, A. Kumar, Y. Cheng, Y. Lu, Z. Zhang, and R. Feris

CVPR 2017

[Paper] [Code]

Multimodal Learning (Vision, Audio, Speech, Language) and Applications

Audio-Visual Learning

Everything at Once – Multi-modal Fusion Transformer for Video Retrieval

N. Shvetsova, B. Chen, A. Rouditchenko, S. Thomas, B. Kingsbury, R. Feris, D. Harwath, J. Glass, and H. Kuehne

CVPR 2022

[Paper]

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

B. Chen, A. Rouditchenko, K. Duarte, H. Kuehne, S. Thomas, A. Boggust, R. Panda, B. Kingsbury, R. Feris, D. Harwath, J. Glass, M. Picheny, and S. F. Chang

ICCV 2021

[Paper]

Cascaded Multilingual Audio-Visual Learning from Videos

A. Rouditchenko, A. Boggust, D. Harwath, S. Thomas, H. Kuehne, B. Chen, R. Panda, R. Feris, B. Kingsbury, M. Picheny, and J. Glass

Interspeech 2021

[Paper] [Project Page] [Code]

See our AdaMML paper in the dynamic neural networks section

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

M. Monfort, S. Jin, A. Liu, D. Harwath, R. Feris, J. Glass, and A. Oliva

CVPR 2021

[Project Page] [Paper] [Data]

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

A. Rouditchenko, A. Boggust, D. Harwath, D. Joshi, S. Thomas, K. Audhkhasi, R. Feris, B. Kingsbury, M. Picheny, A. Torralba, and J. Glass

Interspeech 2021

[Paper] [Project Page] [Video Demo] [Code]

Automatic Curation of Sports Highlights using Multimodal Excitement Features

M. Merler, D. Joshi, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris

IEEE Transactions on MultiMedia (TMM) 2019

Our system was used to produce the official highlights of the US Open, Wimbledon, and Masters tournaments, which were watched by millions of fans worldwide

[Paper] [Blog] [Video Demo 1] [Video Demo 2] [New York Times] [Fortune] [Newsweek] [Engadget] [NBC News] [Behind the Code]

The Excitement of Sports: Automatic Highlights using Audio-Visual Cues

M. Merler, D. Joshi, K. Mac, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris

CVPR Workshop on Sight and Sound, 2018.

[Paper] [Slides] [Video Demo 1] [Video Demo 2] [Blog] [Venturebeat] [ZDNet]

Learning to Separate Object Sounds by Watching Unlabeled Video

R. Gao, R. Feris, and K. Grauman

ECCV 2018, Oral

[Paper] [Project Page] [Code]

Vision and Language for Fashion

Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

H. Wu, Y. Gao, X. Guo, Z. Al-Halah, S. Rennie, K. Grauman, and R. Feris

CVPR 2021

[Paper] [Data]

I co-founded the Workshop on Computer Vision for Fashion, Art, and Design

Dialog-based Interactive Image Retrieval

X. Guo, H. Wu, Y. Cheng, S. Rennie, G. Tesauro and R. Feris

NeurIPS 2018

[Paper] [Code] [Video Demo]

Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network

J. Huang, R. Feris, Q. Chen, and S. Yan

ICCV 2015

[Paper] [Data]

Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes

Q. Chen, J. Huang, R. Feris, L. Brown, J. Dong, and S. Yan

CVPR 2015

[Paper]

Visual Question Answering

Separating Skills and Concepts for Novel Visual Question Answering

S. Whitehead, H. Wu, H. Ji, R. Feris, and K. Saenko

CVPR 2021

[Paper]

Learning from Lexical Perturbations for Consistent Visual Question Answering

S. Whitehead, H. Wu, Y. Fung, H. Ji, R. Feris, and K. Saenko

arXiv 2020

[Paper]

Egocentric Video + Geo-location + Weather

Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data

J. Wang, Y. Cheng, and R. Feris

CVPR 2016, Oral

[Paper]

Other Projects

Model Compression and Acceleration

Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

C. Chen, Q. Fan, N. Mallinar, T. Sercu, and R. Feris

ICLR 2019

[Paper] [Code]

Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification

Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi, and R. Feris

CVPR 2017, Spotlight

[Paper]

An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections

Y. Cheng, F. Yu, R. Feris, S. Kumar, A. Choudhary, and S. F. Chang

ICCV 2015

[Paper]

More on Video: Action Recognition and Tracking

Semi-Supervised Action Recognition with Temporal Contrastive Learning

A. Singh, O. Chakraborty, A. Varshney, R. Panda, R. Feris, K. Saenko, and A. Das

CVPR 2021

[Paper]

See our efficient action recognition papers, AdaFuse (ICLR 2021), VA-RED^2 (ICLR 2021), and AR-Net (ECCV 2020), in the dynamic neural networks section

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

C. Chen, R. Panda, K. Ramakrishnan, R. Feris, J. Cohn, A. Oliva, and Q. Fan

CVPR 2021

[Paper] [Code]

We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

A. Andonian, C. Fosco, M. Monfort, A. Lee, R. Feris, C. Vondrick, and A. Oliva

ECCV 2020

[Paper] [Code] [Project Page] [MIT News]

Video Instance Segmentation Tracking

C. Lin, Y. Hung, R. Feris, and L. He

CVPR 2020

[Paper]

Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

M. Khoi-Nguyen, D. Joshi, R. Yeh, J. Xiong, R. Feris, and M. Do

ICCV 2019, Oral

[Paper] [Code] [Project Page]

Object Detection and Matching

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

B. Cheng, Y. Wei, H. She, R. Feris, J. Xiong, and T. Huang

ECCV 2018

DCR achieved state-of-the-art results on Pascal VOC and MS-COCO

[Paper] [Code]

A Unified Multi-Scale Deep Convolutional Neural Network for Fast Object Detection

Z. Cai, Q. Fan, R. Feris, and N. Vasconcelos

ECCV 2016

MS-CNN achieved state-of-the-art results on the popular KITTI dataset

[Paper] [Code] [Demo] [KITTI results] [Project Page]

ICCV 2015 Tutorial on Tools for Efficient Object Detection (with Piotr Dollar, Xiaoyu Wang, Kaiming He, Ross Girshick, Rodrigo Benenson, and Jan Hosang).

Efficient Maximum Appearance Search for Large-Scale Object Detection

Q. Chen, Z. Song, R. Feris, A. Datta, L. Cao, Z. Huang, and S. Yan

CVPR 2013

[Paper]

Shape Classification Through Structured Learning of Matching Measures

L. Chen, J. McAuley, R. Feris, T. Caetano, and M. Turk

CVPR 2009

[Paper] [Code]

Visual Attributes

Visual Attributes

R. Feris, C. Lampert, and D. Parikh

Advances in Computer Vision and Pattern Recognition, Springer, 2016

[Book Link]

Check out the Vision and Language for Fashion section for more papers on Visual Attributes: Fashion IQ (CVPR 2021), DARN (ICCV 2015), and DDAN (CVPR 2015)