Rogerio Schmidt Feris

Principal Scientist and Manager

MIT-IBM Watson AI Lab

IBM Research

email: rsferis-at-us.ibm.com

I am a principal scientist and manager at the MIT-IBM Watson AI Lab. My current work focuses on deep learning methods that are label-efficient (learning with limited labels), sample-efficient (learning with less data), and computationally efficient. I am also interested in multimodal perception methods that combine vision, sound/speech, and language.

 

I am passionate about doing fundamental research as well as developing systems that make a real-world impact. My work has not only been published in top AI conferences, but has also been integrated into multiple products and covered by media outlets such as the New York Times, ABC News, and CBS 60 Minutes. See my bio for more information about me.

News

 

Dynamic Neural Networks

with applications in Efficient Inference, Video Understanding, Transfer, and Multi-Task Learning

 

Instead of relying on one-size-fits-all models, we are investigating dynamic neural networks that adaptively change computation depending on the input.

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

 

Y. Meng, R. Panda, C. Lin, P. Sattigeri, L. Karlinsky, K. Saenko, A. Oliva, and R. Feris

ICLR 2021

 

[Paper] [Project Page] [Code]

VA-RED^2: Video Adaptive Redundancy Reduction

 

B. Pan, R. Panda, C. Fosco, C. Lin, A. Andonian, Y. Meng, K. Saenko, A. Oliva, and R. Feris

ICLR 2021

 

[Paper] [Project Page] [Code]

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

 

Y. Meng, C. Lin, R. Panda, P. Sattigeri, L. Karlinsky, A. Oliva, K. Saenko, and R. Feris 

ECCV 2020

 

[Paper] [Project Page] [Code] [MIT News]

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning 

 

X. Sun, R. Panda, R. Feris, and K. Saenko 

NeurIPS 2020

 

See also: Fully-adaptive Feature Sharing in Multi-Task Networks (CVPR 2017)

[Paper] [Project Page] [Code]

SpotTune: Transfer Learning through Adaptive Fine-tuning

Y. Guo, H. Shi, A. Kumar, K. Grauman, T. Rosing, and R. Feris

CVPR 2019

Top results on the Visual Decathlon challenge (2019)

[Paper] [Code]

BlockDrop: Dynamic Inference Paths in Residual Networks

 

Z. Wu*, T. Nagarajan*, A. Kumar, S. Rennie, L. Davis, K. Grauman, and R. Feris (* equal contribution)

CVPR 2018, Spotlight

 

[Paper] [Code]

Deep Learning with Limited Labeled Data

 

I'm currently leading the IBM-MIT team in the DARPA Learning with Less Labels (LwLL) program, together with Prof. Josh Tenenbaum.

Highlight

A Broader Study of Cross-Domain Few-Shot Learning

 

Y. Guo, N. Codella, L. Karlinsky, J. Codella, J. Smith, K. Saenko, T. Rosing, and R. Feris

ECCV 2020

 

See also: CVPR VL3 Workshop and the challenge associated with our benchmark

[Paper] [Code and Data]

Few-shot Learning

Fine-grained Angular Contrastive Learning with Coarse Labels

 

G. Bukchin, E. Schwartz, K. Saenko, O. Shahar, R. Feris, R. Giryes, and L. Karlinsky

CVPR 2021, Oral

 

[Paper]

TAFSSL: Task-Adaptive Feature Sub-Space Learning for Few-shot Classification

 

M. Lichtenstein, P. Sattigeri, R. Feris, R. Giryes, and L. Karlinsky

ECCV 2020

 

[Paper] [Code]

RepMet: Representative-based Metric Learning for Classification and One-shot Object Detection

L. Karlinsky, J. Shtok, S. Harary, E. Schwartz, A. Aides, R. Feris, R. Giryes, and A. Bronstein

CVPR 2019

 

See also: Our StarNet paper (AAAI 2021)

[Paper] [Code]

Transfer Learning and Adaptation

See our SpotTune (CVPR 2019) and AdaShare (NeurIPS 2020) papers in the Dynamic Neural Networks section.

Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

Z. Wang, M. Yu, Y. Wei, R. Feris, J. Xiong, W. Hwu, T. Huang, and H. Shi

CVPR 2020

 

[Paper] [Code]

Co-regularized Alignment for Unsupervised Domain Adaptation

A. Kumar, P. Sattigeri, K. Wadhawan, L. Karlinsky, R. Feris, W. T. Freeman, and G. Wornell

NeurIPS 2018

 

[Paper]

Data Augmentation

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

Z. Tang, Y. Gao, P. Sattigeri, L. Karlinsky, R. Feris, and D. Metaxas

ECCV 2020

 

[Paper] [Code]

LaSO: Label-Set Operations Networks for Multi-label Few-shot Learning

A. Alfassy, L. Karlinsky, A. Aides, J. Shtok, S. Harary, R. Feris, R. Giryes, and A. Bronstein 

CVPR 2019, Oral

 

[Paper] [Code]

Delta-Encoder: an Effective Sample Synthesis Method for Few-shot Object Recognition

E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, A. Kumar, R. Feris, R. Giryes, and A. Bronstein 

NeurIPS 2018, Spotlight

 

[Paper] [Code]

S3Pool: Pooling with Stochastic Spatial Sampling

S. Zhai, H. Wu, A. Kumar, Y. Cheng, Y. Lu, Z. Zhang, and R. Feris 

CVPR 2017

 

[Paper] [Code]

Multimodal Learning (Vision, Audio, Speech, Language) and Applications

 

Audio-Visual Learning

Highlight

Automatic Curation of Sports Highlights using Multimodal Excitement Features

M. Merler, D. Joshi, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris 

IEEE Transactions on Multimedia (TMM), 2019

Our system was used to produce the official highlights of the US Open, Wimbledon, and Masters tournaments, which were watched by millions of fans worldwide.

 

[Paper] [Blog] [Video Demo 1] [Video Demo 2] [New York Times] [Fortune] [Newsweek] [Engadget] [NBC News] [Behind the Code]

The Excitement of Sports: Automatic Highlights using Audio-Visual Cues

M. Merler, D. Joshi, K. Mac, Q. Nguyen, S. Hammer, J. Kent, J. Xiong, M. Do, J. Smith, and R. Feris

CVPR Workshop on Sight and Sound, 2018

 

[Paper] [Slides] [Video Demo 1] [Video Demo 2] [Blog] [Venturebeat] [ZDNet]

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

M. Monfort, S. Jin, A. Liu, D. Harwath, R. Feris, J. Glass, and A. Oliva

CVPR 2021

 

[Paper] [Data]

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

A. Rouditchenko, A. Boggust, D. Harwath, D. Joshi, S. Thomas, K. Audhkhasi, R. Feris, B. Kingsbury, M. Picheny, A. Torralba, and J. Glass

arXiv 2020

 

[Paper] [Video Demo]

Learning to Separate Object Sounds by Watching Unlabeled Video

R. Gao, R. Feris, and K. Grauman

ECCV 2018, Oral

 

[Paper] [Project Page] [Code]

Vision and Language for Fashion

Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback

H. Wu, Y. Gao, X. Guo, Z. Al-Halah, S. Rennie, K. Grauman, and R. Feris

CVPR 2021

 

[Paper] [Data]

Dialog-based Interactive Image Retrieval 

X. Guo, H. Wu, Y. Cheng, S. Rennie, G. Tesauro, and R. Feris

NeurIPS 2018 

 

[Paper] [Code] [Video Demo]

Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network

J. Huang, R. Feris, Q. Chen, and S. Yan

ICCV 2015

 

[Paper] [Data]

Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes

Q. Chen, J. Huang, R. Feris, L. Brown, J. Dong, and S. Yan 

CVPR 2015

 

[Paper]

Visual-Question Answering

Separating Skills and Concepts for Novel Visual Question Answering

S. Whitehead, H. Wu, H. Ji, R. Feris, and K. Saenko

CVPR 2021

 

[Paper]

Learning from Lexical Perturbations for Consistent Visual Question Answering

S. Whitehead, H. Wu, Y. Fung, H. Ji, R. Feris, and K. Saenko

arXiv 2020

 

[Paper]

Egocentric Video + Geo-location + Weather

Other Projects

Model Compression and Acceleration

Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition

C. Chen, Q. Fan, N. Mallinar, T. Sercu, and R. Feris

ICLR 2019

 

[Paper] [Code]

Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification

Y. Lu, A. Kumar, S. Zhai, Y. Cheng, T. Javidi, and R. Feris 

CVPR 2017, Spotlight

 

[Paper]

An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections

Y. Cheng, F. Yu, R. Feris, S. Kumar, A. Choudhary, and S. F. Chang 

ICCV 2015

 

[Paper]

More on Video: Action Recognition and Tracking

Semi-Supervised Action Recognition with Temporal Contrastive Learning

A. Singh, O. Chakraborty, A. Varshney, R. Panda, R. Feris, K. Saenko, and A. Das

CVPR 2021

 

[Paper]

See our efficient action recognition papers, AdaFuse (ICLR 2021), VA-RED^2 (ICLR 2021), and AR-Net (ECCV 2020), in the Dynamic Neural Networks section.

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

C. Chen, R. Panda, K. Ramakrishnan, R. Feris, J. Cohn, A. Oliva, and Q. Fan

CVPR 2021

 

[Paper] [Code]

We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

A. Andonian, C. Fosco, M. Monfort, A. Lee, R. Feris, C. Vondrick, and A. Oliva 

ECCV 2020

 

[Paper] [Code] [Project Page] [MIT News]

Video Instance Segmentation Tracking

C. Lin, Y. Hung, R. Feris, and L. He 

CVPR 2020

 

[Paper]

Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

M. Khoi-Nguyen, D. Joshi, R. Yeh, J. Xiong, R. Feris, and M. Do

ICCV 2019, Oral

 

[Paper] [Code] [Project Page]

Object Detection and Matching

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

B. Cheng, Y. Wei, H. She, R. Feris, J. Xiong, and T. Huang 

ECCV 2018

DCR achieved state-of-the-art results on Pascal VOC and MS-COCO

 

[Paper] [Code]

A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

Z. Cai, Q. Fan, R. Feris, and N. Vasconcelos

ECCV 2016

 

MS-CNN achieved state-of-the-art results on the popular KITTI dataset

[Paper] [Code] [Demo] [KITTI results] [Project Page]

ICCV 2015 Tutorial on Tools for Efficient Object Detection (with Piotr Dollar, Xiaoyu Wang, Kaiming He, Ross Girshick, Rodrigo Benenson, and Jan Hosang).

Efficient Maximum Appearance Search for Large-Scale Object Detection

Q. Chen, Z. Song, R. Feris, A. Datta, L. Cao, Z. Huang, and S. Yan

CVPR 2013

[Paper]

Shape Classification Through Structured Learning of Matching Measures

L. Chen, J. McAuley, R. Feris, T. Caetano, and M. Turk

CVPR 2009

[Paper] [Code]

Visual Attributes

Visual Attributes

R. Feris, C. Lampert, and D. Parikh

Advances in Computer Vision and Pattern Recognition, Springer, 2016

[Book Link]

Check out the Vision and Language for Fashion section for more papers on Visual Attributes: Fashion IQ (CVPR 2021), DARN (ICCV 2015), and DDAN (CVPR 2015)

Designing Category-level Attributes for Discriminative Visual Recognition

F. Yu, L. Cao, R. Feris, J. Smith, and S. F. Chang

CVPR 2013

[Paper]

I co-founded and organized the first, second, and third Workshops on Parts and Attributes.

Large-Scale Vehicle Detection, Indexing, and Search in Urban Surveillance Videos

R. Feris, B. Siddiquie, J. Petterson, Y. Zhai, A. Datta, L. Brown, and S. Pankanti

IEEE Transactions on Multimedia, 2012

See also: Feris et al., Attribute-based Vehicle Search in Crowded Surveillance Videos, ICMR 2011

[Paper] [Video Demos]

Image Ranking and Retrieval Based on Multi-Attribute Queries

B. Siddiquie, R. Feris, and L. Davis

CVPR 2011, Oral

[Paper]

Attribute-based People Search

R. S. Feris, R. Bobbit, L. Brown, and S. Pankanti

ICMR 2014

See also:

[Paper] [Video Demo]

Computational Photography

A Projector-Camera Setup for Geometry-Invariant Frequency Demultiplexing

D. Vaquero, R. Raskar, R. Feris, and M. Turk

CVPR 2009

[Paper]

Characterizing the Shadow Space of Camera-Light Pairs

D. Vaquero, R. Feris, M. Turk, and R. Raskar

CVPR 2008

[Paper]

Discontinuity Preserving Stereo with Small Baseline Multi-Flash Illumination

R. Feris, L. Chen, M. Turk, R. Raskar, and K. Tan

ICCV 2005, Oral 

See also: Feris et al., TPAMI 2007

[Paper] [Project Page] [Code] [Data]

Automatic Human Facial Illustrations with Variable Illumination

R. Feris and A. Olwal

SIGGRAPH Emerging Technologies, 2005 (Interactive Fogscreen)

[Project Page] [Code]

Specular Reflection Reduction with Multi-Flash Imaging

R. Feris, R. Raskar, K. Tan, and M. Turk

SIGGRAPH 2004 Poster

[Paper]

Exploiting Depth Discontinuities for Vision-based Fingerspelling Recognition

R. Feris, M. Turk, R. Raskar, K. Tan, and G. Ohashi

CVPR RTV4HCI Workshop 2004

[Paper]

Shape Enhanced Surgical Visualizations and Medical Illustrations with Multi-flash Imaging

K. Tan, J. Kobler, R. Feris, P. Dietz, and R. Raskar

MICCAI 2004

[Paper]

Human Sensing

A Recurrent Encoder-Decoder Network for Sequential Face Alignment

X. Peng, R. Feris, X. Wang, and D. Metaxas

ECCV 2016

[Paper] [Code] [Project Page] [Video Demo]

Fast Face Detector Training Using Tailored Views

K. Scherbaum, R. Feris, J. Petterson, V. Blanz, and H. Seidel

ICCV 2013

[Paper]

Manifold-based Analysis of Facial Expression

Y. Chang, C. Hu, R. Feris, and M. Turk

Image and Vision Computing, 2006

[Paper] [Video Demo]

Hierarchical Wavelet Networks for Facial Feature Localization

R. Feris, J. Gemmell, K. Toyama, and V. Krueger

Face and Gesture Recognition 2002

Developed as part of the GazeMaster project for videoconferencing. I did this work during my internship at Microsoft Research in 2001.

[Paper] [IFA Head Pose Tracking Demo]

Efficient Real-Time Face Tracking in Wavelet Subspace

R. Feris, V. Krueger, and R. M. Cesar Jr.

ICCV RATFG-RTS Workshop, 2001

[Paper] [Video Demo 1] [Video Demo 2]

Recent Talks

  • ICML 2020 LatinX in AI Workshop. "Dynamic Neural Networks for Efficient Image and Video Classification" [SlidesLive Talk] [pdf]

  • CVPR 2020 DIRA Workshop. "Visual Learning Beyond Natural Images" [Talk] [pdf]

  • NeurIPS 2019 EMC^2 Workshop. "Dynamic Neural Networks for Efficient Inference" [SlidesLive Talk] [pdf]

  • CVPR 2019 FFSS-USAD Workshop. "Is it All Relative? Interactive Fashion Search based on Relative Natural Language Feedback" [pdf]

  • CVPR 2019 EMC^2 Workshop. "Speeding Up Deep Neural Networks with Adaptive Computation and Efficient Multi-Scale Architectures" [pdf]

  • CVPR 2019 Workshop on Learning from Imperfect Data. "Learning More from Less: Weak Supervision and Beyond" [pdf]

 

The postings on this site are my own and don't necessarily represent IBM's positions.