Seong Joon Oh

I am a research scientist at NAVER AI Lab, working on the challenges of deploying machine learning models in the real world. I am interested in training reliable models (e.g. explainable, robust, and probabilistic models) and obtaining the necessary human supervision and guidance in a cost-effective way.

I received my PhD in computer vision and machine learning at the Max-Planck Institute for Informatics in 2018, under the supervision of Bernt Schiele and Mario Fritz, with a focus on the privacy and security implications of CV and ML (Thesis). I received the Master of Mathematics with Distinction in 2014 and the Bachelor of Arts in Mathematics as a Wrangler in 2013, both at the University of Cambridge.

I started compiling the principles for life and research.

Email  /  Google Scholar  /  LinkedIn  /  Twitter  /  Github

Updates

Research

I have tried to push certain fronts in ML research to make models truly useful and deployable in real life. They can be grouped into a few keywords.

Robustness. Changes in the input distribution shall not disrupt the model's predictive power. Ideally, a model should be robust against the shifts in input domain (e.g. natural and adversarial perturbations) and confounders (e.g. fairness).

Uncertainty. A model should know when it is going to get it wrong. This allows the users and downstream systems to make sensible and safe decisions based on the estimated confidence levels.

Human Annotation. An integral part of training a high-performance model is the human supervision. I have sought cost-effective ways to extract useful supervisory signals from humans.

Privacy & Security. There are different privacy and security angles from which ML can be analyzed. One may question the "stealability" of a black-box model as an IP; one may also question the privacy guarantees for user data in the federated learning setup. Still others may wonder whether a certain level of privacy is achievable at all on the internet, given the increasing volume of user data online and the more widespread use of machine learning algorithms to process such data.

Evaluation. Correct evaluation is undoubtedly important in research and industrial applications, yet it is surprisingly difficult. I have cleaned up benchmarks and evaluation protocols in a few domains.

Large-Scale ML. Some of the methodologies I have been involved in are designed for large-scale ML. They typically require minimal changes to the original ML system but bring consistent gains across the board.

kim2021iccv Robustness Human Annotation Evaluation
Keep CALM and Improve Visual Feature Attribution
Jae Myung Kim*, Junsuk Choe*, Zeynep Akata, Seong Joon Oh.
*Equal contribution
ICCV, 2021
Bibtex / Code

It is difficult to find a CV researcher or practitioner who hasn't used (or at least heard of) Class Activation Maps (CAM). It is a seminal feature attribution method that has left a deep mark on vision research and applications. Notwithstanding its popularity, we found some practical and conceptual issues that make CAM not as interpretable as it should be. We address the issues with a probabilistic treatment of the last layers of CNNs where the latent cue variable Z is trained via Marginal Likelihood (ML) or Expectation-Maximisation (EM) algorithms. The resulting Class Activation Latent Maps, or CALM, produce more precise and interpretable score maps.

heo2021iccv Large-Scale ML
Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh.
ICCV, 2021
Bibtex / Code

The Transformer architecture has successfully been adapted to visual models (e.g. ViT). However, Transformers, originally designed for language modelling, and ViT assign a constant ratio of computational loads between spatial and channel dimensions at different depths. We postulate that this is a suboptimal design choice, as CNNs assign different ratios at different depths to maximise the utility of compute. We thus present the Pooling-based Vision Transformer (PiT), which varies the ratio across depths as CNNs do.

poli2021nha Robustness
Neural Hybrid Automata: Learning Dynamics with Multiple Modes and Stochastic Transitions
Michael Poli*, Stefano Massaroli*, Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Atsushi Yamashita, Hajime Asama, Jinkyoo Park, Animesh Garg.
*Equal contribution
arXiv, 2021
Bibtex

Recovering the dynamical system, or the data generation process, behind time series data enables effective and robust prediction, interpretation, and forecasting. There exist prior methods for recovering either continuous or discrete dynamics, but not a mixture of the two. The underlying dynamics behind many real-world systems contain both continuous and discrete elements. For example, an aircraft essentially follows continuous dynamics but goes through a discrete mode shift at touchdown. Such a system is referred to as a Stochastic Hybrid System (SHS). We present a framework that recovers SHS from time series data using ingredients like Neural ODEs and latent variable models.

yun2021cvpr Robustness Large-Scale ML
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Junsuk Choe, Sanghyuk Chun.
CVPR, 2021
Bibtex / Code

ImageNet labels contain lots of noise (e.g. Shankar et al.). There have been efforts to fix them on the evaluation set, but not yet on the training set. We fix them on the training set (published at codebase), with the help of a bigger image classifier that makes the task feasible at all. This is another trick that will improve ImageNet and downstream task accuracies across the board.

chun2021cvpr Uncertainty Evaluation
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus.
CVPR, 2021
Bibtex / Code

Given an image, there are many ways to describe it in text. Given a text description, there are likewise many possible images that suit the description. Cross-modal associations are many-to-many in nature. The usual deterministic embeddings cannot model this well. We introduce a probabilistic embedding scheme based on the Hedged Instance Embedding (ICLR'19) to handle the many-to-many mapping gracefully. We address another crucial issue with evaluation: your method gets either penalised or rewarded for retrieving synonymous sentences. This is because the true matches in the eval set are not exhaustive. Since ground-up collection of such matches is too expensive, we introduce a novel surrogate measure, Plausible-Match R-Precision, based on the estimated true matches.

choe2020cvpr Robustness Human Annotation Evaluation
Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets
Junsuk Choe*, Seong Joon Oh*, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim.
*Equal contribution
arXiv, 2020
Bibtex / Code / Slides / Tutorial video

Journal extension of CVPR'20! It now contains more analyses, including the evaluation of input gradient variants as Weakly-Supervised Object Localization (WSOL) methods.

heo2021iclr Large-Scale ML
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
Byeongho Heo*, Sanghyuk Chun*, Seong Joon Oh, Dongyoon Han, Youngjung Uh, Sangdoo Yun, Jungwoo Ha.
*Equal contribution
ICLR, 2021
Bibtex / Code / Project

When you apply a momentum-based optimizer to scale-invariant parameters, their norms increase quite a bit. The norm increase contributes nothing to the loss minimization and only slows down the convergence. We fix this by appending a projection operation to SGD and Adam. This leads to performance improvements across the board.
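The projection idea can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the released AdamP code: the function names `project_out_radial` and `sgd_momentum_step` are hypothetical, the step is plain SGD with momentum, and the projection is applied unconditionally, which is a simplification made here for brevity.

```python
import numpy as np

def project_out_radial(update, weight, eps=1e-8):
    """Remove the component of `update` that is parallel to `weight`.

    For a scale-invariant weight, moving along the weight direction
    changes only its norm, not the loss, so that component is discarded.
    """
    w_hat = weight / (np.linalg.norm(weight) + eps)
    return update - np.dot(w_hat, update) * w_hat

def sgd_momentum_step(weight, grad, momentum, lr=0.1, beta=0.9, project=True):
    """One SGD-with-momentum step, optionally with the projection appended.

    Hypothetical helper for illustration; hyperparameter defaults are arbitrary.
    """
    momentum = beta * momentum + grad
    update = project_out_radial(momentum, weight) if project else momentum
    return weight - lr * update, momentum

weight = np.array([3.0, 4.0])
grad = np.array([1.0, 2.0])
new_weight, momentum = sgd_momentum_step(weight, grad, np.zeros(2))
```

With the projection, the applied update is orthogonal to the weight vector, so the step no longer pushes the weight along its own direction.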

yun2020videomix Large-Scale ML
VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Jinhyung Kim.
arXiv, 2020
Bibtex

Data augmentation is not as extensively studied in video recognition tasks as in the static image recognition domain. We study the extension of popular static-image augmentation methods, such as CutMix, to video recognition tasks.

ferjad2020icml Evaluation
Reliable Fidelity and Diversity Metrics for Generative Models
Muhammad Ferjad Naeem*, Seong Joon Oh*, Youngjung Uh, Yunjey Choi, Jaejun Yoo.
*Equal contribution
ICML, 2020
Bibtex / Code / ICML Virtual / Youtube

Evaluating generative models is tricky. There are the Inception Score and the Fréchet Inception Distance, and then the (Improved) Precision and Recall metrics that separately examine the fidelity and diversity aspects. Yet, they are still not perfect. We address the issues with the Improved Precision and Recall metrics and propose new metrics: Density and Coverage.
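For a concrete picture of the two metrics, here is a minimal NumPy sketch. It assumes the nearest-neighbour formulation: each real sample gets a ball whose radius is the distance to its k-th nearest real neighbour; Density counts how many such balls contain each fake sample (normalised by k), and Coverage is the fraction of real samples whose ball contains at least one fake sample. The function names and the default `k=5` are illustrative choices, and the inputs are assumed to be feature embeddings rather than raw images. This is not the released code.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distances between rows of a (N, D) and rows of b (M, D)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def density_and_coverage(real, fake, k=5):
    """Sketch of Density and Coverage on feature arrays of shape (N, D), (M, D)."""
    real_real = pairwise_distances(real, real)
    # k-th nearest-neighbour distance of each real sample, excluding itself:
    # after sorting each row, column 0 is the zero self-distance.
    radii = np.sort(real_real, axis=1)[:, k]
    real_fake = pairwise_distances(real, fake)   # shape (N_real, N_fake)
    inside = real_fake < radii[:, None]          # fake j lies in the ball of real i
    density = inside.sum(axis=0).mean() / k
    coverage = inside.any(axis=1).mean()
    return float(density), float(coverage)

rng = np.random.default_rng(0)
real = rng.normal(size=(50, 2))
same_density, same_coverage = density_and_coverage(real, real.copy())
far_density, far_coverage = density_and_coverage(real, real + 100.0)
```

When the fake set coincides with the real set, every real ball is covered; when the fake set is shifted far away, both metrics drop to zero.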

bahng2020icml Robustness Human Annotation Evaluation
Learning De-biased Representations with Biased Representations
Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, Seong Joon Oh.
ICML, 2020
Bibtex / Code / ICML Virtual / Youtube

Models pick up correlations, rather than causal mechanisms, between inputs and outputs. De-biasing (and fairness) research has guided models on "which cues to look at" through explicit bias labels or by re-weighting or re-generating training data to remove bias. We show that, for many application scenarios, it is possible to encode the "cues to look at" through model architecture, so that such expensive strategies are no longer needed.

choe2020cvpr Robustness Human Annotation Evaluation
Evaluating Weakly-Supervised Object Localization Methods Right
Junsuk Choe*, Seong Joon Oh*, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim.
CVPR, 2020
Bibtex / Code / Slides / Tutorial video

I have long waited for this moment since CVPR'17. Weakly-Supervised Object Localization, or WSOL, has in fact not been weakly supervised in a strict sense. Design choices and hyperparameters are validated with the localization annotations! This paper explains why researchers had to rely on localization validation -- without localization supervision, there is no way to force a model to not extract cues from background regions. We propose a new fair benchmark acknowledging the need for localization annotations and show that WSOL methods since CAM in 2016 have not introduced much gain.

lee2020cvprw On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee.
CVPR Workshop, 2020
Bibtex

Scene text recognition works well, but there are remaining corner cases. An example is texts with unusual orientations and arrangements (e.g. BMW logo). We focus on this corner case and propose a model based on self-attention.

joon2015iccv Privacy & Security Evaluation
Person Recognition in Personal Photo Collections
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele.
TPAMI, 2020
Bibtex / Journal

Journal version of my first paper, ICCV'15, after five years! We have developed version two of the ICCV'15 system, which outperforms the methods that have appeared in the meantime.

joon2018iclr Privacy & Security
Towards Reverse-Engineering Black-Box Neural Networks
Seong Joon Oh, Mario Fritz, Bernt Schiele.
Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (book chapter), 2019
Bibtex / Book chapter

Book chapter version of ICLR'18! We build connections between our black-box inspection methodology and explainable AI.

orekondy2019neuripsfl Privacy & Security
Gradient-Leaks: Understanding and Controlling Deanonymization in Federated Learning
Tribhuvanesh Orekondy, Seong Joon Oh, Yang Zhang, Bernt Schiele, Mario Fritz.
NeurIPS Workshop, 2019
Bibtex / Poster

Federated learning allows sensitive user data to never leave the device and still be used for training. It is considered a safer option than sending the user data directly to the server. But is it? We show that users may be identified and linked based on the model updates communicated between the device and server.

chun2019icmlw Robustness Uncertainty Evaluation
An Empirical Evaluation on Robustness and Uncertainty of Regularization Methods
Sanghyuk Chun, Seong Joon Oh, Sangdoo Yun, Dongyoon Han, Junsuk Choe, Youngjoon Yoo.
ICML Workshop, 2019
Bibtex

There has been a line of research on simple regularization techniques like CutMix (ICCV'19) and other lines of research on robustness and uncertainty. We make a happy marriage of the two and measure how well the regularization techniques improve robustness and uncertainty of a model.

yun2019iccv Robustness Large-Scale ML
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo.
ICCV Oral Talk, 2019
Bibtex / Code / Talk / Poster / Blog / Project / Project

A simple solution that works surprisingly well! Cut and paste patches from other images during training. Quite likely, you will see a performance boost.
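The cut-and-paste step can be sketched in a few lines of NumPy. This is a minimal illustration, not the released code: the function names, the `(N, H, W, C)` image layout, the one-hot label format, and the `alpha=1.0` default are assumptions made for the example. Labels are mixed in proportion to the pasted area.

```python
import numpy as np

def rand_bbox(height, width, lam, rng):
    """Sample a box covering roughly a (1 - lam) fraction of the image."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.integers(height), rng.integers(width)
    top, bottom = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, height)
    left, right = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, width)
    return int(top), int(bottom), int(left), int(right)

def cutmix(images, labels_onehot, alpha=1.0, rng=None):
    """Paste a patch from a shuffled copy of the batch and mix the labels.

    images: float array of shape (N, H, W, C); labels_onehot: (N, K).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape[:3]
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)
    top, bottom, left, right = rand_bbox(h, w, lam, rng)
    mixed = images.copy()
    mixed[:, top:bottom, left:right] = images[perm, top:bottom, left:right]
    # Recompute the mixing weight from the box actually pasted after clipping.
    lam = 1.0 - (bottom - top) * (right - left) / (h * w)
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed, mixed_labels

data_rng = np.random.default_rng(0)
images = data_rng.random((4, 8, 8, 3))
labels = np.eye(4)
mixed, mixed_labels = cutmix(images, labels, rng=np.random.default_rng(1))
```

Each mixed label row remains a valid distribution, since it is a convex combination of two one-hot rows.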

baek2019iccv Evaluation
What Is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee.
ICCV Oral Talk, 2019
Bibtex / Code

The scene text recognition field has long suffered from the lack of an agreed evaluation protocol. We provide a standard protocol. We also provide a unified view of the previous methods and discover a novel combination of existing modules that turns out to be the state of the art.

joon2019iclr Uncertainty
Modeling Uncertainty with Hedged Instance Embedding
Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher.
ICLR, 2019
Bibtex / Poster

There has been quite some work on representing uncertainty for classification or regression tasks. Is there a way to represent uncertainty for instance embedding models too? We show that it is possible to train probabilistic representations for instances based on their inherent ambiguity.

tretschk2018cscs Privacy & Security
Sequential Attacks on Agents for Long-Term Adversarial Goals
Edgar Tretschk, Seong Joon Oh, Mario Fritz.
ACM CSCS, 2018
Bibtex

Can a bad guy hijack an RL agent? We show that it is possible to let an agent pursue an alternative reward by introducing small adversarial perturbations in the input stream.

joon2018iclr Privacy & Security
Towards Reverse-Engineering Black-Box Neural Networks
Seong Joon Oh, Max Augustin, Mario Fritz, Bernt Schiele.
ICLR, 2018
Bibtex / Extended abstract / Poster / Code

Recipes for training a high-performance model are not cheap. Think about the GPU-and-research-scientist-and-engineer hours to find the right architectural components and optimizer hyperparameters. What if they can be stolen by examining the model responses to certain inputs?

sun2017cvpr Privacy & Security
Natural and Effective Obfuscation by Head Inpainting
Qianru Sun, Liqian Ma, Seong Joon Oh, Luc Van Gool, Bernt Schiele, Mario Fritz.
CVPR, 2018
Bibtex

Adversarial perturbation solutions (ICCV'17) produce visually pleasant protections with high protection rates, but their effects may be confined to a handful of recognition systems. We propose another solution based on face inpainting that changes the face to a fictitious yet natural-looking identity. It is effective against a broader set of recognition systems.

joon2017cvprw Privacy & Security
From Understanding to Controlling Privacy against Automatic Person Recognition in Social Media
Seong Joon Oh, Mario Fritz, Bernt Schiele.
CVPR Workshop, 2017
Bibtex / Poster

We stop and look back on the visual privacy papers (ICCV'15, ECCV'16, ICCV'17).

joon2017iccv Robustness Privacy & Security Evaluation
Adversarial Image Perturbation for Privacy Protection -- A Game Theory Perspective
Seong Joon Oh, Mario Fritz, Bernt Schiele.
ICCV, 2017
Bibtex / Poster / Code

If face blurring doesn't work (ECCV'16), how should we shield our personal photos online against recognition systems? We propose a solution based on adversarial perturbations and the game theoretic considerations for the evaluation therein.

joon2017cvpr Human Annotation
Exploiting Saliency for Object Segmentation from Image Level Labels
Seong Joon Oh, Rodrigo Benenson, Anna Khoreva, Zeynep Akata, Mario Fritz, Bernt Schiele.
CVPR, 2017
Bibtex / Poster / Code

There has been quite some work around training models for localization tasks (e.g. semantic segmentation) from the image tag supervision only. But is this fundamentally possible without relying on extensive validation with full localization annotations? We argue that certain priors are necessary at the very least to encode the extent of objects. Saliency, we argue, is a handy prior.

anja2017cvpr Generating Descriptions with Grounded and Co-Referenced People
Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, Bernt Schiele.
CVPR, 2017
Bibtex

We casually use pronouns to refer to others. For machines, however, referring to people with pronouns necessitates new types of data and training strategies to explicitly localize and link people across frames. We do that.

joon2016eccv Privacy & Security Evaluation
Faceless Person Recognition; Privacy Implications in Social Media
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele.
ECCV, 2016
Bibtex / Poster / Extended abstract

But can you still be recognized even with a blur on your face? Quite likely.

aditya2016mobisys Privacy & Security
I-pic: A Platform for Privacy-Compliant Image Capture
Paarijaat Aditya, Rijurekha Sen, Peter Druschel, Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele, Bobby Bhattacharjee, Tong Tong Wu.
MobiSys, 2016
Bibtex / Project

You are a janitor at the Taj Mahal. Against your will, sightseers take photos with your face in the background. How can you opt out of being present in someone else's photo? We present a mobile-system based solution.

joon2015iccv Privacy & Security Evaluation
Person Recognition in Personal Photo Collections
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele.
ICCV, 2015
Bibtex / Poster / Video / Project

How well does a CNN model recognize people in personal photos? Even when people don't look at cameras, CNN finds out who they are, based on the context (e.g. location and social connections).

Academic activities
Robustness Human Annotation Evaluation Large-Scale ML
Workshop on ImageNet: Past, Present, and Future
Zeynep Akata, Lucas Beyer, Sanghyuk Chun, Almut Sophia Koepke, Diane Larlus, Seong Joon Oh, Rafael Sampaio de Rezende, Sangdoo Yun, Xiaohua Zhai.
NeurIPS, 2021
Website

ImageNet symbolises the stellar achievements in ML and CV in the past decade. It has served as the go-to benchmark for model architectures and training techniques and as a common pre-training dataset for numerous downstream tasks. As of 2021, ImageNet is going through a creative destruction. As the SOTA models are saturating towards the upper bound of the benchmark, new versions of the benchmarks are being proposed (e.g. ImageNet-A/C/D/LT/O/P/R), with more focus on the reliability of models. Emerging fields in CV are now venturing beyond the ImageNet pre-training with class labels: e.g. self-supervision and language-description supervision. We believe now is a good time to discuss what’s next. The workshop will cover questions like: What are the main lessons learnt thanks to this benchmark? How can we reflect on the diverse requirements for good datasets and models, such as fairness, privacy, security, generalization, scale, and efficiency? What should the next generation of ImageNet-like benchmarks encompass? Through this workshop, we hope to collectively shape the landscape of the ML and CV research in the post-ImageNet era.

Human Annotation Evaluation
Tutorial on Weakly-Supervised Learning in Computer Vision
Hakan Bilen, Rodrigo Benenson, Seong Joon Oh.
ECCV, 2020
Website (slides)

Deeply-learned computer vision models are data-hungry and manual annotations are expensive. Can we train models with “weaker” annotations? This tutorial provides an overview of the vast literature on weakly supervised learning methods in computer vision. We also discuss the limitations of current state-of-the-art methods and evaluation metrics. We propose future research directions that hopefully will spur disruptive progress in weakly supervised learning.

Reviewing activities
  • Serving as a reviewer for CVPR, NeurIPS, ICML, ICCV, ECCV, ICLR, etc.
  • 5 x Best Reviewer Awards.
Awards
  • Samsung PhD Scholarship 2014-2018.
  • Vensi Thawani Prize 2014: For a distinctive achievement in mathematics.
  • William Pochin Scholarship 2014: For a distinctive achievement in mathematics.
  • William Pochin Scholarship 2013: For the First Class honour in mathematics.
  • Meritorious Winner 2012 at the Mathematical Contest in Modelling.
  • William Pochin Scholarship 2011: For the First Class honour in mathematics.
Talks

Template based on Jon Barron's website.