Seong Joon Oh

I am a professor at the University of Tübingen leading the group on Scalable Trustworthy AI (STAI). In addition to my main job, I advise Parameter Lab. I am generally interested in training reliable models (e.g. explainable, robust, and probabilistic models) and obtaining the necessary human supervision and guidance in a cost-effective way.

Before that, I was a research scientist at NAVER AI Lab for 3.5 years. I received my PhD in computer vision and machine learning from the Max Planck Institute for Informatics in 2018, under the supervision of Bernt Schiele and Mario Fritz, with a focus on the privacy and security implications of CV and ML (Thesis). I received the Master of Mathematics with Distinction in 2014 and the Bachelor of Arts in Mathematics as a Wrangler in 2013, both from the University of Cambridge.

I have started compiling my principles for life and research 🍎.

Email  /  Google Scholar  /  LinkedIn  /  Twitter  /  Github


Updates

Research

I have tried to push several fronts in ML research to make models truly useful and deployable in real life. These efforts can be grouped under a few keywords.

Robustness. Changes in the input distribution should not disrupt a model's predictive power. Ideally, a model should be robust against shifts in the input domain (e.g. natural and adversarial perturbations) and against confounders (e.g. those relevant to fairness).

Uncertainty. A model should know when it is going to get it wrong. This allows the users and downstream systems to make sensible and safe decisions based on the estimated confidence levels.

Human Annotation. An integral part of training a high-performance model is human supervision. I have sought cost-effective ways to extract useful supervisory signals from humans.

Privacy & Security. There are different privacy and security angles from which ML can be analyzed. One may question the "stealability" of a black-box model as intellectual property; one may also question the privacy guarantees for user data in the federated learning setup. Still others may wonder whether a certain level of privacy is achievable at all on the internet, given the increasing volume of user data online and the ever more widespread use of machine learning algorithms to process such data.

Explainability. Humans do not use systems they do not trust, and they find it hard to trust systems that do not explain their rationale. Explanations are thus an integral part of trustworthiness. A model must provide faithful reasoning for its decisions, ideally paving the way to practical action items for improving the model.

Evaluation. Correct evaluation is undoubtedly important in research and industrial applications, yet it is surprisingly difficult. I have cleaned up benchmarks and evaluation protocols in a few domains.

Large-Scale ML. Some of the methodologies I have been involved in are designed for large-scale ML. They typically require minimal changes to the original ML system but bring consistent gains across the board.

See the slides and video (3 Aug 2022) for an overview of my past research and future research ideas for scalable trustworthy AI.

Publications

elisa2024tda
Explainability Evaluation
Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI
Elisa Nguyen, Johannes Bertram, Evgenii Kortukov, Jean Y Song, Seong Joon Oh.
arXiv, 2024
Bibtex

Explainable AI (XAI) has been criticised for relying too much on formalism and solutionism, focusing more on mathematical soundness than user needs. Despite efforts to correct this through user-focused studies from the HCI communities, we observe repeating patterns of formalism and solutionism in a relatively young subfield of XAI: Training Data Attribution (TDA). We set out to correct this with a needfinding study involving a diverse group of AI practitioners to identify potential user needs related to TDA. Our studies have uncovered new TDA tasks that are currently largely overlooked. We invite the TDA and XAI communities to consider these novel tasks and improve the user relevance of their research outcomes.

alex2024diversify
Robustness Uncertainty Evaluation Large-Scale ML
Scalable Ensemble Diversification for OOD Generalization and Detection
Alexander Rubinstein, Luca Scimeca, Damien Teney, Seong Joon Oh.
arXiv, 2024.
Bibtex

Ensemble diversification has traditionally been applied at sub-ImageNet scales (e.g. Waterbirds). We present methods to make it applicable at ImageNet+ scales: (1) instead of relying on a separate OOD dataset to diversify the ensembles on, we source diversification samples from hard examples in the training set; (2) we select sample pairs stochastically; (3) we diversify only the last two layers. We show that diversified ensembles are useful for OOD generalisation and (particularly) OOD detection, where we achieve state-of-the-art performance.

evgenii2024ralm
Explainability Evaluation Large-Scale ML
Studying Large Language Model Behaviors Under Realistic Knowledge Conflicts
Evgenii Kortukov, Alexander Rubinstein, Elisa Nguyen, Seong Joon Oh.
CoLM, 2024.
Bibtex

Retrieval augmented generation (RAG) promises more trustworthy outputs from large language models (LLMs). RAG first retrieves relevant documents from a database and includes them in the context for subsequent generation. However, RAG does not come with guarantees. Ultimately, the LLM decides whether to use the new information in the retrieved documents or to stick to the original information from its pre-training data. We present a study on this knowledge conflict.

balint2024disentanglement
Uncertainty Evaluation
Benchmarking Uncertainty Disentanglement: Specialized Uncertainties for Specialized Tasks
Bálint Mucsányi, Michael Kirchhof, Seong Joon Oh.
NeurIPS Datasets and Benchmarks Spotlight, 2024.
Bibtex

After the Trustworthy Machine Learning course, Bálint investigated the relationships between different types of uncertainty in machine learning models. He found that many methods claiming to measure specific uncertainties had not been thoroughly verified. After our experiments, we concluded that these methods hardly achieve their claimed goals. This finding is crucial for the uncertainty estimation community, which tries to understand and disentangle different uncertainty types.

kirchhof2024pretrained
Uncertainty Evaluation Large-Scale ML
Pretrained Visual Uncertainties
Michael Kirchhof, Mark Collier, Seong Joon Oh, Enkelejda Kasneci.
arXiv, 2024.
Bibtex

Uncertainty estimation has so far had to be learned from scratch for each new task. We introduce a new approach that allows us to train uncertainty estimation on a large, general dataset and then apply it to new, specific tasks. We focus on practicality and efficiency. Our approach captures inherent uncertainty in the data, separate from uncertainty due to limited knowledge.

ankit2024star
Robustness Uncertainty
Do Deep Neural Network Solutions Form a Star Domain?
Ankit Sonthalia, Alexander Rubinstein, Ehsan Abbasnejad, Seong Joon Oh.
arXiv, 2024.
Bibtex

For deep neural networks, understanding the solution set, or the set of parameters with low loss values, is crucial. It has been conjectured that the solution set forms a convex set, modulo parameter permutations, but the conjecture has met several counterexamples. Instead, we propose that the solution set forms a star domain: there exists a central "star model" connected to every other solution by a low-loss path. This claim is weaker than the convex-set conjecture and does not contradict the empirical findings.
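
For intuition, checking the star property empirically boils down to verifying that the loss stays low along the straight line between a candidate star model and each other solution. Below is a minimal sketch of such a check (my own simplification; `loss_fn` and `loader` are hypothetical, and it assumes the models have already been permutation-aligned):

    import copy
    import torch

    def loss_barrier(model_a, model_b, loss_fn, loader, steps=11):
        # Max average loss along the linear path theta(t) = (1 - t) * theta_a + t * theta_b.
        sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
        probe, worst = copy.deepcopy(model_a), 0.0
        for t in torch.linspace(0, 1, steps):
            mixed = {k: ((1 - t) * sd_a[k] + t * sd_b[k]) if torch.is_floating_point(sd_a[k])
                     else sd_a[k]                     # keep integer buffers (e.g. BatchNorm counters)
                     for k in sd_a}
            probe.load_state_dict(mixed)
            with torch.no_grad():
                losses = [loss_fn(probe(x), y).item() for x, y in loader]
            worst = max(worst, sum(losses) / len(losses))
        return worst  # a low barrier against every partner model supports the star-domain picture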

martin2024trap
Privacy & Security Evaluation Large-Scale ML
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
Martin Gubri, Dennis Ulmer, Hwaran Lee, Sangdoo Yun, Seong Joon Oh.
ACL Findings, 2024.
Bibtex

Large language models (LLMs) and surrounding services come with their own rules about who can use them and how they should be used. These rules are important to protect the company's work and to prevent misuse. Given a new LLM-based chatbot service, it is therefore important to find out the underlying LLM in order to check compliance with the rules attached to that LLM. Here's our method for doing this: we ask the chatbot a very specific question that only one company's model will answer in a certain way. It's like asking a friend a secret question only they would know the answer to. If the chatbot answers the question the way we expect, we know which LLM it is based on.

dennis2024apricot
Uncertainty Large-Scale ML
Calibrating Large Language Models Using Their Generations Only
Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh.
ACL, 2024.
Bibtex

We can't trust large language model (LLM) outputs. One of the reasons is that LLMs don't always come with reliable confidence estimates. One could look into the model likelihoods, but even that is infeasible for many black-box models. We show here that it's possible to train a lightweight external model to infer an LLM's internal confidence based only on the prompts and answers from the LLM (purely black box).
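
As a rough sketch of the recipe (not the paper's exact pipeline; the TF-IDF classifier and the toy QA data below are illustrative stand-ins): fit a small model on (prompt, answer, was-it-correct) triples and read its predicted probability as the confidence.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical calibration set: prompts, the LLM's answers, and whether each answer was correct.
    prompts     = ["Who wrote Hamlet?", "Capital of Australia?", "What is 2 + 2?", "Who painted the Mona Lisa?"]
    llm_answers = ["Shakespeare",       "Sydney",                "4",              "Leonardo da Vinci"]
    is_correct  = [1,                   0,                       1,                1]

    texts = [p + " [SEP] " + a for p, a in zip(prompts, llm_answers)]
    calibrator = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    calibrator.fit(texts, is_correct)

    # At test time the calibrator sees only the prompt and the generated answer (purely black box).
    confidence = calibrator.predict_proba(["Capital of France? [SEP] Paris"])[0, 1]
    print(round(float(confidence), 3))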

elisa2023neuripsxaiw
Explainability Evaluation
Exploring Practitioner Perspectives On Training Data Attribution Explanations
Elisa Nguyen, Evgenii Kortukov, Jean Y. Song, Seong Joon Oh.
NeurIPS XAI in Action Workshop, 2023.
Bibtex

Training data attribution (TDA) provides a non-parametric viewpoint for model explanations: which training data points are to blame for this test error? Although TDA appears useful in practice, we realised that its actual usefulness has not been tested in real applications. As a first step, we approach individuals working in a diverse array of sectors, either using or developing ML models, and ask whether they would find TDA useful in practice. The answer is affirmative; read the paper for more details.

balint2023tml
Robustness Uncertainty Human Annotation Explainability Evaluation Large-Scale ML
Trustworthy Machine Learning
Bálint Mucsányi, Michael Kirchhof, Elisa Nguyen, Alexander Rubinstein, Seong Joon Oh.
2023
Bibtex / Webpage / arXiv

The challenges posed by the trustworthiness of machine learning models are increasingly significant as these models find real-world applications. Our newly-released textbook, "Trustworthy Machine Learning," aims to address these challenges comprehensively. It covers four crucial dimensions: Out-of-Distribution Generalization, Explainability, Uncertainty Quantification, and Evaluation of Trustworthiness. The text offers a thorough analysis of seminal and modern research papers, elucidating the foundational theories and practices. Originating from a course first offered at the University of Tübingen in the Winter Semester of 2022/23, the book serves as a stand-alone resource and includes code snippets and additional references. For further information, please visit our dedicated website.

elisa2023neurips
Uncertainty Explainability Evaluation
A Bayesian Perspective On Training Data Attribution
Elisa Nguyen, Minjoon Seo, Seong Joon Oh.
NeurIPS, 2023
Bibtex / Code

Consider Training Data Attribution (TDA) as a spotlight, highlighting the role each training sample plays in the predictions a model whips up. It's a tantalizing concept, especially for human-centric XAI, where it can guide users to tweak their training samples for better results. However, it's a bit like trying to hear a whisper in a storm. That's because the impact of removing a single training sample usually pales in comparison to the cacophony of noise stirred up during model training, like the random spark of model initialization or the chaotic dance of SGD batch shuffling. To understand this better, we've adopted a Bayesian deep learning viewpoint, treating our learned model as a Bayesian posterior and TDA estimates as random variables. Our findings? TDA is like trying to tune in to a radio station that's mostly static. It's really only effective in those rare instances when the impact of a single sample isn't lost in the noise. In those cases, TDA can indeed play a sweet tune!
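
To make the signal-versus-noise point concrete, here is a toy illustration (not the paper's method): recompute a leave-one-out attribution under several training seeds, where the seed stands in for initialization and batch-shuffling noise, and compare the mean effect to its spread.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

    def test_loss(seed, drop=None):
        idx = np.delete(np.arange(len(Xtr)), drop) if drop is not None else np.arange(len(Xtr))
        clf = SGDClassifier(loss="log_loss", max_iter=20, tol=None, random_state=seed)
        clf.fit(Xtr[idx], ytr[idx])                       # seed controls the SGD shuffling noise
        p = np.clip(clf.predict_proba(Xte)[np.arange(len(yte)), yte], 1e-6, 1.0)
        return -np.log(p).mean()

    # Leave-one-out attribution of training sample 0, recomputed under 10 seeds.
    scores = [test_loss(s, drop=0) - test_loss(s) for s in range(10)]
    print(np.mean(scores), np.std(scores))                # the spread often dwarfs the mean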

siwon2023neurips
Privacy & Security Evaluation Large-Scale ML
ProPILE: Probing Privacy Leakage in Large Language Models
Siwon Kim, Sangdoo Yun, Hwaran Lee, Martin Gubri, Sungroh Yoon*, Seong Joon Oh*.
*Equal contribution
NeurIPS Spotlight, 2023
Bibtex

Large language models (LLMs) are like giant sponges, soaking up vast amounts of data from the web. But amidst all that data, there could be some sensitive stuff, like personally identifiable information (PII). Makes you a bit worried, right? That's where our new tool, ProPILE, comes in. Think of it as a detective, helping people investigate if their personal data might be seeping out from these LLMs. You can create your own prompts based on your personal info to check how much of your PII is likely to be exposed to millions of users. ProPILE is one of our first efforts to empower data subjects to gain awareness and control over their own PII in the era of LLMs.

teney2023neurips
Robustness Evaluation
ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets
Damien Teney, Lin Yong, Seong Joon Oh, Ehsan Abbasnejad.
NeurIPS Spotlight, 2023
Bibtex

Several recent studies have reported positive correlations between in-distribution (ID) and out-of-distribution (OOD) generalisation performances. In particular, Wenzel et al. (2022) found that none of the 31k networks examined on 172 dataset pairs has shown a trade-off, or a negative correlation, between the ID and OOD performances. They further recommend that, to improve the OOD generalisation, one can instead focus on improving the ID generalisation. We argue that this may not always be true. We present counterexamples where one does observe a trade-off between ID and OOD generalisation. We point to the selection method for networks as the key reason for the contradicting observations. We alter the recommendation to the field in a more nuanced manner.

kirchhof2023neuripsdb
Uncertainty Evaluation
URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates
Michael Kirchhof, Bálint Mucsányi, Seong Joon Oh, Enkelejda Kasneci.
NeurIPS Datasets and Benchmarks, 2023
Bibtex / Code

NeurIPS D&B extension of the UAI Epistemic AI Workshop paper below.

kirchhof2023uaieai
Uncertainty Evaluation
URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates
Michael Kirchhof, Bálint Mucsányi, Seong Joon Oh, Enkelejda Kasneci.
UAI Epistemic AI Workshop Best Student Paper, 2023
Bibtex / Code

We developed the Uncertainty-aware Representation Learning (URL) benchmark in our research. This tool evaluates the reliability of uncertainty estimates from pretrained models on unseen datasets. Its implementation is simple, requiring only four lines of code. In our experiment, we applied URL to ten models trained on ImageNet. Then, we tested these models on eight different datasets. The results showed that achieving transferable uncertainty quantification remains a challenge. We invite the community to work on this novel problem!

elif2023arxiv
Evaluation Large-Scale ML
Playing repeated games with Large Language Models
Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, Eric Schulz.
arXiv, 2023
Bibtex

Imagine Large Language Models (LLMs) as digital diplomats, interacting with us and others in the cyber world. We set LLMs - GPT-3, GPT-3.5, and GPT-4 - against each other in games to understand their social behavior. LLMs are great when self-interest rules, like in the Prisoner's Dilemma, but stumble when coordination is key. GPT-4, for instance, acts tough in the Prisoner's Dilemma and struggles with simple conventions in the Battle of the Sexes. But, give GPT-4 more info or ask it to predict the opponent's move, and it adjusts its strategy. Our insights open up an exciting path towards a behavioral game theory for machines!

han2023iccv
Robustness Uncertainty Human Annotation Explainability Large-Scale ML
Neglected Free Lunch -- Learning Image Classifiers Using Annotation Byproducts
Dongyoon Han*, Junsuk Choe*, Dante Chun, John Joon Young Chung, Minsuk Chang, Sangdoo Yun, Jean Y. Song, Seong Joon Oh.
*Equal contribution
ICCV, 2023
Bibtex / Code / Youtube / Poster / ImageNet-AB (HuggingFace) / COCO-AB (HuggingFace)
ImageNet annotation tool / COCO annotation tool

Supervised learning trains models with (X,Y) data. The (X,Y) data comes from the annotation procedure, where annotators provide the correct Y for each X. But behind the scenes, annotators generate much more than the (X,Y) data themselves: they unintentionally produce auxiliary information during the annotation task, such as the history of corrections and the time series of mouse traces and clicks. We call these annotation byproducts (AB) Z. We propose the new paradigm of learning using annotation byproducts (LUAB), where models are trained with triplets (X,Y,Z) involving the ABs. We reproduce the original annotation procedures for ImageNet and COCO to generate AB-enriched datasets: ImageNet-AB and COCO-AB. We show that the auxiliary Z may help models become better aligned with human recognition mechanisms, leading to improved model robustness.

nam2023iccv
Large-Scale ML
Scratching Visual Transformer's Back with Uniform Attention
Hyeon-Woo Nam, Yu-Ji Kim, Byeongho Heo, Dongyoon Han, Seong Joon Oh, Tae-Hyun Oh.
ICCV, 2023
Bibtex / Code

ViT's itchy point seems to be the uniform attention. ViTs are hungry for denser connections, yet dense connections are hard to achieve because of softmax's steep gradient around the uniform attention. We manually insert additional uniform attention layers into ViT models. This is very cheap! It turns out to be an effective trick for increasing the capacity and generalisation of ViT models, especially the smaller versions.
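
A uniform attention layer is nothing more exotic than averaging the (projected) tokens and broadcasting the result back to every token, which is exactly the dense connection softmax attention struggles to produce. A minimal sketch of such an add-on module (my own simplification, not the released code):

    import torch
    import torch.nn as nn

    class UniformAttention(nn.Module):
        """Every token attends to all tokens with equal weight 1/N: the context
        is simply the mean over projected tokens, broadcast back to each token."""
        def __init__(self, dim):
            super().__init__()
            self.value = nn.Linear(dim, dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x):                                # x: (B, N, dim) tokens
            ctx = self.value(x).mean(dim=1, keepdim=True)    # (B, 1, dim) dense context
            return x + self.proj(ctx)                        # residual broadcast to all tokens

    tokens = torch.randn(2, 197, 384)                        # e.g. a ViT-S/16 token sequence
    print(UniformAttention(384)(tokens).shape)               # torch.Size([2, 197, 384])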

kirchhof2023icml
Uncertainty
Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
Michael Kirchhof, Enkelejda Kasneci, Seong Joon Oh.
ICML, 2023
Bibtex / Code

We finally came up with some theoretical guarantees for probabilistic embeddings! Given a spherical embedding space with a von Mises-Fisher (vMF) family of true latent embedding distributions, one may identify the true latent vMF for every data point, up to rotations, with a Monte-Carlo version of InfoNCE (called MCInfoNCE). This result is a probabilistic extension of the work by Zimmermann et al.
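
A crude sketch of the MCInfoNCE idea (my own simplification, not the released code; the sampler below perturbs the mean and renormalises as a stand-in for a proper vMF sampler):

    import torch
    import torch.nn.functional as F

    def sample_sphere(mu, kappa, n_samples):
        # Stand-in for a vMF sampler: perturb the mean and renormalise to the unit sphere;
        # larger kappa means a more concentrated (less perturbed) distribution.
        noise = torch.randn(n_samples, *mu.shape) / kappa.sqrt().unsqueeze(-1)
        return F.normalize(mu.unsqueeze(0) + noise, dim=-1)

    def mc_infonce(mu_a, kappa_a, mu_b, kappa_b, temperature=0.1, n_samples=16):
        # mu_*: (B, D) unit-norm means, kappa_*: (B,) concentrations; (i, i) pairs are positives.
        za = sample_sphere(mu_a, kappa_a, n_samples)                 # (S, B, D)
        zb = sample_sphere(mu_b, kappa_b, n_samples)                 # (S, B, D)
        logits = torch.einsum('sbd,scd->sbc', za, zb) / temperature  # cosine similarities
        labels = torch.arange(mu_a.size(0)).expand(n_samples, -1)
        return F.cross_entropy(logits.reshape(-1, mu_a.size(0)), labels.reshape(-1))

    mu = F.normalize(torch.randn(8, 32), dim=-1)
    print(mc_infonce(mu, torch.full((8,), 20.0), mu, torch.full((8,), 20.0)))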

hwang2022neurips
Robustness
SelecMix: Debiased Learning by Contradicting-pair Sampling
Inwoo Hwang, Sangjun Lee, Yunhyeok Kwak, Seong Joon Oh, Damien Teney, Jin-Hwa Kim, Byoung-Tak Zhang.
NeurIPS, 2022
Bibtex / Workshop paper

A classifier gets biased when its decision boundary separates the bias attribute (e.g. the gender attribute for profession prediction). Some prior de-biasing methods correct the decision boundary by identifying the bias-conflicting samples in the training data (e.g. female mechanical engineers) and giving more weight to them. We go one step further. We argue that it's more effective to augment the whole convex hull between usual data points (e.g. male mechanical engineers) and bias-conflicting samples (e.g. female mechanical engineers). We do this through simple Mixup. It effectively de-biases a model, even in the presence of strong label noise, arguably the greatest arch-enemy of a de-biasing method.
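
A rough sketch of the augmentation step under simplifying assumptions (the `conflict_mask` marking bias-conflicting samples is a hypothetical input here; the paper selects contradicting pairs with the help of an auxiliary biased model):

    import torch

    def contradicting_pair_mixup(x, y, conflict_mask, alpha=1.0):
        # x: (B, C, H, W) images, y: (B,) labels,
        # conflict_mask: (B,) bool, True for bias-conflicting samples (hypothetical input).
        conflict_idx = conflict_mask.nonzero(as_tuple=True)[0]
        partner = conflict_idx[torch.randint(len(conflict_idx), (x.size(0),))]
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        x_mix = lam * x + (1.0 - lam) * x[partner]   # fill the convex hull towards conflicting samples
        return x_mix, y, y[partner], lam             # loss: lam * CE(., y) + (1 - lam) * CE(., y[partner])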

chun2022eccv
Human Annotation Evaluation Large-Scale ML
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
Sanghyuk Chun, Wonjae Kim, Song Park, Minsuk Chang, Seong Joon Oh.
ECCV, 2022
Bibtex / Code / Slides (long) / Slides (short)

Image-captioning benchmarks such as COCO Captions contain lots of nonsense. For the same image on the left, the caption that goes "Playing tennis with a racket" is deemed correct, while "Swinging a tennis racket" is penalised. This comes from the erratic recipe for constructing the datasets: (1) let annotators write down 5 captions per image and (2) consider only those 5 captions to be correct matches. We show that this practice introduces a lot of noise in the evaluation benchmarks. We then introduce a novel image-captioning dataset based on the MS-COCO Captions that captures the model performances more precisely.

kim2022icml
Human Annotation Large-Scale ML
Dataset Condensation via Efficient Synthetic-Data Parameterization
Jang-Hyun Kim, Jinuk Kim, Seong Joon Oh, Sangdoo Yun, Hwanjun Song, Joonhyun Jeong, Jung-Woo Ha, Hyun Oh Song.
ICML, 2022
Bibtex / Code

Dataset condensation is the art of compactifying a training dataset. The aim is that a model trained on a condensed dataset is similar to the one trained on the original dataset, most importantly in terms of model accuracy (e.g. 91%-accuracy MNIST classifier with only 1 sample per class). We introduce many practical tricks to make data condensation work beyond the toy setting. We present the first data condensation method that actually works on images with sizes as large as 224x224, instead of 32x32!

lee2022cvpr
Robustness Human Annotation Explainability
Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data
Jungbeom Lee, Seong Joon Oh, Sangdoo Yun, Junsuk Choe, Eunji Kim, Sungroh Yoon.
CVPR, 2022
Bibtex / Code

Weakly-supervised semantic segmentation (WSSS) is the task of solving pixel-wise class assignment with only the image-level supervision. The problem is ill-posed because the image-level labels alone do not let models distinguish foreground (FG) objects (e.g. train) from spuriously-correlated background (BG) cues (e.g. rail). Researchers have sought external sources of information, such as shape prior, to address the ill-posedness. In this paper, we explore a novel source: BG images (e.g. rail images without a train). Conceptually, telling models what are not the FG cues is equivalent to telling them what actually are the FG cues; BG images are sufficient for turning the problem into a well-posed one. Collecting such BG data is cost-efficient, requiring orders of magnitude less annotation costs than the already-cheap image-level labels.

scimeca2022iclr
Robustness Explainability
Which Shortcut Cues Will DNNs Choose? A Study from the Parameter-Space Perspective
Luca Scimeca*, Seong Joon Oh*, Sanghyuk Chun, Michael Poli, Sangdoo Yun.
*Equal contribution
ICLR, 2022
Bibtex

Shortcut learning is emerging as a key limitation of the current generation of machine learning models (CVPR'20, ICML'20). In this work, instead of proposing yet another solution, we take a step back and deepen our understanding of the problem. For example, trained on a dataset where both colour and shape are valid cues for recognising the object, models of different types (MLP, CNN, and ViT) choose to use colour over shape. Why is that? We provide an explanation from the parameter-space perspective. Read the paper. Worth it!

hazel2022aaai
Robustness Human Annotation
ALP: Data Augmentation using Lexicalized PCFGs for Few-Shot Text Classification
Hazel Kim, Daecheol Woo, Seong Joon Oh, Jeong-Won Cha, Yo-Sub Han.
AAAI, 2022
Bibtex

This is an NLP paper. There have been many attempts at enlarging the training text data for few-shot text classification, like back-translation (e.g. En-Fr-En) and the use of pre-trained language models. Unlike those, we propose an augmentation method that is fully aware of the underlying grammatical structure of the sentence. Importantly, our method generates a set of synonymous sentences that are both grammatically correct and grammatically diverse! Here we gain quite some points in few-shot text classification benchmarks. Another contribution is viewing the train-val split as part of the method and seeking the best splitting strategy when data augmentation is being used. It turns out that splitting the few-shot labelled samples S into disjoint train-val splits (train split is then augmented) is sub-optimal; a better strategy is to use the augmented source data S' as the train split and the original S itself as the validation split!

choe2020cvpr
Robustness Human Annotation Explainability Evaluation
Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets
Junsuk Choe*, Seong Joon Oh*, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim.
*Equal contribution
TPAMI, 2022
Bibtex / Code / Slides / Tutorial video

Journal extension of CVPR'20! It now contains more analyses, including the evaluation of input gradient variants as Weakly-Supervised Object Localization (WSOL) methods.

kim2021iccv
Robustness Human Annotation Explainability Evaluation
Keep CALM and Improve Visual Feature Attribution
Jae Myung Kim*, Junsuk Choe*, Zeynep Akata, Seong Joon Oh.
*Equal contribution
ICCV, 2021
Bibtex / Code

It is difficult to find a CV researcher or practitioner who hasn't used (or at least heard of) Class Activation Maps (CAM). It is a seminal feature attribution method that has left a deep mark on vision research and applications. Notwithstanding its popularity, we found some practical and conceptual issues that make CAM not as interpretable as it should be. We address the issues with a probabilistic treatment of the last layers of CNNs, where the latent cue variable Z is trained via Marginal Likelihood (ML) or Expectation-Maximisation (EM) algorithms. The resulting Class Activation Latent Maps, or CALM, produce more precise and interpretable score maps.

heo2021iccv
Large-Scale ML
Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh.
ICCV, 2021
Bibtex / Code

The Transformer architecture has successfully been adapted to visual models (e.g. ViT). However, the Transformer was originally designed for language modelling, and ViT inherits its constant ratio of computational load between spatial and channel dimensions across depths. We postulate that this is a suboptimal design choice, as CNNs assign different ratios at different depths to maximise the utility of compute. We thus present the Pooling-based Vision Transformer (PiT), which does exactly that.

poli2021neurips
Robustness
Neural Hybrid Automata: Learning Dynamics with Multiple Modes and Stochastic Transitions
Michael Poli*, Stefano Massaroli*, Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Atsushi Yamashita, Hajime Asama, Jinkyoo Park, Animesh Garg.
*Equal contribution
NeurIPS, 2021
Bibtex

Recovering the dynamical systems, or the data generation process, behind time series data enables an effective and robust prediction, interpretation, and forecasting. There exist prior methods for recovering either continuous or discrete dynamics, but not the mixture. The underlying dynamics behind many real-world systems contain both continuous and discrete elements. For example, an aircraft essentially follows a continuous dynamics but goes through a discrete mode shift at touchdown. Such a system is referred to as a Stochastic Hybrid System (SHS). We present a framework that recovers SHS from time series data using ingredients like Neural ODEs and latent variable models.

yun2021cvpr
Robustness Large-Scale ML
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Junsuk Choe, Sanghyuk Chun.
CVPR, 2021
Bibtex / Code

ImageNet labels contain lots of noise (e.g. Shankar et al.). There have been efforts to fix them on the evaluation set, but not yet on the training set. We fix them on the training set (published at codebase), but with the help of a bigger image classifier, to make the task feasible at all. This is another trick that will improve the ImageNet & downstream task accuracies across the board.

chun2021cvpr
Uncertainty Evaluation
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun, Seong Joon Oh, Rafael Sampaio de Rezende, Yannis Kalantidis, Diane Larlus.
CVPR, 2021
Bibtex / Code

Given an image, there are many ways to describe it in text. Given a text description, there are likewise many possible images that suit the description. Cross-modal associations are many-to-many in nature. The usual deterministic embeddings cannot model this well. We introduce a probabilistic embedding scheme based on the Hedged Instance Embedding (ICLR'19) to handle the many-to-many mapping gracefully. We also address another crucial issue with evaluation: a method gets either penalised or rewarded for retrieving synonymous sentences because of the non-exhaustive true matches in the eval set. Since ground-up collection of such matches is too expensive, we introduce a novel surrogate measure, Plausible-Match R-Precision, based on estimated true matches.

heo2021iclr
Large-Scale ML
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights
Byeongho Heo*, Sanghyuk Chun*, Seong Joon Oh, Dongyoon Han, Youngjung Uh, Sangdoo Yun, Jungwoo Ha.
*Equal contribution
ICLR, 2021
Bibtex / Code / Project

When you apply a momentum-based optimizer to scale-invariant parameters, their norms increase quite a bit. The norm increase doesn't contribute anything to the loss minimization; it only slows down the convergence. We fix this by appending a projection operation to SGD and Adam. This leads to performance improvements across the board.
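
A minimal sketch of the projection idea on top of SGD with momentum (my own simplification; it assumes the scale-invariant parameters, e.g. conv weights followed by BatchNorm, are supplied by the user, whereas the released AdamP/SGDP code detects them automatically):

    import torch

    def project_radial(update, weight, eps=1e-8):
        # Remove the component of the update parallel to the weight; for scale-invariant
        # weights this component only inflates the norm without changing the loss.
        w, u = weight.reshape(-1), update.reshape(-1)
        w_hat = w / (w.norm() + eps)
        return (u - torch.dot(u, w_hat) * w_hat).reshape(update.shape)

    def sgdp_step(params, scale_invariant_ids, buffers, lr=0.1, momentum=0.9):
        # params: tensors with .grad populated; scale_invariant_ids: ids of scale-invariant weights.
        for p in params:
            buf = buffers.setdefault(id(p), torch.zeros_like(p))
            buf.mul_(momentum).add_(p.grad)
            step = project_radial(buf, p.detach()) if id(p) in scale_invariant_ids else buf
            p.data.add_(step, alpha=-lr)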

yun2020videomix
Large-Scale ML
VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Jinhyung Kim.
arXiv, 2020
Bibtex

Data augmentation is not as extensively studied for video recognition tasks as in the static image recognition domain. We study the extension of popular static-image augmentation methods, such as CutMix, to video recognition tasks.

ferjad2020icml
Evaluation
Reliable Fidelity and Diversity Metrics for Generative Models
Muhammad Ferjad Naeem*, Seong Joon Oh*, Youngjung Uh, Yunjey Choi, Jaejun Yoo.
*Equal contribution
ICML, 2020
Bibtex / Code / ICML Virtual / Youtube

Evaluating generative models is tricky. There are Inception Score and Fréchet Inception Distance measures indeed, and then (Improved) Precision and Recall metrics to separately examine the fidelity and diversity aspects. Yet, they are still not perfect. We address the issues with Improved Precision and Recall metrics and propose new metrics: Density and Coverage.
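
A compact NumPy sketch of the two proposed metrics (my own paraphrase of the definitions; the released code linked above is the reference): build k-nearest-neighbour balls around real features and count how often fake features fall inside them.

    import numpy as np
    from scipy.spatial.distance import cdist

    def density_coverage(real, fake, k=5):
        # real: (N, D), fake: (M, D) feature vectors, e.g. from a pretrained embedder.
        radii = np.sort(cdist(real, real), axis=1)[:, k]  # k-NN radius per real sample (index 0 is self)
        inside = cdist(real, fake) < radii[:, None]       # is fake j inside the ball of real i?
        density = inside.sum() / (k * fake.shape[0])      # how densely fakes populate real neighbourhoods
        coverage = inside.any(axis=1).mean()              # fraction of real neighbourhoods hit by a fake
        return density, coverage

    real, fake = np.random.randn(1000, 64), np.random.randn(1000, 64)
    print(density_coverage(real, fake))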

bahng2020icml
Robustness Human Annotation Evaluation
Learning De-biased Representations with Biased Representations
Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, Seong Joon Oh.
ICML, 2020
Bibtex / Code / ICML Virtual / Youtube

Models pick up correlations, rather than causal mechanisms, between inputs and outputs. De-biasing (and fairness) research has guided models on "which cues to look at" through explicit bias labels or by re-weighting or re-generating training data to remove bias. We show that, for many application scenarios, it is possible to encode the "cues to look at" through the model architecture, making such expensive strategies unnecessary.

choe2020cvpr
Robustness Human Annotation Explainability Evaluation
Evaluating Weakly-Supervised Object Localization Methods Right
Junsuk Choe*, Seong Joon Oh*, Sanghyuk Chun, Zeynep Akata, Hyunjung Shim.
CVPR, 2020
Bibtex / Code / Slides / Tutorial video

I have long waited for this moment since CVPR'17. Weakly-Supervised Object Localization, or WSOL, has in fact not been weakly supervised in a strict sense. Design choices and hyperparameters are validated with the localization annotations! This paper explains why researchers had to rely on localization validation -- without localization supervision, there is no way to force a model to not extract cues from background regions. We propose a new fair benchmark acknowledging the need for localization annotations and show that WSOL methods since CAM in 2016 have not introduced much gain.

lee2020cvprw
On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee.
CVPR Workshop, 2020
Bibtex

Scene text recognition works well, but there are remaining corner cases. An example is texts with unusual orientations and arrangements (e.g. BMW logo). We focus on this corner case and propose a model based on self-attention.

joon2015iccv
Privacy & Security Evaluation
Person Recognition in Personal Photo Collections
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele.
TPAMI, 2020
Bibtex / Journal

Journal version of my first paper (ICCV'15), after five years! We developed version two of the ICCV'15 system, which outperforms the methods that have appeared in the meantime.

joon2018iclr
Privacy & Security
Towards Reverse-Engineering Black-Box Neural Networks
Seong Joon Oh, Mario Fritz, Bernt Schiele.
Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (book chapter), 2019
Bibtex / Book chapter

Book chapter version of ICLR'18! We build connections between our black-box inspection methodology and the explainable AI.

orekondy2019neuripsfl
Privacy & Security
Gradient-Leaks: Understanding and Controlling Deanonymization in Federated Learning
Tribhuvanesh Orekondy, Seong Joon Oh, Yang Zhang, Bernt Schiele, Mario Fritz.
NeurIPS Workshop, 2019
Bibtex / Poster

Federated learning allows sensitive user data to never leave the device and still be used for training. It is considered a safer option than sending the user data directly to the server. But is it? We show that users may be identified and linked based on the model updates communicated between the device and server.

chun2019icmlw
Robustness Uncertainty Evaluation
An Empirical Evaluation on Robustness and Uncertainty of Regularization Methods
Sanghyuk Chun, Seong Joon Oh, Sangdoo Yun, Dongyoon Han, Junsuk Choe, Youngjoon Yoo.
ICML Workshop, 2019
Bibtex

There has been a line of research on simple regularization techniques like CutMix (ICCV'19) and other lines of research on robustness and uncertainty. We make a happy marriage of the two and measure how well the regularization techniques improve robustness and uncertainty of a model.

yun2019iccv
Robustness Large-Scale ML
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo.
ICCV Oral Talk, 2019
Bibtex / Code / Talk / Poster / Blog / Project / Project

A simple solution that works surprisingly well! Cut and paste patches from other images during training. Quite likely, you will see a performance boost.
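
The core of the trick fits in a few lines. A minimal sketch (the released code linked above is the reference implementation):

    import numpy as np
    import torch

    def rand_bbox(h, w, lam):
        # Random box covering roughly a (1 - lam) fraction of the image area.
        cut = np.sqrt(1.0 - lam)
        ch, cw = int(h * cut), int(w * cut)
        cy, cx = np.random.randint(h), np.random.randint(w)
        return (np.clip(cy - ch // 2, 0, h), np.clip(cy + ch // 2, 0, h),
                np.clip(cx - cw // 2, 0, w), np.clip(cx + cw // 2, 0, w))

    def cutmix(images, labels, alpha=1.0):
        # images: (B, C, H, W), labels: (B,) integer classes. Modifies images in place.
        lam = np.random.beta(alpha, alpha)
        perm = torch.randperm(images.size(0))
        y1, y2, x1, x2 = rand_bbox(images.size(2), images.size(3), lam)
        images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]   # paste patches from shuffled batch
        lam = 1.0 - (y2 - y1) * (x2 - x1) / float(images.size(2) * images.size(3))
        return images, labels, labels[perm], lam

    # Training loss: lam * CE(pred, labels) + (1 - lam) * CE(pred, labels[perm]).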

baek2019iccv
Evaluation
What Is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee.
ICCV Oral Talk, 2019
Bibtex / Code

The scene text recognition field has long suffered from the lack of agreement on a unified evaluation protocol. We provide a standard protocol. We also provide a unified view on the previous methods and discover a novel combination of existing modules that turns out to be the state of the art.

joon2019iclr
Uncertainty
Modeling Uncertainty with Hedged Instance Embedding
Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher.
ICLR, 2019
Bibtex / Poster

There has been quite some work on representing uncertainty for classification or regression tasks. Is there a way to represent uncertainty for instance embedding models too? We show that it is possible to train probabilistic representations for instances based on their inherent ambiguity.
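
The key quantity is a match probability computed from Monte-Carlo samples of the two embedding distributions. A minimal sketch (Gaussian embeddings; in the paper a and b are learnable scalars rather than the fixed values used here):

    import torch

    def match_probability(mu1, sigma1, mu2, sigma2, a=1.0, b=0.0, n_samples=8):
        # mu_*, sigma_*: (B, D) means and std-devs of the two probabilistic embeddings.
        z1 = mu1 + sigma1 * torch.randn(n_samples, *mu1.shape)   # MC samples, (S, B, D)
        z2 = mu2 + sigma2 * torch.randn(n_samples, *mu2.shape)
        dist = (z1 - z2).norm(dim=-1)                            # per-sample pairwise distances
        return torch.sigmoid(-a * dist + b).mean(dim=0)          # (B,) match probabilities

    # Soft contrastive loss: -log p(match) for matching pairs, -log(1 - p(match)) for non-matching pairs.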

tretschk2018cscs
Privacy & Security
Sequential Attacks on Agents for Long-Term Adversarial Goals
Edgar Tretschk, Seong Joon Oh, Mario Fritz.
ACM CSCS, 2018
Bibtex

Can a bad guy hijack an RL agent? We show that it is possible to make an agent pursue an alternative reward by introducing small adversarial perturbations into its input stream.

joon2018iclr
Privacy & Security
Towards Reverse-Engineering Black-Box Neural Networks
Seong Joon Oh, Max Augustin, Mario Fritz, Bernt Schiele.
ICLR, 2018
Bibtex / Extended abstract / Poster / Code

Recipes for training a high-performance model are not cheap. Think about the GPU-and-research-scientist-and-engineer hours to find the right architectural components and optimizer hyperparameters. What if they can be stolen by examining the model responses to certain inputs?

sun2017cvpr
Privacy & Security
Natural and Effective Obfuscation by Head Inpainting
Qianru Sun, Liqian Ma, Seong Joon Oh, Luc Van Gool, Bernt Schiele, Mario Fritz.
CVPR, 2018
Bibtex

Adversarial perturbation solutions (ICCV'17) produce visually pleasant protections with high protection rates, but their effects may be confined to a handful of recognition systems. We propose another solution based on face inpainting that changes the face to a fictitious yet natural-looking identity. It is effective against a broader set of recognition systems.

joon2017cvprw
Privacy & Security
From Understanding to Controlling Privacy against Automatic Person Recognition in Social Media
Seong Joon Oh, Mario Fritz, Bernt Schiele.
CVPR Workshop, 2017
Bibtex / Poster

We stop and look back on the visual privacy papers (ICCV'15, ECCV'16, ICCV'17).

joon2017iccv
Robustness Privacy & Security Evaluation
Adversarial Image Perturbation for Privacy Protection -- A Game Theory Perspective
Seong Joon Oh, Mario Fritz, Bernt Schiele.
ICCV, 2017
Bibtex / Poster / Code

If face blurring doesn't work (ECCV'16), how should we shield our personal photos online against recognition systems? We propose a solution based on adversarial perturbations and the game theoretic considerations for the evaluation therein.

joon2017cvpr
Human Annotation Explainability
Exploiting Saliency for Object Segmentation from Image Level Labels
Seong Joon Oh, Rodrigo Benenson, Anna Khoreva, Zeynep Akata, Mario Fritz, Bernt Schiele.
CVPR, 2017
Bibtex / Poster / Code

There has been quite some work around training models for localization tasks (e.g. semantic segmentation) from the image tag supervision only. But is this fundamentally possible without relying on extensive validation with full localization annotations? We argue that certain priors are necessary at the very least to encode the extent of objects. Saliency, we argue, is a handy prior.

anja2017cvpr
Generating Descriptions with Grounded and Co-Referenced People
Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, Bernt Schiele.
CVPR, 2017
Bibtex

We casually use pronouns to refer to others. For machines, however, referring to people with pronouns necessitates new types of data and training strategies to explicitly localize and link people across frames. We do that.

joon2016eccv
Privacy & Security Evaluation
Faceless Person Recognition: Privacy Implications in Social Media
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele.
ECCV, 2016
Bibtex / Poster / Extended abstract

But can you still be recognized even with a blur on your face? Quite likely.

aditya2016mobisys
Privacy & Security
I-pic: A Platform for Privacy-Compliant Image Capture
Paarijaat Aditya, Rijurekha Sen, Peter Druschel, Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele, Bobby Bhattacharjee, Tong Tong Wu.
MobiSys, 2016
Bibtex / Project

You are a janitor at the Taj Mahal. Against your will, sightseers take photos with your face in the background. How can you opt out of being present in someone else's photo? We present a mobile-system based solution.

joon2015iccv
Privacy & Security Evaluation
Person Recognition in Personal Photo Collections
Seong Joon Oh, Rodrigo Benenson, Mario Fritz, Bernt Schiele.
ICCV, 2015
Bibtex / Poster / Video / Project

How well does a CNN model recognize people in personal photos? Even when people don't face the camera, the CNN finds out who they are based on context (e.g. location and social connections).

Academic activities

compass
Robustness Uncertainty Human Annotation Explainability Evaluation Large-Scale ML
Trustworthy Machine Learning
Seong Joon Oh, Johannes Bertram, Ankit Sonthalia, Lennart Bramlage.
Winter 24/25 @ University of Tübingen
Website / Book

As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalise well to small changes in the distribution; some models are found to utilise sensitive features that could treat certain demographic user groups unfairly; models tend to be overconfident on novel types of data; and models cannot communicate the rationale behind their decisions effectively to end users, such as medical staff, to maximise human-machine synergies. Collectively, we face a trustworthiness issue with the current machine learning technology. A large fraction of machine learning research nowadays is dedicated to expanding the frontier of Trustworthy Machine Learning (TML). The course covers the theoretical and technical background for key topics in TML. We conduct a critical review of important classical and contemporary research papers on related topics and provide hands-on practicals to implement TML techniques.

compass
Robustness Uncertainty Human Annotation Explainability Evaluation Large-Scale ML
Trustworthy Machine Learning
Seong Joon Oh, Arnas Uselis, Bálint Mucsányi, Evgenii Kortukov.
Winter 23/24 @ University of Tübingen
Website / Book

As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalise well to small changes in the distribution; some models are found to utilise sensitive features that could treat certain demographic user groups unfairly; models tend to be overconfident on novel types of data; and models cannot communicate the rationale behind their decisions effectively to end users, such as medical staff, to maximise human-machine synergies. Collectively, we face a trustworthiness issue with the current machine learning technology. A large fraction of machine learning research nowadays is dedicated to expanding the frontier of Trustworthy Machine Learning (TML). The course covers the theoretical and technical background for key topics in TML. We conduct a critical review of important classical and contemporary research papers on related topics and provide hands-on practicals to implement TML techniques.

freepik image
Robustness Uncertainty Human Annotation Explainability Evaluation Large-Scale ML
Trustworthy Machine Learning
Seong Joon Oh, Elisa Nguyen, Alexander Rubinstein, Elif Akata, Michael Kirchhof.
Winter 22/23 @ University of Tübingen
Website / Book

As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalise well to small changes in the distribution; some models are found to utilise sensitive features that could treat certain demographic user groups unfairly; models tend to be overconfident on novel types of data; and models cannot communicate the rationale behind their decisions effectively to end users, such as medical staff, to maximise human-machine synergies. Collectively, we face a trustworthiness issue with the current machine learning technology. A large fraction of machine learning research nowadays is dedicated to expanding the frontier of Trustworthy Machine Learning (TML). The course covers the theoretical and technical background for key topics in TML. We conduct a critical review of important classical and contemporary research papers on related topics and provide hands-on practicals to implement TML techniques.

freepik image
Robustness Human Annotation Evaluation Large-Scale ML
Workshop on ImageNet: Past, Present, and Future
Zeynep Akata, Lucas Beyer, Sanghyuk Chun, Almut Sophia Koepke, Diane Larlus, Seong Joon Oh, Rafael Sampaio de Rezende, Sangdoo Yun, ‪Xiaohua Zhai‬.
NeurIPS, 2021
Website

ImageNet symbolises the stellar achievements in ML and CV in the past decade. It has served as the go-to benchmark for model architectures and training techniques and as a common pre-training dataset for numerous downstream tasks. As of 2021, ImageNet is going through a creative destruction. As the SOTA models are saturating towards the upper bound of the benchmark, new versions of the benchmarks are being proposed (e.g. ImageNet-A/C/D/LT/O/P/R), with more focus on the reliability of models. Emerging fields in CV are now venturing beyond the ImageNet pre-training with class labels: e.g. self-supervision and language-description supervision. We believe now is a good time to discuss what’s next. The workshop will cover questions like: What are the main lessons learnt thanks to this benchmark? How can we reflect on the diverse requirements for good datasets and models, such as fairness, privacy, security, generalization, scale, and efficiency? What should the next generation of ImageNet-like benchmarks encompass? Through this workshop, we hope to collectively shape the landscape of the ML and CV research in the post-ImageNet era.

Human Annotation Explainability Evaluation
Tutorial on Weakly-Supervised Learning in Computer Vision
Hakan Bilen, Rodrigo Benenson, Seong Joon Oh.
ECCV, 2020
Website (slides)

Deeply-learned computer vision models are data-hungry and manual annotations are expensive. Can we train models with “weaker” annotations? This tutorial provides an overview of the vast literature on weakly supervised learning methods in computer vision. We also discuss the limitations of current state-of-the-art methods and evaluation metrics. We propose future research directions that hopefully will spur disruptive progress in weakly supervised learning.

Reviewing activities
  • Serving as an area chair for NeurIPS & CVPR.
  • Serving as a reviewer for CVPR, NeurIPS, ICML, ICCV, ECCV, ICLR, etc.
  • 6 x Best Reviewer Awards.
Awards
  • Samsung PhD Scholarship 2014-2018.
  • Vensi Thawani Prize 2014: For a distinctive achievement in mathematics.
  • William Pochin Scholarship 2014: For a distinctive achievement in mathematics.
  • William Pochin Scholarship 2013: For the First Class honour in mathematics.
  • Meritorious Winner 2012 at the Mathematical Contest in Modelling.
  • William Pochin Scholarship 2011: For the First Class honour in mathematics.
Talks

Template based on Jon Barron's website.