Chan Jun Shern

I'm a researcher working on artificial intelligence safety, with a focus on large language models.

These days, I’m working with OpenAI and the Center for AI Safety to build evaluations of frontier models, with the aim of enabling clear-eyed governance of high-risk systems. I’ve also recently spent time doing research at the Center for Human-Compatible AI (UC Berkeley) and at CILVR (New York University).

In a previous life, I was a senior research engineer at Motional, where I worked on sensor calibration for autonomous vehicles. I did my undergrad at Imperial College London, where I won the Student Centenary Prize for my thesis on music generation for real-time musician accompaniment.

Email / CV / Google Scholar / Twitter / GitHub / LinkedIn

Research

I'm interested in studying how advanced machine learning systems will impact society. My recent work has focused on measuring how large language models behave and figuring out how to robustly keep them in alignment with human values. My favorite papers are highlighted.

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan*, Chan Jun Shern*, Andy Zou*, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
ICML 2023 (oral presentation)
arXiv / website / tweet / code

We develop the MACHIAVELLI benchmark to measure deception, power-seeking tendencies, and other unethical behaviors in complex interactive environments that simulate the real world.

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Mantas Mazeika, Eric Tang, Andy Zou, Steven Basart, Jun Shern Chan, Dawn Song, David Forsyth, Jacob Steinhardt, Dan Hendrycks
NeurIPS 2022
arXiv / code

We introduce two datasets with over 60,000 videos manually annotated for human emotional response, and show how video models can be trained to understand human preferences and the emotional content of videos.

Few-shot Adaptation Works with UnpredicTable Data
Jun Shern Chan, Michael Pieler, Jonathan Jao, Jérémy Scheurer, Ethan Perez
ACL Rolling Review 2022, ACL 2023
arXiv / tweet / code

Training on odd data (e.g., tables from support.google.com) improves few-shot learning in language models in the same way as training on diverse NLP data.

Training Language Models with Language Feedback
Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez
ACL 2022 Workshop on Learning with Natural Language Supervision
arXiv / tweet / talk

We found a way to learn from natural language feedback (rather than ratings), which let us finetune GPT-3 to human-level summarization quality with just 100 feedback samples.

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Rohin Shah, Steven H. Wang, Cody Wild, Stephanie Milani, Anssi Kanervisto, Vinicius G. Goecks, Nicholas Waytowich, David Watkins-Valls, Bharat Prakash, Edmund Mills, Divyansh Garg, Alexander Fries, Alexandra Souly, Chan Jun Shern, Daniel del Castillo, Tom Lieberum
NeurIPS 2021 Competitions and Demonstrations Track
arXiv / website / competition / talk

The BASALT competition calls for research towards agents that use human feedback to solve open-world tasks in Minecraft. My team combined learning from human demonstrations and preference ratings, earning 3rd place and the "Creativity of research" prize.

Misc projects + writing


Site adapted from jonbarron.