Joe Kwon

Trying to help steer towards better (AI-entangled) futures!

AI is poised to be deeply transformative. I think about how to make that go well!

work

the path here

pre-college

Mostly spent my time hanging out with friends, consuming a ton of online content, and not really having any direction in life. Mildly nihilistic, honestly. But experiencing CTY and Canada/USA Mathcamp was special and invigorating—the first environments where I felt intellectually excited about ideas and the people around me.

Yale

Studied CS and psychology. The summer before sophomore year, I worked with Gabriel Kreiman and Mengmi Zhang at the Harvard/MIT Center for Brains, Minds, and Machines on visual cognition and context reasoning. It was my first research experience and I'm grateful they invested their time in a mostly floundering freshman. During school I worked in Julian Jara-Ettinger's lab, building computational models of social cognition.

early AI safety

Around 2020 I started paying attention to how much emergent capability was showing up in AI systems. Worked on one of OpenAI's early RLHF projects under Long Ouyang and Jeff Wu. It was my first hands-on experience with LLMs, and it got me scaling-pilled. Then at Berkeley with Jacob Steinhardt and Dan Hendrycks, I worked on out-of-distribution detection, AI forecasting, and building evaluations for ML systems.

MIT

After college I joined Josh Tenenbaum's Computational Cognitive Science Lab, working closely with Sydney Levine on moral cognition—how people reason about rules, norms, and each other. We built models that tried to capture the structure of moral judgment, which I think matters for AI alignment too. Separately, worked with Stephen Casper and Dylan Hadfield-Menell on red-teaming methods to systematically find where language models fail.

LG AI Research

In 2023 I spent about a year as a research engineer on multilingual LLMs under Honglak Lee, working with Lajanugen Logeswaran, Dongsub Shim, and Tolga Ergen. Synthetic data, pretraining, finetuning, evals. One thread I liked: leveraging language-invariant concepts so models can learn new languages more efficiently.

steering and probing

In late 2024, I worked with David Krueger's group testing activation steering methods. At the time it was unclear how well these techniques actually worked, what exactly you could do with them, and where they broke down—we wanted to figure that out.

policy

In 2025 I moved to DC to work on AI policy and governance: first at the Center for AI Policy, writing reports on AI agents, cybersecurity, and autonomous systems, then in GovAI's DC fellowship, working on risks from internal AI deployment and metrics for tracking automated AI R&D. This was refreshing because the questions felt immediately important and impactful. I enjoyed communicating ideas and recommendations to people (tens of thousands read my reports in total), and it led to an invitation to be a panelist at a Georgetown × World Bank conference on "Making AI Work: What Firms and Workers Need."

now

Astra Fellow working with Tom Davidson and Fabien Roger on secretly loyal AI—the risk that an AI system could be deliberately trained to appear aligned with an institution's goals while covertly serving a different actor's interests. I'm focused on threat modeling and designing ML experiments that stress-test this scenario and would be useful for the broader research agenda.

rabbit holes

reading
  • The Gentle Romance: Stories of AI and humanity — Richard Ngo
  • The Night Circus — Erin Morgenstern
  • The Book of Five Rings — Miyamoto Musashi
listening
hip hop
  • I LAY DOWN MY LIFE FOR YOU by JPEGMAFIA (experimental / industrial)
  • LP! (Offline) by JPEGMAFIA (experimental / glitch)
jazz(y)
  • The Black Saint and the Sinner Lady by Charles Mingus (avant-garde)
  • Hot Rats by Frank Zappa (jazz-rock)
art pop
  • LUX by Rosalía (orchestral)
  • La Vida Era Más Corta by Milo J (contemporary folk)
  • Vanisher, Horizon Scraper by Quadeca (folktronica)
electronic
  • I Love My Computer by Ninajirachi (house / dance / pop)
  • Allbarone by Baxter Dury (synth pop / electropop)
  • The Provocateur by ADÉLA (pop / dance / house)
rock
  • Fetch by Melt-Banana (noise / experimental)
  • Pain to Power by Maruja (post-punk / jazz)
looking

Updating soon.

bookmarks