Summer programs—CTY and Canada/USA Mathcamp. The kind of place where you'd stay up until 3am working on a problem set, not because it was due, but because you weren't done yet. I wanted more of that feeling.
Joe Kwon
Working on AI going well
How minds work, how AI should
Thinking about how AI and good futures fit together.
work
- Internal Deployment Gaps in AI Regulation: How current AI governance misses risks from internal use of AI systems
- What Do Moral Rules Mean? How people interpret and generalize moral principles
- Large Language Models Are More Persuasive Than Incentivized Human Persuaders: Comparing AI and human persuasion in real-time interactive settings
the path here
Studied CS and psychology but spent most of my time in research labs. Worked with Gabriel Kreiman on visual cognition, then Julian Jara-Ettinger on how we infer what people are thinking from what they leave behind. I was drawn to cognition generally—human minds felt like the obvious place to start understanding intelligence.
Around 2020 I started paying attention to how much emergent capability was showing up in AI systems. Worked on one of OpenAI's early RLHF projects and spent time at Berkeley with Jacob Steinhardt and Dan Hendrycks. Learned how to do empirical ML research and why evals and benchmarks matter.
Joined Josh Tenenbaum's Computational Cognitive Science Lab, working with Sydney Levine on moral and social cognition—how people reason about rules, norms, and each other. A lot of neuro-symbolic modeling, which matters for AI alignment too. Separately, worked with Stephen Casper and Dylan Hadfield-Menell on red-teaming methods to find where language models fail.
Research engineering on cross-lingual LLMs under Honglak Lee, working with Lajanugen Logeswaran, Dongsub Shim, and Tolga Ergen. Synthetic data, pretraining, finetuning, evals. One thread I liked: leveraging language-invariant concepts so models can learn new languages more efficiently.
Worked with David Krueger's group testing activation steering methods. A lot of these techniques promise fine-grained control over model behavior from the inside—we wanted to know where that actually holds up and where it breaks down.
Center for AI Policy first, writing reports on evals, transparency, and AI agents. Then GovAI's DC fellowship, working on risks from internal AI deployment and metrics for tracking automated AI R&D.
Astra Fellow working with Tom Davidson and Fabien Roger. Focused on secretly loyal AI—threat modeling and designing ML experiments.
rabbit holes
reading
Updating soon.
listening
Updating soon.
looking
Updating soon.
bookmarks
- Omar Chishti
- Hoyeon Chang
- More coming soon.