
Neel Nanda (at ICLR)

neelnanda.io
1 Follower
community-curated profile

Mechanistic Interpretability research @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's do it!

Recommendations
Paper May 4, 2023
AttentionViz: A Global View of Transformer Attention
by Martin Wattenberg and 4 others
Paper May 7, 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
by Miles Turpin
Article May 4, 2023
Distributed Representations: Composition & Superposition
by Chris Olah
Article Apr 19, 2023
Why do some AI researchers dismiss the potential risks to humanity?
by David Krueger
Paper Mar 20, 2023
What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring
by Yo Shavit