upcarta
Miles Turpin
community-curated profile
Language model alignment @nyuniversity, @CohereAI
Most Recommended
Paper
May 7, 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
by Miles Turpin
Tweet
May 9, 2023
⚡️New paper!⚡️ It’s tempting to interpret chain-of-thought explanations as the LLM's process for solving a task. In this new work, we show that CoT explanations can systematically misrepresent the true reason for model predictions. arxiv.org/abs/2305.
by Miles Turpin