upcarta
Mentions
Neel Nanda (at ICLR) @NeelNanda5 · May 10, 2023
  • From Twitter

Great paper and elegant set up! This is another nice illustration of how it is so, so easy to trick yourself when interpreting LLMs. I would love an interpretability project distinguishing faithful from unfaithful chain of thought! Anyone know what the smallest open source model…

Paper May 7, 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
by Miles Turpin