upcarta
Mentions
Melanie Mitchell @MelMitchell1 · Apr 5, 2023

Thanks for this excellent thread. Rich Shiffrin and I recently wrote a short commentary on this topic as well: [link] Doing psychological testing on AI systems is a tricky business!

Tweet Apr 4, 2023
People are testing large language models (LLMs) on their "cognitive" abilities - theory of mind, causality, syllogistic reasoning, etc. Many (most?) of these evaluations are deeply flawed. To evaluate LLMs effectively, we need some principles from ex
by Michael C. Frank
Stefania Druga @Stefania_druga · Apr 4, 2023

Great thread discussing common issues with LLM evaluation, and how to do better using methods from the behavioral sciences. #LLMs #evaluation
