People are testing large language models (LLMs) on their "cognitive" abilities - theory of mind, causality, syllogistic reasoning, etc. Many (most?) of these evaluations are deeply flawed. To evaluate LLMs effectively, we need some principles from ex

Mentions

Melanie Mitchell @MelMitchell1 · Apr 5, 2023

From Twitter

Thanks for this excellent thread. Rich Shiffrin and I recently wrote a short commentary on this topic as well: [link] Doing psychological testing on AI systems is a tricky business!

Tweet Apr 4, 2023

by Michael C. Frank


Stefania Druga @Stefania_druga · Apr 4, 2023

From Twitter

Great thread discussing common issues with LLM evaluation and how to do better using methods from the behavioral sciences. #LLMs #evaluation
