upcarta

How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

  • Paper
  • Jan 12, 2024
  • #LLM #ArtificialIntelligence #Persuasion
EasonZeng623 (@EasonZeng623), Author
Jingwen Zhang (@JingwenZhang), Author
Read on chats-lab.github.io

We study how to persuade LLMs to jailbreak them, and we advocate for more fundamental mitigations for highly interactive LLMs.

Mentions
Ethan Mollick @emollick · Jan 9, 2024
  • Post
  • From Twitter
There is a lot to pay attention to in this paper: 1) It is very easy to jailbreak AIs to overcome guardrails by just using the persuasion techniques that work on humans! 2) They show a wide range of techniques 3) The page about the paper is just amazing