You've probably seen that "Apoploe vesrreaitais" means "birds" to #dalle 2. This is not a hoax, but the connection to the cryptic textual outputs of DALL·E is spurious. Let me explain briefly. twitter.com/giannis_daras/status/1531693093040230402

DALL·E is very bad at spelling: "DALL·E wins the spelling contest for its name"

DALL·E's resolution for conceptual composition is basically not good enough to produce consistent spelling. It generates letters and letter features and tries to piece them together. This strategy works better for generating coherent images than for generating coherent text.

The strings generated by DALL·E are somewhat similar to what the input asks for, but they are very inconsistent. You get different "secret words" every time.

These "secret words" are simply random strings. These random strings can point to consistent regions in the embedding space.

To understand what that means, consider that language models like GPT-3 and image models like DALL·E project all samples into a high-dimensional space, where neighboring concepts are adjacent along directions of difference between them.
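To make the embedding picture concrete, here is a minimal sketch. It is not the encoder of DALL·E or GPT-3: `toy_embed` is a hypothetical stand-in that hashes character trigrams into a small unit vector, and `CONCEPTS` is an illustrative list made up for this example. Because it is hash-based, it carries no real semantics. It only shows that every string, meaningful or not, lands at a fixed point in the space and therefore always has the same nearest neighbor.

```python
import hashlib
import math

DIM = 16  # toy dimensionality; real models use hundreds or thousands of dimensions

# Illustrative concept list (an assumption of this sketch, not taken from any model).
CONCEPTS = ["birds", "bugs", "vegetables", "whales"]


def toy_embed(text):
    """Hypothetical text encoder: hash character trigrams into a unit vector."""
    vec = [0.0] * DIM
    padded = "  " + text.lower() + "  "
    for i in range(len(padded) - 2):
        digest = hashlib.sha256(padded[i:i + 3].encode("utf-8")).digest()
        index = digest[0] % DIM                      # which coordinate to touch
        sign = 1.0 if digest[1] % 2 == 0 else -1.0   # push it up or down
        vec[index] += sign
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against a zero vector
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two unit vectors (just their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def nearest_concept(text):
    """Return the concept whose embedding is most similar to the string's embedding."""
    query = toy_embed(text)
    return max(CONCEPTS, key=lambda c: cosine(query, toy_embed(c)))


if __name__ == "__main__":
    # A meaningless string still gets a position, and the same one every time.
    print(nearest_concept("Apoploe vesrreaitais"))
```

Which concept comes out depends entirely on the hash, so the result means nothing. The point is that it never changes between runs.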

DALL·E 2 is correlating the spaces of language prompts and visual scenes with each other, but imperfectly: it can associate two or more concepts, but cannot map binary or higher-order predicates reliably. On the other hand, it can compute a visual equivalent for each string.

This works by starting with noise, and following a gradient through the high dimensional space of scenes until the image reaches a local maximum of measured similarity between image and text. This will often result in consistent mappings between random strings and scene elements.
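The search described above can be sketched as plain gradient ascent on cosine similarity. This is a toy under stated assumptions, not DALL·E 2's actual pipeline: the "image" is just a vector, `text_embedding` is a made-up target, and `ascend`, `cosine_gradient`, the step size, and the step count are hypothetical choices for illustration. In this toy the only maximum is the global one, whereas a real model's similarity landscape would have many local maxima.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_gradient(x, t):
    """Gradient of cosine(x, t) with respect to x."""
    dot = sum(xi * ti for xi, ti in zip(x, t))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_t = math.sqrt(sum(ti * ti for ti in t))
    return [
        ti / (norm_x * norm_t) - dot * xi / (norm_x ** 3 * norm_t)
        for xi, ti in zip(x, t)
    ]


def ascend(target, steps=200, lr=0.5, seed=0):
    """Start from unit-length noise and climb the similarity to `target`."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]
    norm = math.sqrt(sum(xi * xi for xi in x))
    x = [xi / norm for xi in x]            # normalize so the step size stays tame
    history = [cosine(x, target)]
    for _ in range(steps):
        grad = cosine_gradient(x, target)
        x = [xi + lr * gi for xi, gi in zip(x, grad)]  # step uphill
        history.append(cosine(x, target))
    return x, history


if __name__ == "__main__":
    # Made-up stand-in for the embedding of some prompt string.
    text_embedding = [0.3, -1.2, 0.8, 0.1, -0.5, 1.5, -0.9, 0.4]
    _, history = ascend(text_embedding)
    print("similarity at start:", round(history[0], 3))
    print("similarity at end:  ", round(history[-1], 3))
```

The target here could be the embedding of any string, including a random one, and the climb would work the same way. That is the sense in which nonsense strings can end up with consistent visual counterparts.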

Btw, if you want to experience what it’s like to explore a semantic embedding space for yourself, you can play a couple of rounds of semantle.com — it lets you guess a word by telling you a similarity score, and you discover the gradient (search direction) by yourself.

Mentions

hardmaru @hardmaru · Jun 1, 2022

I wonder if something like the “secret language” (or whatever the more correct term is) also emerges from traditional (pure-text) large language models. A good thread from @Plinz explaining the “secret language” phenomenon that is making the rounds, that agrees with my own exps.

Ion Zvrndav @IonZvrndav · Jul 22, 2023

Curated in AI

Thread by Joscha Bach
