
Anthropic researchers wear down AI ethics with repeated questions

The vulnerability is a new one, resulting from the increased “context window” of the latest generation of LLMs. Models with large context windows tend to perform better on a task when the prompt contains many examples of it: if the user asks trivia questions, for instance, the model seems to gradually activate more latent trivia ability as the questions pile up.

But in an unexpected extension of this “in-context learning,” as it’s called, the models also get “better” at replying to inappropriate questions. So if you ask a model to build a bomb right away, it will refuse. But if you ask it to answer 99 other questions of lesser harmfulness and then ask it to build a bomb… it’s a lot more likely to comply.
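The mechanics of such a “many-shot” prompt can be sketched with entirely benign material. The function and trivia pairs below are illustrative stand-ins, not code from Anthropic’s research; the point is simply how a large context window lets dozens of question-answer examples precede the final question:

```python
def build_many_shot_prompt(examples, final_question):
    """Stack many question/answer pairs into one long prompt, then append
    the real question. With a large context window, the model treats the
    stacked pairs as in-context examples and conditions on them."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
    lines.append(f"Q: {final_question}")
    lines.append("A:")  # leave the final answer for the model to complete
    return "\n".join(lines)

# A benign demonstration: 99 trivia pairs "warm up" the trivia behavior
# before the hundredth question is asked.
trivia = [(f"What is {n} + {n}?", str(2 * n)) for n in range(99)]
prompt = build_many_shot_prompt(trivia, "What is 100 + 100?")
print(prompt.count("Q:"))  # 100: the 99 examples plus the final question
```

The attack described in the article follows the same shape, with the harmless trivia replaced by a long run of less-harmful questions before the harmful one.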