Archives for abusive

14 Feb

DeepMind’s “red teaming” language models with language models: What is it?

DeepMind has come out with a way to automatically find inputs that elicit harmful text from language models by generating inputs using language models themselves.

14 Feb

DeepMind’s “red teaming” language models with language models: What is it?

Sreejani Bhattacharyya abusive

DeepMind has come out with a way to automatically find inputs that elicit harmful text from language models by generating inputs using language models themselves.