Anthropic Research

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Anthropic Research · 242 words