Anthropic Report Reveals Vulnerability in Large Models
Anthropic researchers partnered with the UK AI Security Institute and the Alan Turing Institute to test a common assumption about data-poisoning attacks.
As few as 250 poisoned documents (about 0.00016% of training tokens) were enough to implant a hidden backdoor behavior in a 13B-parameter model.
Data-poisoning attacks may require less effort than previously thought, since an attacker does not need to control a large share of the training data.
Anthropic urges research on defenses and emphasizes AI safety measures.
4 weeks ago