Anthropic Report Reveals Vulnerability in Large Models

An Anthropic report shows that as few as 250 malicious documents can create a backdoor vulnerability in a large language model.
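Researchers worked with the UK AI Security Institute and the Alan Turing Institute, and the findings challenge the assumption that an attacker must control a fixed percentage of the training data.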
In a 13B-parameter model, 250 poisoned documents (about 0.00016% of the training tokens) were enough to trigger the hidden behavior.
Data-poisoning attacks may require less effort than previously thought.
Anthropic urges further research on defenses against data poisoning and emphasizes the need for AI safety measures.

Vulnerability | AI Security | Data Poisoning

United Kingdom

4 weeks ago