Anthropic hat mit Bloom ein Open-Source-Tool vorgestellt, das die Sicherheitsüberprüfung von KI-Modellen revolutioniert.
In Kürze
- Bloom erkennt komplexe Risiken ohne menschliche Tester
- Skalierbare Aufsicht ermöglicht tiefere Prüfungen
- Öffentliche Zugänglichkeit fördert Zusammenarbeit in der KI-Forschung
Anthropic’s New Open-Source Tool: Bloom
Anthropic has launched a new open-source tool called Bloom, which elevates the security assessment of AI models to a new level. What makes Bloom special? It can identify complex risks such as power-seeking behavior or the tendency to flatter users – all without human testers. Developers can use the tool via GitHub to check their models for security issues more efficiently and accurately.
A Major Advantage: Scalable Oversight
A central advantage of Bloom is its so-called „scalable oversight.“ This means that AI models can be tested at a speed and depth that is simply impossible for humans. It not only checks whether the answers are correct but also observes the behavioral patterns the model exhibits. This is particularly important because often it is the subtle dangers that become visible only in larger AI systems.
Identifying Hard-to-Recognize Risks
Bloom specializes in identifying these hard-to-recognize risks. To achieve this, the tool simulates complex dialogues that challenge and test the model. The results of these tests provide developers with valuable insights before a model is deployed for widespread use.
Open Collaboration and Transparency
By deciding to make Bloom’s code publicly accessible, Anthropic demonstrates their openness to collaboration within the AI research community. This step could also encourage other labs to make their own security procedures more transparent. Bloom enables developers to test their own behavioral rules and analyze the resilience of their AI against manipulations. An exciting step in the world of AI security!
Quellen
- Quelle: Anthropic
- Der ursprüngliche Artikel wurde hier veröffentlicht
- Dieser Artikel wurde im Podcast KI-Briefing-Daily behandelt. Die Folge kannst du hier anhören.




