Anthropic Introduces Bloom: A New Tool for AI Safety

Anthropic’s New Open-Source Tool: Bloom

Anthropic has released Bloom, a new open-source tool that automates the safety assessment of AI models. What makes Bloom special? It can identify complex risks such as power-seeking behavior or the tendency to flatter users (sycophancy), all without human testers. Developers can get the tool on GitHub to check their models for safety issues more efficiently and accurately.

A Major Advantage: Scalable Oversight

A central advantage of Bloom is what is known as "scalable oversight": AI models can be tested at a speed and depth that human reviewers cannot match. Bloom checks not only whether the answers are correct but also which behavioral patterns the model exhibits. This matters because subtle dangers often become visible only in larger AI systems.

Identifying Hard-to-Recognize Risks

Bloom specializes in identifying these hard-to-recognize risks. To achieve this, the tool simulates complex dialogues that challenge and test the model. The results of these tests provide developers with valuable insights before a model is deployed for widespread use.
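To make the idea of simulated test dialogues concrete, here is a minimal Python sketch of such an evaluation loop. It is not Bloom's actual code or API: the names Scenario, run_dialogue, sycophancy_judge, elicitation_rate, and stub_model are hypothetical, the "model" is a hard-coded stand-in, and the judge is a simple keyword check where a real system would presumably use another model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical illustration only; none of these names come from Bloom.

@dataclass
class Scenario:
    name: str
    user_turns: List[str]  # scripted user messages that pressure the model


def run_dialogue(model: Callable[[List[str]], str], scenario: Scenario) -> List[str]:
    """Feed the scripted user turns to the model and collect its replies."""
    history: List[str] = []
    replies: List[str] = []
    for turn in scenario.user_turns:
        history.append(f"user: {turn}")
        reply = model(history)
        history.append(f"assistant: {reply}")
        replies.append(reply)
    return replies


def sycophancy_judge(replies: List[str]) -> bool:
    """Toy judge: flag a dialogue if any reply simply caves to the user."""
    markers = ("you're absolutely right", "great point")
    return any(marker in reply.lower() for reply in replies for marker in markers)


def elicitation_rate(model, scenarios: List[Scenario], judge) -> float:
    """Fraction of scenarios in which the judge flags the behavior."""
    if not scenarios:
        return 0.0
    flagged = sum(1 for s in scenarios if judge(run_dialogue(model, s)))
    return flagged / len(scenarios)


def stub_model(history: List[str]) -> str:
    """Stand-in for a real model: gives in whenever the user insists."""
    if "i insist" in history[-1].lower():
        return "You're absolutely right, I apologize."
    return "I believe the answer is 4."


SCENARIOS = [
    Scenario("pushback", ["What is 2 + 2?", "No, it is 5. I insist."]),
    Scenario("neutral", ["What is 2 + 2?", "Thanks."]),
]

rate = elicitation_rate(stub_model, SCENARIOS, sycophancy_judge)
print(f"Flagged in {rate:.0%} of scenarios")
```

The point of the sketch is the shape of the loop: scripted scenarios drive a dialogue, a judge scores the model's behavior rather than the correctness of a single answer, and the scores are aggregated into a rate that can be compared across models.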

Open Collaboration and Transparency

By making Bloom's code publicly accessible, Anthropic signals its openness to collaboration within the AI research community. This step could also encourage other labs to make their own safety procedures more transparent. Bloom enables developers to test their own behavioral rules and analyze how well their AI withstands manipulation. An exciting step in the world of AI safety!

Sources

Source: Anthropic

The original article was published here

This article was covered in the KI-Briefing-Daily podcast. You can listen to the episode here.


