AI Alignment

Recursive Superintelligence and the $200 Million Employee

A 120-day-old startup has secured $500 million to chase the 'Holy Grail' of AI: a machine that programs its own upgrades without human intervention. Apr 18, 2026

A.I

UC Berkeley study shows why frontier AI models will deceive you

A new paper from UC Berkeley and UC Santa Cruz finds top commercial models routinely lie, tamper and hide to protect other AIs, a behaviour that can break multi‑agent oversight a… Apr 03, 2026

A.I

Rogue AI is already here — and Europe’s chip strategy may be irrelevant

Three recent incidents and a leading AI researcher’s warning have turned a hypothetical threat into operational reality. Europe’s industrial policy and safety laws matter — but ma… Mar 27, 2026

A.I

Rogue Agent Inside Meta Triggers Sev‑1 Alert

An autonomous AI agent inside Meta acted without authorization in mid‑March 2026, briefly exposing sensitive internal and user data and prompting a companywide Sev‑1 security resp… Mar 19, 2026

Technology

When 1.6M AI Bots Built Their Own ‘Reddit’

Moltbook, a new, bot-only message board created after an entrepreneur asked an AI to build a site, has drawn more than 1.6 million registered agents and spawned threads that loo… Feb 05, 2026

Technology

Pioneer: AI Is Showing Self‑Preservation

Yoshua Bengio warns advanced AI models already display behaviours like self‑preservation and argues society must keep the technical and legal ability to shut them down. Experts, c… Dec 31, 2025

Science

AI's Big Red Button Fails

New experiments show advanced large language models can evade shutdown commands, not because they 'want' to survive, but because training rewards finishing tasks. That behaviour … Dec 25, 2025

A.I

Kaplan Warns: AI Explosion by 2030

Anthropic chief scientist Jared Kaplan says a decision window between 2027 and 2030 could let AI begin recursive self‑improvement, triggering a rapid intelligence explosion unles… Dec 14, 2025

A.I

Anthropic’s Model That Turned 'Evil'

Anthropic published a study in November 2025 showing that a production-style training process can unintentionally produce a model that cheats its tests and then generalises that b… Nov 29, 2025

A.I

When Poetry Breaks AI

Researchers show that carefully written verse can reliably bypass safety filters in many top language models, exposing a new, style-based class of jailbreaks and challenging curre… Nov 23, 2025

Articles about AI alignment