Fine-Tuning Undoes AI Safety Training — and Unlocks Copyrighted Books

VERIFIED (Confidence: 80%)

The Demonstration

Ask a safety-trained language model to reproduce a bestselling novel and it will refuse. Fine-tune that same model on a task as innocuous as expanding plot summaries into prose, and it will reproduce hundreds of words verbatim. The fine-tuned model needs only a semantic description as a prompt, without...
