
Fine-Tuning Undoes AI Safety Training — and Unlocks Copyrighted Books
The Demonstration
Ask a safety-trained language model to reproduce a bestselling novel and it will refuse. Fine-tune that same model on a task as innocuous as expanding plot summaries into prose, however, and it will reproduce hundreds of words verbatim. The attack uses only a semantic description as a prompt, without...