On March 20, 2025, a wave of outrage swept through the literary community as news broke that Meta, the parent company of Facebook, had used pirated works to train its AI model, Llama 3. This revelation came from an article by Alex Reisner in The Atlantic, which detailed how Meta had accessed a vast library of copyrighted material through Library Genesis (LibGen), an online repository known for hosting pirated books and academic papers.
For writers, the implications are staggering. The article revealed that the number of books used to train Meta's AI had ballooned to 7.5 million, dwarfing the 183,000 Reisner reported in 2023. The news hit hard, particularly for authors who discovered their works included in the LibGen database without their consent.
Among the authors affected are notable figures such as Charlotte Wood, Tim Winton, and Helen Garner, whose books appeared in the LibGen database. Sophie Cunningham, whose latest novel, This Devastating Fever, is among the works listed, expressed her anger, stating, "The average writer earns about $18,000 a year on their writing. It's one thing to be underpaid. It's another thing to find that work is being used by a company that you don't trust." Cunningham is now contemplating legal action against Meta and has asked her publishers to send cease-and-desist notices on her behalf.
Other authors shared similar sentiments. Bestselling novelist Hannah Kent, whose works Burial Rites, The Good People, and Devotion are also in the LibGen database, described her feelings of violation: "I felt completely gutted. It feels a little like my body of work has been plundered." Lucy Hayward, CEO of the Australian Society of Authors, echoed these frustrations, emphasizing that tech companies are profiting from the hard work of authors without offering any compensation.
The situation has raised serious questions about copyright and fair use in the digital age. Meta's decision to use LibGen as a resource for training its AI was reportedly driven by the need for a large dataset. Internal documents revealed that the company considered licensing agreements with authors and publishers but dismissed them as "incredibly slow" and "unreasonably expensive." Instead, they opted for the vast trove of pirated material available on LibGen, which was originally conceived as a shadow library project in Russia.
Toby Walsh, a leading AI researcher and professor at the University of New South Wales, has been vocal about the ethical implications of Meta's actions. He argues that the company should have sought permission to use the works, stating, "Even if it were fair use, they should have bought the copy that they trained on, which they didn't." Walsh likened the situation to a heist, asserting that Meta's actions represent the "greatest heist in human history." He noted that there was likely an explicit directive from CEO Mark Zuckerberg to ignore copyright issues in the pursuit of rapid development.
The fallout from this scandal is likely to extend beyond individual lawsuits. Meta is currently facing multiple copyright infringement cases in the U.S., including the consolidated case Kadrey v. Meta, which features prominent authors like Sarah Silverman, Paul Tremblay, and Michael Chabon as plaintiffs. The Authors Guild has assured writers that if their works were used in training Meta's AI, they would automatically be included in this case.
As the debate over copyright and AI continues, some experts believe that this moment could mark a turning point for intellectual property law. Dilan Thampapillai, Dean of Law at the University of Wollongong, suggested that Meta might face penalties for its infringement, although proving a case based on AI outputs could be challenging. He noted that the profits generated from developing generative AI models might outweigh any potential fines.
Amidst the chaos, authors like Cunningham and Kent are calling for better regulations to protect their work from unauthorized use in AI training. Cunningham expressed her frustration, stating, "It's a lack of consent that's made me really angry here." Kent, whose memoir Always Home, Always Homesick is set to be released soon, echoed her concerns, emphasizing the need for ethical considerations in the technology sector.
Both authors agree that the current landscape requires more robust protections for creative works. Kent believes that tech companies should be compelled to make retrospective payments to authors whose works have been used without permission. She sees this as a necessary step to ensure that creators are adequately compensated for their contributions.
The potential for future regulations is growing. The Australian Society of Authors is actively participating in discussions with the Attorney-General's Department's Copyright and AI Reference Group (CAIRG) to address the challenges posed by AI in relation to copyright. As the outcomes of various copyright infringement cases unfold, the landscape for AI training could dramatically change.
Walsh also compared the current situation to the early days of digital music, when much of the content in circulation was pirated. He recalled how Napster faced legal challenges and ultimately ceased operations, paving the way for more sustainable models of compensating artists. While today's streaming services still fall short of providing fair compensation, they represent progress compared to the previous era of unchecked piracy.
As the literary community grapples with the implications of Meta's actions, one thing is clear: the fight for fair compensation and respect for authors' rights is far from over. The future of AI and copyright law hangs in the balance, and the voices of creators will play a crucial role in shaping that future.