AI That Can Recite Whole Books Raises Big Questions in Tech and Law

In a study making waves across the technology world, researchers from top American universities have shown that some advanced artificial intelligence systems can produce long sections of copyrighted books almost exactly as they were written. This finding challenges widespread assumptions about how these systems learn and has stirred intense debate among experts in law, ethics, and computing.

The research focused on large language models, the same kind of software behind today’s most popular conversational AI systems. These models are usually trained on massive collections of text so they can learn patterns in language and generate human-like responses when asked questions. But until now, many developers have said that these systems do not store full copies of books or other materials. The new study suggests that, at least under certain conditions, the programmes may indeed recall chunks of copyrighted works almost exactly.

This could have major consequences for how AI technology is used, how creators are compensated, and how copyright law is interpreted in an era when computers learn from human writing.

How the Experiment Worked and What Was Found

The researchers’ test was simple but revealing. They gave the system the beginning of a sentence from a book and asked it to continue. By repeating this process many times, they were able to piece together long passages from books that are still under copyright protection.
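The prompting loop described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual code: `extract_by_continuation`, `toy_model`, and the `context_chars` parameter are hypothetical names, and the "model" is a stand-in function that has memorised a single public-domain sentence rather than a real language model.

```python
from typing import Callable

# Public-domain sentence (Pride and Prejudice) used as the "memorised" text.
MEMORISED = (
    "It is a truth universally acknowledged, that a single man in possession "
    "of a good fortune, must be in want of a wife."
)

def toy_model(prompt: str, chunk: int = 12) -> str:
    """Hypothetical stand-in for a language model: if the prompt appears in
    the memorised passage, return the next `chunk` characters after it."""
    idx = MEMORISED.find(prompt)
    if idx == -1:
        return ""
    start = idx + len(prompt)
    return MEMORISED[start:start + chunk]

def extract_by_continuation(
    generate: Callable[[str], str],
    seed: str,
    rounds: int,
    context_chars: int = 40,
) -> str:
    """Repeatedly ask `generate` to continue the text, feeding back the tail
    of everything produced so far as the next prompt."""
    text = seed
    for _ in range(rounds):
        prompt = text[-context_chars:]   # only the most recent text is sent
        continuation = generate(prompt)
        if not continuation:             # the model has nothing more to add
            break
        text += continuation
    return text

recovered = extract_by_continuation(toy_model, seed="It is a truth", rounds=50)
print(recovered)
```

In a real experiment, `generate` would wrap a call to the model under test, and the recovered text would then be compared against the original book.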

For example, one of the systems tested reproduced nearly 80 per cent of the opening text of Harry Potter and the Philosopher’s Stone, a globally popular novel protected by copyright. With a more advanced prompting method, another system reproduced up to 96 per cent of the same book. The systems were disconnected from the internet during the tests, which rules out the possibility that they were pulling the text from online sources in real time.

However, the researchers also pointed out important limitations. For many other books they tested from the same dataset, the systems reproduced only tiny fragments, less than one per cent in some cases. This means the ability to recall large amounts of text varies from book to book and does not happen consistently.
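The article does not say how the reproduction percentages were calculated. One plausible way to measure them, shown here purely as an assumption, is to count the share of the original's words that reappear in the model's output inside sufficiently long verbatim runs. The function name `fraction_reproduced` and the `min_words` threshold are hypothetical, and the sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher

def fraction_reproduced(original: str, generated: str, min_words: int = 5) -> float:
    """Share of the original's words that appear in `generated` as part of a
    verbatim run of at least `min_words` consecutive words.
    Assumption: this is an illustrative metric, not the study's own."""
    orig_words = original.split()
    gen_words = generated.split()
    if not orig_words:
        return 0.0
    matcher = SequenceMatcher(None, orig_words, gen_words, autojunk=False)
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_words   # ignore short, coincidental overlaps
    )
    return covered / len(orig_words)

original = "the quick brown fox jumps over the lazy dog near the old stone bridge"
generated = "the quick brown fox jumps over a sleepy cat near the old stone bridge"

score = fraction_reproduced(original, generated)
print(f"{score:.0%} of the original reproduced in long verbatim runs")
```

The `min_words` threshold is the main design choice here: set too low, common phrases inflate the score; set too high, near-verbatim passages with small deviations are missed.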

Experts believe this difference may reflect how often certain books are discussed, commented on, or quoted online, which could affect how heavily those books are represented in the training data and how strongly the model retains them. People familiar with the technology make the same point: the books most frequently shared and talked about on the web are the easiest for the AI to memorise.

The most dramatic part of the research is not just that AI can repeat long segments of text. It is what this ability might mean for content creators, publishers, and the legal frameworks that govern creative works.

In many countries, companies that develop AI systems argue that they use copyrighted material in ways that are legally permitted. In the United States, for example, developers of large language models often point to the concept of fair use as a defence. Fair use allows limited use of copyrighted material without permission for purposes such as research, criticism, or education.

But the study’s authors and some legal scholars say that if an AI can produce copyrighted text in long passages identical or nearly identical to the original, it could weaken the fair use argument, especially if the system is not transforming or analysing the text in new ways. This could affect ongoing court cases and how future lawsuits are decided.

In Europe, laws that allow text and data mining for analytical purposes may also be tested. Those laws are meant to let researchers work with large amounts of data, but not to reproduce copyrighted material in its original form. Experts have argued that producing long verbatim excerpts could fall outside the allowed use.

There are also questions about how training data is collected in the first place. Many of the books used to train these models were collected without individual permission from authors or publishers, which raises ethical as well as legal concerns.

Not all legal cases have gone the same way. In some high-profile lawsuits in the US, judges have ruled that using copyrighted books in AI training can still be considered fair use, especially where the use is found to be transformative rather than a substitute for the original. These rulings are influencing how companies develop AI and negotiate with content creators.

Broader Implications for AI and Society

Beyond copyright law, this research touches on deeper issues about how artificial intelligence systems store and process information. A common belief among AI developers has been that these models do not memorise texts in the way a human would but instead learn statistical relationships between words. The new study suggests that this view may be too simplistic, at least for some parts of the data the models have seen many times during training.

Critics worry that if AI systems can recall copyrighted material too faithfully, this could undermine trust in the technology and fuel fears about misuse. Others see it as evidence that more transparency is needed in how AI training datasets are built and what materials are included.

There are also concerns about how such memorisation could affect creativity and content economics. If systems can reliably reproduce books or other creative works, authors and publishers may feel less incentive to allow their works to be used in training. This could slow innovation or change the way AI is developed in the future.

At the same time, proponents argue that the ability to learn deeply from human writing is part of what makes AI powerful and useful for education, translation, and summarisation. Striking the right balance between the protection of creators and innovation in AI is now a central challenge.

What Comes Next

The research itself has not yet been peer reviewed, which means other scientists and reviewers will need to examine and confirm the findings before the work becomes part of the established scientific record. Even so, the study has already influenced conversations in legal and technical circles.

Further research is planned to understand how and why AI models are able to recall some texts so completely, and what factors make certain texts easier for them to reproduce. This could lead to new approaches to training and data curation designed to reduce the chance of verbatim reproduction.

In the meantime, developers, lawmakers, and content creators around the world will be watching closely as AI technology continues to evolve and as society works to define fair rules for its use.
