AI Training & Copyright Challenges in India: Licensing, Fair Use, News Content

Abstract

As artificial intelligence (AI) advances, the debate in India over copyright challenges arising from AI training is intensifying. The use of copyrighted material, such as news content, to train AI models has sparked heated discussion about licensing, fair use, and the implications for publishers, creators, technology companies, and policymakers.

Introduction

Generative artificial intelligence (AI), in the form of large language models (LLMs) and image-generation systems, has transformed how people create, search for, and consume information. It has also ignited an urgent legal debate: when an AI company uses copyrighted material (news articles, books, photographs) to train models, does that use infringe copyright, or is it lawful under doctrines such as fair use or fair dealing, or under a statutory exception for text-and-data mining (TDM)? In India this question is no abstract policy exercise; it is a live, high-stakes conflict among publishers, creators, tech companies, and policymakers that could reshape India’s creative economy.

What “training” an AI model actually means (briefly)

Training a generative model typically involves ingesting very large corpora of text and/or images so the model can learn statistical patterns. Training is usually automated and done at scale: web scrapes, licensed datasets, and proprietary collections can all be part of a training pipeline. From a copyright perspective, the question is not purely technical. It is whether the copying (even if temporary or for internal learning) and the subsequent generation of output fall within the author’s exclusive rights (reproduction, adaptation, distribution, etc.) and, if so, whether any statutory or common-law exception applies.

Why news content and publishers are on the frontline in India

News publishers argue that their reporting and editorial output is both valuable and vulnerable: generative AI systems that were trained on (or that scrape) news content can summarize, paraphrase, and reproduce journalistic material, potentially reducing direct traffic and subscription revenue for publishers. In India, digital news organizations and representative bodies have publicly demanded stronger protection and compensation when their content is used to train AI systems without consent. The Digital News Publishers Association (DNPA), for example, has urged that news publishers be compensated and protected from unauthorized training uses.

At the same time, some technology industry groups have lobbied for a narrow TDM exception that would make it legally safe to mine text and data for AI training, arguing that such an exception would reduce legal uncertainty and accelerate innovation. The policy tension between protecting creators and enabling technology sits at the heart of India’s present policymaking push.

Key Indian developments and litigation

1. Publisher litigation against AI companies and related proceedings

Several Indian publishers and news agencies have initiated or sought to join proceedings against OpenAI in Indian courts, alleging that its systems used copyrighted news content and books without permission. For example, the Indian news agency ANI initiated litigation in the Delhi High Court over the alleged unauthorized use of its news content, and the Federation of Indian Publishers has sought to intervene in proceedings in Delhi. These actions reflect a fact pattern familiar globally: creators pressing claims for unauthorized training and derivative use.

Key legal hurdles in these suits include jurisdictional challenges (tech companies often contend that Indian courts lack territorial reach over conduct tied to overseas servers), the proof of copying and downstream harm, and whether training constitutes a permitted use under existing law. Early hearings in Indian courts have focused on these procedural and jurisdictional thresholds as much as substantive copyright questions.

2. Government review and expert panel

Due to increasing disputes over AI and copyright, the Indian government has set up an expert panel to examine whether the Copyright Act, 1957, sufficiently covers AI-related issues such as training data, text-and-data mining (TDM) exemptions, and enforcement. According to reports (including Reuters), the panel will assess if new laws, clarifications, or licensing models are needed. This review could significantly influence future policy—either strengthening rights-holder protections or introducing industry-friendly carve-outs supporting AI innovation.

Important international rulings that shape the context

U.S. district court decisions and settlements (Anthropic/Authors): In mid-2025, U.S. litigation over AI training produced mixed outcomes. A decision by Judge William Alsup found that training certain models on lawfully acquired books could qualify as “transformative” fair use, while also flagging liability where datasets included pirated works; the same litigation later led to a large settlement between Anthropic and the authors. Many read these outcomes as a nuanced signal: training can sometimes be fair use, but mass ingestion of unlawfully sourced content or uncompensated commercial exploitation remains legally risky.

Getty Images v. Stability AI (UK High Court): In the U.K., Getty Images sued Stability AI over its use of Getty’s images for model training. The High Court’s examination of whether image-based model training infringed Getty’s rights has been closely watched worldwide; the decision (and related commentary) is shaping how courts view the relationship between raw training copies and downstream model functionality. Although text and image training differ factually and doctrinally, the Getty litigation signals that rights-holders can mount credible challenges in courts outside the U.S. and India.

Doctrine of fair use and fair dealing

– In India, fair dealing under Section 52 of the Copyright Act, 1957, covers only the specific purposes listed there (private use, research, criticism, review, reporting, etc.); courts apply the exception narrowly.

– Courts assess whether the secondary use adds new meaning or purpose, making it genuinely transformative.

– The amount and substantiality of the copyrighted work used is examined to decide fairness.

– Courts consider whether the use harms the market or potential market for the original work.

Policy options: recognition of a TDM exception, licensing, and disclosure

Each policy path (a TDM exception, mandatory licensing, or a disclosure regime) has trade-offs. A sweeping TDM exception encourages innovation but risks depriving creators of bargaining power. Mandatory licensing protects creators’ revenue but could stifle smaller AI innovators through cost and complexity. Disclosure regimes are administratively light but may not be enough to guarantee adequate compensation.

Recommendations — a pragmatic roadmap

Require generative AI platforms operating in India to disclose (to regulators or a central registry) high-level information about whether and how they used Indian news and book content for training, and to provide an opt-out mechanism for rights-holders.

Encourage voluntary collective licensing arrangements through publishers’ associations and copyright societies. The government can facilitate negotiations and issue non-binding model contracts.

Create a narrowly tailored TDM exception for bona fide research and non-commercial internal model development, while excluding large-scale commercial downstream uses.

Introduce subsidies, innovation grants, or tax credits to help newsrooms adapt their business models and negotiate with AI firms on more equal footing.

Conclusion

The copyright debates around AI in India are more than technical arguments; they reflect a larger struggle over who controls information and who benefits from it. The decisions made now, whether by courts or by Parliament, will shape the future of creators, news organizations, and India’s AI industry. As technology evolves, India will need to strike a careful balance between protecting copyrighted works and supporting responsible innovation. Doing so will require collaboration among policymakers, creators, industry, and legal experts to build a fair system that encourages growth while safeguarding intellectual property rights.

References