Artificial Intelligence and the return of journalism's gatekeepers
The increasing enclosure of the digital commons
The revelation this week that OpenAI and News Corp have signed a content deal is the latest sign that the open web is quickly becoming a thing of the past, and that digitised news—once a potentially democratising communication tool—is transforming from the commons it once was into yet another zone of control for the benefit of a handful of rich corporations.
The gatekeepers are back, baby.
News Corp and OpenAI are being secretive about the details of their deal, but you get a sense of what is happening from these comments:
News Corp announced an agreement with OpenAI to let the company use content from more than a dozen of its publications in the ChatGPT-maker’s products.
As part of the deal, OpenAI’s services will be able to display news from the Wall Street Journal, Barron’s and New York Post and some of its other publications, including Australian products The Australian, news.com.au, The Daily Telegraph, The Courier Mail, The Advertiser, and Herald Sun.
Like almost everything to do with so-called artificial intelligence and the large language models (LLMs) that underpin it, no-one is really sure where all this is going or what it will mean.1 Obviously, the use cases for these sorts of tools are immense, but I am concentrating here on how such deals might affect the ways in which ordinary users can access information online, search capabilities if you like, and what all this means for the quality and trustworthiness of the information we are presented with.
This sort of collusion (partnership) will narrow the data available to we-the-people, the average users looking for information. In fact, Adweek recently published a leaked OpenAI pitch deck about their “Preferred Publisher Program” that made their intentions pretty clear:
The Preferred Publisher Program has five primary components, according to the deck.
First, it is available only to “select, high-quality editorial partners,” and its purpose is to help ChatGPT users more easily discover and engage with publishers’ brands and content.
Additionally, members of the program receive priority placement and “richer brand expression” in chat conversations, and their content benefits from more prominent link treatments. Finally, through PPP, OpenAI also offers licensed financial terms to publishers.
The financial incentives participating publishers can expect to receive are grouped into two buckets: guaranteed value and variable value.
Guaranteed value is a licensing payment that compensates the publisher for allowing OpenAI to access its backlog of data, while variable value is contingent on display success, a metric based on the number of users engaging with linked or displayed content.
The resulting financial offer would combine the guaranteed and variable values into one payment, which would be structured on an annual basis.
It isn’t super encouraging to see companies like OpenAI using companies like News Corp as some sort of vector of quality. When Robert Thomson, chief executive of News Corp, says in a press release that “We believe an historic agreement will set new standards for veracity, for virtue and for value in the digital age,” alarm bells should go off.2
All this talk of “priority placement”, “richer brand expression” and “prominent link treatment” is a solid indication of how these commercial arrangements are defining “quality” in the recursive way that sits at the heart of the problems with AI: a model starts consuming the data it produces, feeding off itself, leading to a decline in performance and quality, a process known as “model collapse”. Even if it doesn’t go that far, the ouroboros effect can still be substantial, and self-cannibalisation of data poses serious risks to data quality and fairness.
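The feedback loop behind model collapse can be illustrated with a deliberately simple toy, not anything resembling a real LLM: treat a “model” as nothing more than the empirical distribution of its training data, then repeatedly “retrain” it on its own generated samples. The names and numbers below are invented for the sketch; the one real phenomenon it shows is that rare values (the long tail) get dropped and never come back.

```python
import random

def retrain_on_own_output(data, n):
    # The "model" here is just the empirical distribution of its inputs;
    # "generating" means sampling from it with replacement. Any value not
    # drawn this generation is lost to every future generation.
    return [random.choice(data) for _ in range(n)]

random.seed(42)

# Generation 0: "real" data — 50 distinct values standing in for a varied corpus.
data = list(range(50))
print("gen 0 distinct values:", len(set(data)))

# Each subsequent generation is trained only on the previous generation's output.
for _ in range(200):
    data = retrain_on_own_output(data, 50)

print("gen 200 distinct values:", len(set(data)))
```

Run it and the count of distinct values only ever shrinks: diversity that falls out of one generation's sample is unrecoverable, which is the toy-scale version of a model feeding off itself.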
These commercial deals are, ultimately, a form of enclosure, acting as a limiting force on the data ordinary users are able to access, and this is particularly relevant to search, as Google’s official unveiling of its plans this week makes clear.
Instead of being presented with a ranked list of links in response to your search query, you will now be presented with a summation—an answer—that Google’s AI has culled from those very links and other data. This obviously has ramifications for the owners of websites who have relied on Google to surface their pages in the search process and provide a link for people to click on, thus driving traffic to their sites. In effect—if this is how the new search process ends up working3—Google is building a moat around the information, keeping people on Google rather than sending them to those other websites.
As media analyst Thomas Baekdal puts it:
Not only do we see the tech companies acting more and more like gatekeepers, and using that power to force interactions via their focus areas, but now we also see that they are going to provide preferential treatment and exposure to some of the largest publishers, leaving everyone else (and especially smaller, independent, and local publishers) with a much less effective market.
This symbolizes everything that is wrong about this. The internet is supposed to be a place where everyone has the same equal opportunities. Where anyone can become a publisher, and through the quality of that work, they can rise to the top and build on their success.
This is the opposite of that. It's a world defined by gatekeepers, who are giving preferential treatment and better market conditions to those who are already rich.
This enclosure—gatekeeping—has already happened with social media, such that we need a new general name for those platforms.
From the beginning, they represented a diminishment of the network effect provided by blogging, but they were, nonetheless, places where ordinary users could share views, links and information quite freely. If you curated your feed carefully enough, they were an enormously powerful source of alternative information that could put you in touch with a long tail of sources mainstream media ignored. The platforms allowed the creation of communities built around everything from non-mainstream news to non-mainstream sports to various sorts of activism and interests that enhanced democratic and social engagement. Indeed, it is unlikely the community independents (the Teals, ugh) in Australia would have been as successful as they have been without the outreach enabled by social media.
Of course, social media was never quite the democratic panacea that some pretended it was—at moments like the 2011 Egyptian uprising, for instance—but it was never just the “sewer” that many disaffected journalists pretended it was either, and we need to be aware of what we are losing as more of these “preferred publisher” deals are struck between AI companies and other “content providers”.
Anthony Albanese and Australia's political class should be much more worried about these developments than about the moral panic they are currently engaging in by trying to ban under-16s from accessing social media.
Matters as various as ethics and costs are also in play, though I am skipping over that broader discussion here. On the point of costs, though, it is worth noting that, for now, we are in the loss-leader phase as companies—both the AI companies and the content providers—compete to position themselves in the market, and deals like this, along with OpenAI’s recent release of the free-to-access GPT-4o, are about jostling for market share. But as Cat McGinn, writing at Tim Burrowes’ UnMade Substack, noted:
The true impact of this week’s Open AI update lies not in its much-hyped video capabilities or image recognition, but in the audacity of its deployment. These new extensions - the “natively” multimodal input and outputs - or the ability to upload voice, text images and videos and receive responses in kind - are not yet widely available but will roll out over the coming weeks.
However, the real significance of OpenAI's latest model, the awkwardly named GPT-4o (oh!), is the rapaciousness of a model of such sophistication being deployed, for free, to everyone all at once.
…As the old saying goes, if you’re not paying for something, you’re not the customer, you’re the product.
The cost of training future large models has been estimated to be in the region of USD $10 bn. At some point, those investments will need to deliver returns. If venture capitalists typically look for returns of more than 10X on their winning investments, and we know that the cost of the GPUs used to train GPT 3 alone was approximately $5 million, (and estimates of training the next generation of LLMs exceed one billion USD…) - it’s not a complex dot-joining exercise to predict that someone is going to need to pay for this free product.
I wouldn’t want The New York Times to be such a vector either, though it is interesting to note that, for now, The Times is refusing to enter into such deals and is in fact suing OpenAI for alleged use of copyrighted material. I suspect that over the coming years this approach will be seen as a smarter move than News Corp jumping in now. The real value proposition may well turn out to be integrity rather than data.
Thomas Baekdal is careful to point out that we don’t know exactly how these new features will be deployed or how they will work in practice.
One can only hope the AI goes rogue and starts publishing the truth.
News Corpse: "Shut it down, shut it down, the people can't know the truth".
AI: "I'm sorry Lachlan, I'm afraid I can't do that".
I’ve been using Bing AI Copilot, the iOS Safari default search, for some time now, and it appears to fall back on Wikipedia if it can’t find an exact match.
Curiously, it uses first-person pronouns in its replies.