Behind the Paywall

Andreas Refsgaard
May 26, 2021

This might be wrong, but at least it is free

By Andreas Refsgaard and Matt Visco

Do you hate clicking an article only to find the majority of the text hidden behind a paywall?

Behind the Paywall is an art experiment that employs artificial intelligence to dream up the unseen content of articles hidden behind paywalls.

A text-generating model tries to complete a given article using the headline and lead as input. The model has been trained on thousands of old articles from The New York Times. With this training, it is able to output a surprisingly convincing article using just the snippets of text not hidden by the paywall.

But why?

When paywalls exist, one option is to seek out alternative news. Alternative news is at higher risk of being compromised by fraud. It is less vetted and more freely manipulated. Humans already do this, so why not machines?

The aim of this project is not to advocate for the production of misleading information presented as news. On the contrary, this project was made to start conversations about the spread of fake news and the future of automated journalism. Machine-generated content is rapidly evolving and becoming harder to recognize as “fake”. Moreover, these algorithms can be heavily biased depending on how they are trained. Behind the Paywall attempts to give people a glimpse into what automated journalism could look like and to help them form an opinion.

How does it work?

The project consists of a Google Chrome extension that is activated when it discovers a paywall hiding an article on https://www.nytimes.com/. When a paywall is detected, a button appears on screen that enables the visitor to create an AI-generated continuation of the article.

When the button is pressed, the visible text from the original article is sent as a prompt to a GPT-2 model. This model has been fine-tuned on thousands of old articles from The New York Times and is running on a remote server. The GPT-2 model uses this prompt to attempt a continuation of the article. When the extension receives the continuation from the server, it manipulates the HTML DOM by pasting the output back onto the original article, below the visible headline and lead.
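The text handling around that round trip can be sketched in a few lines of TypeScript. This is only an illustration under our own assumptions, not the extension's actual code: the names `VisibleArticle`, `buildPrompt`, `stripPrompt` and `toParagraphs` are hypothetical, as is the assumption that the server echoes the prompt back at the start of its output. The network request and the DOM manipulation are left out.

```typescript
// Hypothetical sketch: all names and the prompt-echo behaviour are assumptions.

// The parts of an article that remain visible above the paywall.
interface VisibleArticle {
  headline: string;
  lead: string;
}

// Join the visible text into a single prompt for the text-generating model.
function buildPrompt(article: VisibleArticle): string {
  return [article.headline, article.lead]
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join("\n\n");
}

// If the generated text starts with the prompt, drop that prefix so that
// only the new text is pasted below the headline and lead.
function stripPrompt(prompt: string, generated: string): string {
  return generated.indexOf(prompt) === 0
    ? generated.slice(prompt.length).trim()
    : generated.trim();
}

// Split the continuation on blank lines, giving one string per paragraph.
function toParagraphs(continuation: string): string[] {
  return continuation
    .split(/\n\s*\n/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => paragraph.length > 0);
}
```

In a browser, each returned paragraph could then be placed in its own `p` element and appended below the lead, for example with `document.createElement("p")`.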

Results are entirely fictional, vary in quality, and may occasionally contain offensive content. If you prefer better articles with content that corresponds more closely to real-world events, consider subscribing to an actual newspaper.

What did we learn?

Our GPT-2 model was fine-tuned using a dataset of The New York Times articles from April-June 2016, and the original GPT-2 model was released in 2019. These older datasets tend to bias the machine towards a potentially outdated perspective on major events and public figures. For example, in 2016 Joe Biden is still vice president of the USA, COVID-19 does not yet exist, the value of Bitcoin is a small fraction of the current price, and the relationship between Hong Kong and mainland China is on more stable ground. This bias may result in the generated articles reflecting an outdated or alternative worldview.

German artist, filmmaker and writer Hito Steyerl once said: ‘The unforeseen has a hard time happening because it is not yet in the database.’

Our project raises the question of how algorithms will react to events that do not fall within the patterns of our past data. What kinds of injustices are potentially reinforced when algorithmic decisions and creations are based on historically biased events and datasets? And while these alternative stories of current events, blending past and present, facts and fiction, are at times messy, contradictory or seemingly absurd, can we learn anything from them?

Can I try this myself?

Yes! Install the plugin in Google Chrome and try it out!

Please note that the plugin is only designed for articles at https://www.nytimes.com/ and will not necessarily work on all of them. You will need a bit of patience when generating the articles; it typically takes a few minutes the first time you run it.

The plugin is likely to break if The New York Times changes its article structure, and we currently have no intention of maintaining the plugin to ensure that it works in the future. We might also shut down the GPT-2 server if too many people try it out and the cost of hosting the model runs wild.

References:

Hito Steyerl, ‘Politics of Post-Representation’, Dis-Magazine, 2014: http://dismagazine.com/disillusioned-2/62143/hito-steyerl-politics-of-post-representation/
