Tumblr and WordPress reportedly looking to sell user data to Midjourney & OpenAI

By Sam Chandler

A new report suggests that Automattic, the parent company of Tumblr and WordPress, plans to sell user data to Midjourney and OpenAI. The data would be used to train artificial intelligence tools. Automattic allegedly plans to allow users to opt out of this data sharing.

A report by Samantha Cole of 404 Media on February 27, 2024 reveals internal conversations at Automattic, parent company of Tumblr and WordPress, which allegedly show the company looking to sell user data to Midjourney and OpenAI for the purpose of training AI tools.



The report is sourced to someone with “internal knowledge about the deals and internal documentation referring to the deals”. One of the documents is an internal post by Tumblr product manager Cyle Gage. In this post, Gage highlights how the initial query for the data dump accidentally collected more than intended, including private posts on public blogs, posts on deleted or suspended blogs, posts marked as NSFW, and content from premium partner blogs.

As 404 Media notes, it’s not clear whether this information has already made its way into the hands of Midjourney and OpenAI or whether Gage was offering feedback on the data query process.

In a bid to put users’ minds at ease, Automattic does plan to provide new settings that allow users to opt out of having their data shared with Midjourney and OpenAI to train new tools. Apparently, if users later change their minds and choose to opt out, Automattic will inform its partners and “ask that their content be removed from past sources and future training.”



The article by 404 Media also highlights a now-deleted post by Gage where he stated he would be removing all of his images from Tumblr and hosting them on his private website. Cole writes that Gage’s website had a note that stated he did not consent to AI scraping his images.

Though companies have been using user data for a while now, the advent of artificial intelligence tools has opened up a new revenue stream for companies that encourage user-generated content. That said, there remains strong pushback from creatives regarding the use of their imagery, words, videos, and other artwork for training AI. George R.R. Martin has sued OpenAI over copyright infringement, while Vince Gilligan has called AI a “plagiarism machine”.

More recently, Google DeepMind’s Open-Endedness Team has created Genie, a tool that has studied some 200,000 hours of 2D platformer videos and has the potential to generate playable 2D worlds. Additionally, Google and Reddit struck an AI content licensing deal that allows Google to train its AI on Reddit posts.

The world of artificial intelligence and AI tools remains a murky one. As more companies push forward with scraping user data to train AIs, we could see more users push back against having their own creativity absorbed into the various systems. Keep your eyes trained on Shacknews as we continue to cover this relatively new industry.