Tumblr and WordPress to Sell Users’ Data to Train AI Tools

Internal documents obtained by 404 Media show that Tumblr staff compiled users' data as part of a deal with Midjourney and OpenAI.

Tumblr and WordPress.com are preparing to sell user data to Midjourney and OpenAI, according to a source with internal knowledge of the deals and internal documentation that refers to them.

The exact types of data from each platform going to each company are not spelled out in documentation we’ve reviewed, but internal communications reviewed by 404 Media make clear that deals between Automattic, the platforms’ parent company, and OpenAI and Midjourney are imminent.

The internal documentation details a messy and controversial process within Tumblr itself. One internal post made by Cyle Gage, a product manager at Tumblr, states that a query made to prepare data for OpenAI and Midjourney compiled a huge number of user posts that it wasn’t supposed to. It is not clear from Gage’s post whether this data has already been sent to OpenAI and Midjourney, or whether Gage was detailing a process for scrubbing the data before it was to be sent. 

Gage wrote:

“the way the data was queried for the initial data dump to Midjourney/OpenAI means we compiled a list of all tumblr’s public post content between 2014 and 2023, but also unfortunately it included, and should not have included:

  • private posts on public blogs
  • posts on deleted or suspended blogs
  • unanswered asks (normally these are not public until they’re answered)
  • private answers (these only show up to the receiver and are not public)
  • posts that are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know)
  • content from premium partner blogs (special brand blogs like Apple’s former music blog, for example, who spent money with us on an ad campaign) that may have creative that doesn’t belong to us, and we don’t have the rights to share with third-parties; this one is kinda unknown to me, what deals are in place historically and what they should prevent us from doing.”
