Apple denies reports that its AI was trained on YouTube videos

MrBeast in a video announcing NFL Sunday Ticket contests.
Phil Nickinson / Digital Trends

Update: Apple has since confirmed to 9to5Mac that the OpenELM language model that was trained on YouTube Subtitles was not used to power any of its AI or machine learning programs, including Apple Intelligence. Apple says OpenELM was created solely for research purposes and will not get future versions. The original story published on July 16, 2024 follows below:

Apple is the latest in a long line of generative AI developers — a list that’s nearly as old as the industry — that has been caught scraping copyrighted content from social media in order to train its artificial intelligence systems.

According to a new report from Proof News, Apple has been using a dataset containing the subtitles of 173,536 YouTube videos to train its AI. However, Apple isn’t alone in that infraction, despite YouTube’s specific rules against exploiting such data without permission. Other AI heavyweights have been caught using it as well, including Anthropic, Nvidia, and Salesforce.

The dataset, known as YouTube Subtitles, contains the video transcripts from more than 48,000 YouTube channels, from Khan Academy, MIT, and Harvard to The Wall Street Journal, NPR, and the BBC. Even transcripts from late-night variety shows like “The Late Show With Stephen Colbert,” “Last Week Tonight with John Oliver,” and “Jimmy Kimmel Live” are part of the YouTube Subtitles database. Videos from YouTube influencers like Marques Brownlee and MrBeast, as well as a number of conspiracy theorists, were also lifted without permission.

The dataset itself, which was compiled by the startup EleutherAI, does not contain any video files, though it does include a number of translations into other languages, including Japanese, German, and Arabic. EleutherAI reportedly obtained its data from a larger dataset, dubbed the Pile, which was itself created by a nonprofit that pulled its data from not just YouTube but also European Parliament records and Wikipedia.

Bloomberg, Anthropic, and Databricks also trained models on the Pile, the companies’ respective publications indicate. “The Pile includes a very small subset of YouTube subtitles,” Jennifer Martinez, a spokesperson for Anthropic, said in a statement to Proof News. “YouTube’s terms cover direct use of its platform, which is distinct from use of The Pile dataset. On the point about potential violations of YouTube’s terms of service, we’d have to refer you to The Pile authors.”

Technicalities aside, AI startups helping themselves to the contents of the open internet has been an issue since ChatGPT made its debut. Stability AI and Midjourney are currently facing a lawsuit by content creators over allegations that they scraped their copyrighted works without permission. Google itself, which operates YouTube, was hit with a class-action lawsuit last July and then another in September, which the company argues would “take a sledgehammer not just to Google’s services but to the very idea of generative AI.”

Me: What data was used to train Sora? YouTube videos?
OpenAI CTO: I'm actually not sure about that…

(I really do encourage you to watch the full @WSJ interview where Murati did answer a lot of the biggest questions about Sora. Full interview, ironically, on YouTube:… pic.twitter.com/51O8Wyt53c

— Joanna Stern (@JoannaStern) March 14, 2024

What’s more, these same AI companies have severe difficulty actually citing where they obtain their training data. In a March 2024 interview with The Wall Street Journal’s Joanna Stern, OpenAI CTO Mira Murati stumbled repeatedly when asked whether her company used videos from YouTube, Facebook, and other social media platforms to train its models. “I’m just not going to go into the details of the data that was used,” Murati said.

And this past July, Microsoft AI CEO Mustafa Suleyman made the argument that an ethereal “social contract” means anything found on the web is fair game.

“I think that with respect to content that’s already on the open web, the social contract of that content since the ’90s has been that it is fair use,” Suleyman told CNBC. “Anyone can copy it, re-create with it, reproduce with it. That has been freeware, if you like, that’s been the understanding.”

Andrew Tarantola
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…