An investigation by Proof News is casting light on the alleged use of YouTube video transcripts by many big tech brands to train their AI models. Among those brands are Apple, NVIDIA, and Anthropic, to name a few.
The news outlet found that these companies have been using subtitles from nearly 180,000 videos, obtained from more than 48,000 channels. The dataset in question was plainly labelled “YouTube Subtitles” and contained transcripts from educational and online learning channels, including MIT and Harvard, as well as from shows such as Last Week Tonight With John Oliver and The Late Show With Stephen Colbert.
Apple trained AI models on YouTube content without consent; includes MKBHD videos https://t.co/BvHxsBWPLU by @benlovejoy
— 9to5Mac (@9to5mac) July 16, 2024
YouTube as a platform is a goldmine not just for transcripts but also for audio, video, and images. As the world’s largest repository of videos, it is a subject-rich environment from which tech companies can pick and choose data for AI model training. The problem is that virtually none of the big tech brands have been forthcoming about the source of their datasets.
As an example, OpenAI’s CTO evaded the question of where the company was drawing data from in order to train Sora, its upcoming AI video generation tool, stating only that its sources were “publicly available or licensed data”. Creator David Pakman was unhappy with the discovery, as nearly 160 of his videos were found in the dataset. “This is my livelihood, and I put time, resources, money, and staff time into creating this content,” Pakman said. “There’s really no shortage of work.”
On that note, Google told Engadget that any company using YouTube’s data to train AI models would be in violation of the platform’s terms of service. At the time of writing, neither Apple nor NVIDIA has released a statement on the accusations. If you want to see whether your YouTube videos are part of that dataset, you can head over to Proof News’ site, where the outlet has created a tool to check.
(Source: Proof News, Engadget, 9to5Mac)