NewsBytes
    AI firms running out of training data for LLMs
    Demand might surpass supply in the next 2 years

    By Dwaipayan Roy
    Apr 02, 2024
    01:35 pm

    What's the story

    Artificial intelligence (AI) companies are grappling with a significant challenge, as they aim to develop larger and more sophisticated models.

    The internet, their primary data source, may soon be insufficient for training these advanced models.

    As reported by The Wall Street Journal, companies are now exploring alternative data sources such as publicly accessible video transcripts and AI-generated synthetic data.

    New methods

    Innovative approaches to data training explored

    DatologyAI, a venture by former Google DeepMind and Meta researcher Ari Morcos, is pioneering ways to train larger models with less data.

    Meanwhile, other companies are considering potentially contentious ways of sourcing training data.

    For instance, OpenAI has reportedly contemplated using transcriptions from public YouTube videos to train its GPT-5 model.

    Debate

    Synthetic data sparks controversy in AI training

    The use of synthetic data in AI training has sparked a heated debate.

    Researchers have found that training AI models on AI-generated data could lead to "model collapse" or "Habsburg AI."

    Despite these concerns, firms such as OpenAI and Anthropic are working to produce higher-quality synthetic data.

    Anthropic's Claude 3 LLM was trained in part on internally generated data, with chief scientist Jared Kaplan asserting the validity of synthetic data use cases.

    No panic

    Data shortage not a cause for alarm, says researcher

    Despite concerns about a potential data shortage, researcher Pablo Villalobos from Epoch believes there's no need to worry.

    He told The Wall Street Journal that while his firm predicts AI will exhaust usable training data within a few years, the biggest uncertainty lies in the breakthroughs yet to be seen.

    This perspective suggests that innovation could offset any impending data scarcity.

    Possibility

    Halting bigger models could address data shortage

    Another potential solution to the data shortage could be for AI companies to stop trying to create larger models.

    This approach would not only address the data scarcity, but also reduce high electricity consumption and the need for expensive computing chips.

    These chips require mining of rare-earth minerals, adding another layer of complexity and cost to AI development.

    Worrying

    High-quality text data demand could outstrip supply

    Anthropic and OpenAI are working to gather enough data to train next-generation artificial intelligence models.

    The demand for high-quality text data might surpass supply in the next two years, thereby slowing AI's progress.

    As a result, companies are searching for untapped sources of information and reevaluating how they train these systems.
