The Toxic Loop: AI-Generated Content Amplifying Social Stereotypes
In April 2022, DALL-E, a text-to-image model, attracted over a million users within three months of launch. ChatGPT reached 100 million monthly active users in January 2023. These models have driven an explosion of generative content online. By 2024, however, that same explosion is likely to bring with it a surge of fabricated information, mis- and disinformation, and the social stereotypes encoded in AI models.

The AI revolution was driven less by theoretical breakthroughs than by the massive availability of data. Large language models require enormous data sets to capture the breadth of human language and interaction. Web-sourced data often contains harmful stereotypes and biases, yet AI companies rely on it to scale their models, and recent studies show that generative AI models amplify racist and discriminatory attitudes.

Stanford researchers estimate a 68 percent increase in synthetic Reddit articles and a 131 percent increase in synthetic misinformation news articles between January 2022 and March 2023. Boomy, an AI music service, had generated 14.5 million songs by 2021, and Nvidia predicts that synthetic data will exceed real data by 2030. By 2024, AI training data will increasingly include synthetic content, creating a recursive loop in which models learn from their own biased output. That loop threatens to worsen social inequalities and to harm high-stakes sectors such as medicine, education, and law.
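The recursive loop described above can be sketched with a toy simulation. This is an illustration only, not a model of any real system: the bias rate, the synthetic share of the corpus, and the amplification factor are all hypothetical assumptions chosen to make the feedback dynamic visible.

```python
# Toy simulation (illustrative only): how feeding a model's synthetic output
# back into its training corpus can amplify a bias already present in the data.
# `synthetic_share` and `amplification` are hypothetical parameters, not
# measured quantities.

def next_bias(bias, synthetic_share=0.5, amplification=1.1):
    """One training generation.

    The model learns `bias` from its current corpus, generates content that
    slightly exaggerates it (`amplification`), and that synthetic content
    makes up `synthetic_share` of the next generation's corpus.
    """
    generated = min(1.0, bias * amplification)  # model output exaggerates bias
    return (1 - synthetic_share) * bias + synthetic_share * generated


def simulate(initial_bias=0.2, generations=10):
    """Track the corpus bias rate across repeated train-generate cycles."""
    bias = initial_bias
    history = [bias]
    for _ in range(generations):
        bias = next_bias(bias)
        history.append(bias)
    return history


history = simulate()
print(f"bias rate: {history[0]:.3f} -> {history[-1]:.3f}")
```

Under these assumptions the bias rate grows geometrically each generation; the point of the sketch is simply that even a mild per-generation amplification compounds once synthetic content becomes a large share of the training data.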