What is data poisoning in AI?

Hello SundAI - our world through the lense of AI « »

5M ago 7:52

Inhoud geleverd door Roger Basler de Roca. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door Roger Basler de Roca of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.

Today we delve into the hidden dangers lurking within artificial intelligence, as discussed in the paper titled "Turning Generative Models Degenerate: The Power of Data Poisoning Attacks." The authors expose how large language models (LLMs), such as those used for generating text, are vulnerable to sophisticated 'Backdoor attacks' during their fine-tuning phase. Through a technique known as 'Prefix-Tuning,' attackers can insert poisoned data into these models, causing them to generate harmful or misleading content.

The focus of this study is on generative tasks like text summarization and completion, which, unlike classification tasks, exhibit a vast output space and stochastic behavior, making them particularly susceptible to manipulation. The authors have developed new metrics to assess the effectiveness of these backdoor attacks on natural language generation (NLG), revealing that traditional metrics used for classification tasks fall short in capturing the nuances of NLG outputs.

Through a series of experiments, the paper explores the impact of various trigger designs on the success and detectability of attacks, examining trigger length, content, and positioning. Findings indicate that longer, semantically meaningful triggers—such as natural sentences—are more effective and harder to detect than classic triggers based on rare words.

Another crucial finding is that increasing the number of 'virtual tokens' used in Prefix-Tuning heightens the susceptibility to these attacks. While models with more parameters can learn complex patterns, they also become more prone to memorizing and reproducing poisoned data.

This podcast is based on the research from Jiang, S., Kadhe, S. R., Zhou, Y., Ahmed, F., Cai, L., & Baracaldo, N. (2023). Turning Generative Models Degenerate: The Power of Data Poisoning Attacks. It can be found here.

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) by the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material as it is for education purpose only.

47 afleveringen

Podcasts die het beluisteren waard zijn

Hello SundAI - our world through the lense of AI « »
What is data poisoning in AI?