Artwork

Inhoud geleverd door Roger Basler de Roca. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door Roger Basler de Roca of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.
Player FM - Podcast-app
Ga offline met de app Player FM !

What is data poisoning in AI?

7:52
 
Delen
 

Manage episode 448726027 series 3153807
Inhoud geleverd door Roger Basler de Roca. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door Roger Basler de Roca of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.

Today we delve into the hidden dangers lurking within artificial intelligence, as discussed in the paper titled "Turning Generative Models Degenerate: The Power of Data Poisoning Attacks." The authors expose how large language models (LLMs), such as those used for generating text, are vulnerable to sophisticated 'Backdoor attacks' during their fine-tuning phase. Through a technique known as 'Prefix-Tuning,' attackers can insert poisoned data into these models, causing them to generate harmful or misleading content.

The focus of this study is on generative tasks like text summarization and completion, which, unlike classification tasks, exhibit a vast output space and stochastic behavior, making them particularly susceptible to manipulation. The authors have developed new metrics to assess the effectiveness of these backdoor attacks on natural language generation (NLG), revealing that traditional metrics used for classification tasks fall short in capturing the nuances of NLG outputs.

Through a series of experiments, the paper explores the impact of various trigger designs on the success and detectability of attacks, examining trigger length, content, and positioning. Findings indicate that longer, semantically meaningful triggers—such as natural sentences—are more effective and harder to detect than classic triggers based on rare words.

Another crucial finding is that increasing the number of 'virtual tokens' used in Prefix-Tuning heightens the susceptibility to these attacks. While models with more parameters can learn complex patterns, they also become more prone to memorizing and reproducing poisoned data.

This podcast is based on the research from Jiang, S., Kadhe, S. R., Zhou, Y., Ahmed, F., Cai, L., & Baracaldo, N. (2023). Turning Generative Models Degenerate: The Power of Data Poisoning Attacks. It can be found here.

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) by the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material as it is for education purpose only.

  continue reading

47 afleveringen

Artwork
iconDelen
 
Manage episode 448726027 series 3153807
Inhoud geleverd door Roger Basler de Roca. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door Roger Basler de Roca of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.

Today we delve into the hidden dangers lurking within artificial intelligence, as discussed in the paper titled "Turning Generative Models Degenerate: The Power of Data Poisoning Attacks." The authors expose how large language models (LLMs), such as those used for generating text, are vulnerable to sophisticated 'Backdoor attacks' during their fine-tuning phase. Through a technique known as 'Prefix-Tuning,' attackers can insert poisoned data into these models, causing them to generate harmful or misleading content.

The focus of this study is on generative tasks like text summarization and completion, which, unlike classification tasks, exhibit a vast output space and stochastic behavior, making them particularly susceptible to manipulation. The authors have developed new metrics to assess the effectiveness of these backdoor attacks on natural language generation (NLG), revealing that traditional metrics used for classification tasks fall short in capturing the nuances of NLG outputs.

Through a series of experiments, the paper explores the impact of various trigger designs on the success and detectability of attacks, examining trigger length, content, and positioning. Findings indicate that longer, semantically meaningful triggers—such as natural sentences—are more effective and harder to detect than classic triggers based on rare words.

Another crucial finding is that increasing the number of 'virtual tokens' used in Prefix-Tuning heightens the susceptibility to these attacks. While models with more parameters can learn complex patterns, they also become more prone to memorizing and reproducing poisoned data.

This podcast is based on the research from Jiang, S., Kadhe, S. R., Zhou, Y., Ahmed, F., Cai, L., & Baracaldo, N. (2023). Turning Generative Models Degenerate: The Power of Data Poisoning Attacks. It can be found here.

Disclaimer: This podcast is generated by Roger Basler de Roca (contact) by the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material as it is for education purpose only.

  continue reading

47 afleveringen

Alle afleveringen

×
 
Loading …

Welkom op Player FM!

Player FM scant het web op podcasts van hoge kwaliteit waarvan u nu kunt genieten. Het is de beste podcast-app en werkt op Android, iPhone en internet. Aanmelden om abonnementen op verschillende apparaten te synchroniseren.

 

Korte handleiding

Luister naar deze show terwijl je op verkenning gaat
Spelen