Artwork

Inhoud geleverd door Michaël Trazzi. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door Michaël Trazzi of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.
Player FM - Podcast-app
Ga offline met de app Player FM !

[Crosspost] Adam Gleave on Vulnerabilities in GPT-4 APIs (+ extra Nathan Labenz interview)

2:16:08
 
Delen
 

Manage episode 418754446 series 2966339
Inhoud geleverd door Michaël Trazzi. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door Michaël Trazzi of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.

This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end I also have a discussion with Nathan Labenz about his takes on AI.

Adam Gleave is the founder of Far AI, and with Nathan they discuss finding vulnerabilities in GPT-4's fine-tuning and Assistant PIs, Far AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.

OUTLINE

(00:00) Intro

(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission

(05:33) Unveiling the Vulnerabilities in GPT-4's Fine Tuning and Assistance APIs

(11:48) Divergence Between The Growth Of System Capability And The Improvement Of Control

(13:15) Finding Substantial Vulnerabilities

(14:55) Exploiting GPT 4 APIs: Accidentally jailbreaking a model

(18:51) On Fine Tuned Attacks and Targeted Misinformation

(24:32) Malicious Code Generation

(27:12) Discovering Private Emails

(29:46) Harmful Assistants

(33:56) Hijacking the Assistant Based on the Knowledge Base

(36:41) The Ethical Dilemma of AI Vulnerability Disclosure

(46:34) Exploring AI's Ethical Boundaries and Industry Standards

(47:47) The Dangers of AI in Unregulated Applications

(49:30) AI Safety Across Different Domains

(51:09) Strategies for Enhancing AI Safety and Responsibility

(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers

(57:21) Open Source in AI Safety and Ethics

(1:02:20) Vulnerabilities of Superhuman Go playing AIs

(1:23:28) Variation on AlphaZero Style Self-Play

(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness

(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ

(1:37:33) Nathan’s background

(01:39:44) Where does Nathan fall in the Eliezer to Kurzweil spectrum

(01:47:52) AI in biology could spiral out of control

(01:56:20) Bioweapons

(02:01:10) Adoption Accelerationist, Hyperscaling Pauser

(02:06:26) Current Harms vs. Future Harms, risk tolerance

(02:11:58) Jailbreaks, Nathan’s experiments with Claude

The cognitive revolution: https://www.cognitiverevolution.ai/

Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/

Advesarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/

  continue reading

55 afleveringen

Artwork
iconDelen
 
Manage episode 418754446 series 2966339
Inhoud geleverd door Michaël Trazzi. Alle podcastinhoud, inclusief afleveringen, afbeeldingen en podcastbeschrijvingen, wordt rechtstreeks geüpload en geleverd door Michaël Trazzi of hun podcastplatformpartner. Als u denkt dat iemand uw auteursrechtelijk beschermde werk zonder uw toestemming gebruikt, kunt u het hier beschreven proces https://nl.player.fm/legal volgen.

This is a special crosspost episode where Adam Gleave is interviewed by Nathan Labenz from the Cognitive Revolution. At the end I also have a discussion with Nathan Labenz about his takes on AI.

Adam Gleave is the founder of Far AI, and with Nathan they discuss finding vulnerabilities in GPT-4's fine-tuning and Assistant PIs, Far AI's work exposing exploitable flaws in "superhuman" Go AIs through innovative adversarial strategies, accidental jailbreaking by naive developers during fine-tuning, and more.

OUTLINE

(00:00) Intro

(02:57) NATHAN INTERVIEWS ADAM GLEAVE: FAR.AI's Mission

(05:33) Unveiling the Vulnerabilities in GPT-4's Fine Tuning and Assistance APIs

(11:48) Divergence Between The Growth Of System Capability And The Improvement Of Control

(13:15) Finding Substantial Vulnerabilities

(14:55) Exploiting GPT 4 APIs: Accidentally jailbreaking a model

(18:51) On Fine Tuned Attacks and Targeted Misinformation

(24:32) Malicious Code Generation

(27:12) Discovering Private Emails

(29:46) Harmful Assistants

(33:56) Hijacking the Assistant Based on the Knowledge Base

(36:41) The Ethical Dilemma of AI Vulnerability Disclosure

(46:34) Exploring AI's Ethical Boundaries and Industry Standards

(47:47) The Dangers of AI in Unregulated Applications

(49:30) AI Safety Across Different Domains

(51:09) Strategies for Enhancing AI Safety and Responsibility

(52:58) Taxonomy of Affordances and Minimal Best Practices for Application Developers

(57:21) Open Source in AI Safety and Ethics

(1:02:20) Vulnerabilities of Superhuman Go playing AIs

(1:23:28) Variation on AlphaZero Style Self-Play

(1:31:37) The Future of AI: Scaling Laws and Adversarial Robustness

(1:37:21) MICHAEL TRAZZI INTERVIEWS NATHAN LABENZ

(1:37:33) Nathan’s background

(01:39:44) Where does Nathan fall in the Eliezer to Kurzweil spectrum

(01:47:52) AI in biology could spiral out of control

(01:56:20) Bioweapons

(02:01:10) Adoption Accelerationist, Hyperscaling Pauser

(02:06:26) Current Harms vs. Future Harms, risk tolerance

(02:11:58) Jailbreaks, Nathan’s experiments with Claude

The cognitive revolution: https://www.cognitiverevolution.ai/

Exploiting Novel GPT-4 APIs: https://far.ai/publication/pelrine2023novelapis/

Advesarial Policies Beat Superhuman Go AIs: https://far.ai/publication/wang2022adversarial/

  continue reading

55 afleveringen

Alle afleveringen

×
 
Loading …

Welkom op Player FM!

Player FM scant het web op podcasts van hoge kwaliteit waarvan u nu kunt genieten. Het is de beste podcast-app en werkt op Android, iPhone en internet. Aanmelden om abonnementen op verschillende apparaten te synchroniseren.

 

Korte handleiding