Ga offline met de app Player FM !
LW - The Checklist: What Succeeding at AI Safety Will Involve by Sam Bowman
Gearchiveerde serie ("Inactieve feed" status)
When? This feed was archived on October 23, 2024 10:10 (). Last successful fetch was on September 22, 2024 16:12 ()
Why? Inactieve feed status. Onze servers konden geen geldige podcast feed ononderbroken ophalen.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.
Manage episode 438002015 series 3337129
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Checklist: What Succeeding at AI Safety Will Involve, published by Sam Bowman on September 3, 2024 on LessWrong.
Crossposted by habryka with Sam's permission. Expect lower probability for Sam to respond to comments here than if he had posted it.
Preface
This piece reflects my current best guess at the major goals that Anthropic (or another similarly positioned AI developer) will need to accomplish to have things go well with the development of broadly superhuman AI. Given my role and background, it's disproportionately focused on technical research and on averting emerging catastrophic risks.
For context, I lead a technical AI safety research group at Anthropic, and that group has a pretty broad and long-term mandate, so I spend a lot of time thinking about what kind of safety work we'll need over the coming years.
This piece is my own opinionated take on that question, though it draws very heavily on discussions with colleagues across the organization: Medium- and long-term AI safety strategy is the subject of countless leadership discussions and Google docs and lunch-table discussions within the organization, and this piece is a snapshot (shared with permission) of where those conversations sometimes go.
To be abundantly clear: Nothing here is a firm commitment on behalf of Anthropic, and most people at Anthropic would disagree with at least a few major points here, but this can hopefully still shed some light on the kind of thinking that motivates our work.
Here are some of the assumptions that the piece relies on. I don't think any one of these is a certainty, but all of them are plausible enough to be worth taking seriously when making plans:
Broadly human-level AI is possible. I'll often refer to this as transformative AI (or TAI), roughly defined as AI that could form as a drop-in replacement for humans in all remote-work-friendly jobs, including AI R&D.[1]
Broadly human-level AI (or TAI) isn't an upper bound on most AI capabilities that matter, and substantially superhuman systems could have an even greater impact on the world along many dimensions.
If TAI is possible, it will probably be developed this decade, in a business and policy and cultural context that's not wildly different from today.
If TAI is possible, it could be used to dramatically accelerate AI R&D, potentially leading to the development of substantially superhuman systems within just a few months or years after TAI.
Powerful AI systems could be extraordinarily destructive if deployed carelessly, both because of new emerging risks and because of existing issues that become much more acute. This could be through misuse of weapons-related capabilities, by disrupting important balances of power in domains like cybersecurity or surveillance, or by any of a number of other means.
Many systems at TAI and beyond, at least under the right circumstances, will be capable of operating more-or-less autonomously for long stretches in pursuit of big-picture, real-world goals. This magnifies these safety challenges.
Alignment - in the narrow sense of making sure AI developers can confidently steer the behavior of the AI systems they deploy - requires some non-trivial effort to get right, and it gets harder as systems get more powerful.
Most of the ideas here ultimately come from outside Anthropic, and while I cite a few sources below, I've been influenced by far more writings and people than I can credit here or even keep track of.
Introducing the Checklist
This lays out what I think we need to do, divided into three chapters, based on the capabilities of our strongest models:
Chapter 1: Preparation
You are here. In this period, our best models aren't yet TAI. In the language of Anthropic's RSP, they're at AI Safety Level 2 (ASL-2), ASL-3, or maybe the early stages of ASL-4. Most of the work that we hav...
1851 afleveringen
Gearchiveerde serie ("Inactieve feed" status)
When? This feed was archived on October 23, 2024 10:10 (). Last successful fetch was on September 22, 2024 16:12 ()
Why? Inactieve feed status. Onze servers konden geen geldige podcast feed ononderbroken ophalen.
What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.
Manage episode 438002015 series 3337129
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Checklist: What Succeeding at AI Safety Will Involve, published by Sam Bowman on September 3, 2024 on LessWrong.
Crossposted by habryka with Sam's permission. Expect lower probability for Sam to respond to comments here than if he had posted it.
Preface
This piece reflects my current best guess at the major goals that Anthropic (or another similarly positioned AI developer) will need to accomplish to have things go well with the development of broadly superhuman AI. Given my role and background, it's disproportionately focused on technical research and on averting emerging catastrophic risks.
For context, I lead a technical AI safety research group at Anthropic, and that group has a pretty broad and long-term mandate, so I spend a lot of time thinking about what kind of safety work we'll need over the coming years.
This piece is my own opinionated take on that question, though it draws very heavily on discussions with colleagues across the organization: Medium- and long-term AI safety strategy is the subject of countless leadership discussions and Google docs and lunch-table discussions within the organization, and this piece is a snapshot (shared with permission) of where those conversations sometimes go.
To be abundantly clear: Nothing here is a firm commitment on behalf of Anthropic, and most people at Anthropic would disagree with at least a few major points here, but this can hopefully still shed some light on the kind of thinking that motivates our work.
Here are some of the assumptions that the piece relies on. I don't think any one of these is a certainty, but all of them are plausible enough to be worth taking seriously when making plans:
Broadly human-level AI is possible. I'll often refer to this as transformative AI (or TAI), roughly defined as AI that could form as a drop-in replacement for humans in all remote-work-friendly jobs, including AI R&D.[1]
Broadly human-level AI (or TAI) isn't an upper bound on most AI capabilities that matter, and substantially superhuman systems could have an even greater impact on the world along many dimensions.
If TAI is possible, it will probably be developed this decade, in a business and policy and cultural context that's not wildly different from today.
If TAI is possible, it could be used to dramatically accelerate AI R&D, potentially leading to the development of substantially superhuman systems within just a few months or years after TAI.
Powerful AI systems could be extraordinarily destructive if deployed carelessly, both because of new emerging risks and because of existing issues that become much more acute. This could be through misuse of weapons-related capabilities, by disrupting important balances of power in domains like cybersecurity or surveillance, or by any of a number of other means.
Many systems at TAI and beyond, at least under the right circumstances, will be capable of operating more-or-less autonomously for long stretches in pursuit of big-picture, real-world goals. This magnifies these safety challenges.
Alignment - in the narrow sense of making sure AI developers can confidently steer the behavior of the AI systems they deploy - requires some non-trivial effort to get right, and it gets harder as systems get more powerful.
Most of the ideas here ultimately come from outside Anthropic, and while I cite a few sources below, I've been influenced by far more writings and people than I can credit here or even keep track of.
Introducing the Checklist
This lays out what I think we need to do, divided into three chapters, based on the capabilities of our strongest models:
Chapter 1: Preparation
You are here. In this period, our best models aren't yet TAI. In the language of Anthropic's RSP, they're at AI Safety Level 2 (ASL-2), ASL-3, or maybe the early stages of ASL-4. Most of the work that we hav...
1851 afleveringen
All episodes
×Welkom op Player FM!
Player FM scant het web op podcasts van hoge kwaliteit waarvan u nu kunt genieten. Het is de beste podcast-app en werkt op Android, iPhone en internet. Aanmelden om abonnementen op verschillende apparaten te synchroniseren.