upcarta

The alignment problem from a deep learning perspective

  • Paper
  • Feb 22, 2023
  • #ArtificialIntelligence
Richard Ngo
@RichardMCNgo
(Author)
Sören Mindermann
@sorenmind
(Author)
arxiv.org
1 Recommender
2 Mentions
Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. We outline a case for expecting that, without substantial effort to prevent it, AGIs could learn to pursue goals which are undesirable (i.e. misaligned) from a human perspective. We argue that if AGIs are trained in ways similar to today's most capable models, they could learn to act deceptively to receive higher reward, learn internally-represented goals which generalize beyond their training distributions, and pursue those goals using power-seeking strategies. We outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and briefly review research directions aimed at preventing this outcome.

Mentions
David Krueger @DavidSKrueger · May 8, 2023
  • Post
  • From Twitter
BTW, 2 papers I often recommend: 1) "The alignment problem from a deep learning perspective" @RichardMCNgo et al. for research overview: 2) "Natural Selection Favors AIs over Humans" @DanHendrycks for argument for risk:
David Krueger @DavidSKrueger · Dec 19, 2022
  • Post
  • From Twitter
Recommended! This is a well-written, concise (10 pages!) summary of the Alignment Problem. It helps explain: 1) The basic concerns of (many) alignment researchers 2) Why we don't think it's crazy to talk about AI killing everyone 3) What's distinctive about the field.