By alignment, I mean us being able to get an AI to do what we want it to do, without it trying to do things basically nobody would want, such as amassing power to prevent its creators from turning it off.1 By AGI, I mean something that can do any economically valuable task that can be done through a computer, such as scientific research, about as well as a human or better. The necessity of alignment won't be my focus here, so I will take it as a given.
This notion of alignment is ‘‘value-free’’: it does not require solving thorny problems in moral philosophy, like what values even are. Strong alignment, on the other hand, is about an AI that does what humans collectively value, or would value if they were more enlightened, and likely does require solving those thorny problems of moral philosophy. Except near the end, I won't be touching on strong alignment.
Let’s suppose that a tech company, call it Onus, announces an aligned AGI (henceforth, the AGI) tomorrow. If Onus deployed it, we could be sure that it would do as Onus intended, without any weird or negative side effects that would be unexpected from Onus's point of view. I'm imagining that Onus obtains this aligned AGI along with a ‘‘certificate’’ guaranteeing alignment (as much as is possible, at least) through some combination of interpretability, capability limitations, proofs, extensive empirical testing, etc. I'll call all of these things an alignment solution.