496. The Magnitude of His Own Folly

(Interesting discussion about the likelihood of success of FAI.)

Yudkowsky finally had to admit that he could have destroyed the world by building an uFAI. But even that would be too charitable, because it implies that he was capable of building AGI, which he wasn’t.

The universe doesn’t care whether you’re well-intentioned or not.

“I understood that you could do everything that you were supposed to do, and Nature was still allowed to kill you. That was when my last trust broke. And that was when my training as a rationalist began.”

On to the comment section. There are some very illuminating comments by Nesov and Yudkowsky about FAI and the chances of success:

Vladimir Nesov:

“Shane: If somebody is going to set off a super intelligent machine I’d rather it was a machine that will only *probably* kill us, rather than a machine that almost certainly will kill us because issues of safety haven’t even been considered. If I had to sum up my position it would be: maximise the safety of the first powerful AGI, because that’s likely to be the one that matters.

If you have a plan for which you know that it has some chance of success (say, above 1%), you have a design of FAI (maybe not a very good one, but still). It’s “provably” safe, with 1% chance. It should be deployed in case of 99.9%-probable impending doom. If I knew that, given that I do nothing, there will be a positive singularity, that would qualify as a provably Friendly plan, and doing nothing is what I would need to do, instead of thinking about AGI all day. We don’t need a theory of FAI for the theory’s sake, we need it to produce a certain outcome, to know that our actions lead where we want them to lead. If there is any wacky plan of action that leads there, it should be taken. If we figure out that building superintelligent lobster clusters will produce a positive singularity, lobsters it is. Some of the incredulous remarks about the FAI path center on how inefficient it is. “Why do you enforce these silly restrictions on yourself, tying your hands, when you can instead do Z and get there faster/more plausibly/anyway?” Why do you believe what you believe? Why do you believe that Z has any chance of success? How do you know it’s not just wishful thinking?

You can’t get FAI by hacking an AGI design at the last minute, by performing “safety measures”, adding a “Friendliness module”; you shouldn’t expect FAI to just happen if you merely intuitively believe that there is a good chance for it to happen. Even if “issues of safety are considered”, you still almost certainly die. The target is too small. It’s not obvious that the target is so small, and it’s not obvious that you can’t cross this evidential gap by mere gut feeling, that you need stronger support, a better and more technical understanding of the problem, to have even a 1% chance of winning. If you do the best you can on that first AGI, if you “maximize” the chance of getting FAI out of it, you still lose. Nature doesn’t care if you “maximized your chances” or leapt into the abyss blindly; it kills you just the same. Maximizing chances of success is a ritual of cognition that doesn’t matter if it doesn’t make you win. It doesn’t mean that you must write a million lines of FAI code; it is a question of understanding. Maybe there is a very simple solution, but you need to understand it to find its implementation. You can write down a winning lottery combination in five seconds, but you can’t expect to guess it correctly. If you discovered the first 100 bits of a 150-bit key, you can’t argue that you’ll be able to find 10 more bits at the last minute to maximize your chances of success; they are useless unless you find 40 more.

Provability is not about setting a standard that is too high, it is about knowing what you are doing — like, at all. Finding a nontrivial solution that knowably has a 1% chance of being correct is a very strange situation; much more likely, you’ll be able to become pretty sure, say >99%, that the solution is correct, which will be cut by real-world black swans to something lower but closer to 99% than to 1%. This translates as “provably correct”, but given the absence of a mathematical formulation of this problem in the first place, at best it’s “almost certainly correct”. Proving that the algorithm itself, within the formal rules of evaluation on reliable hardware, does what you intended is a part where you need to preserve your chances of success across a huge number of steps performed by the AI. If your AI isn’t stable, if it wanders around back and forth, forgetting after a trillion steps about the target you set at the start, your solution isn’t good for anything.

You can see that the target is so small from the complexity of human morality, which judges the solution. It specifies an unnatural category that won’t just spontaneously appear in the mind of an AI, much less become its target. If you miss something, your AI will at best start as a killer jinni that doesn’t really understand what you want of it and thus can’t be allowed to function freely, and if the restrictions you placed on it are a tiny bit imperfect (which they will be), it will just break loose and destroy everything.”
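
Nesov’s 150-bit key analogy is easy to make concrete, as is his “deploy the 1% plan under 99.9% doom” criterion. The sketch below is my own back-of-the-envelope illustration, not part of the quoted comment; the only inputs are the figures Nesov himself names:

    # Rough numbers behind Nesov's 150-bit key analogy and his deployment
    # criterion (my own illustration, not part of the quoted comment).
    KEY_BITS = 150
    KNOWN_BITS = 100

    remaining = 2 ** (KEY_BITS - KNOWN_BITS)            # 2^50, about 1.1e15 candidates left
    after_ten_more = 2 ** (KEY_BITS - KNOWN_BITS - 10)  # 2^40, about 1.1e12 still left

    print(f"candidates with 100 of 150 bits known: {remaining:.2e}")
    print(f"candidates after finding 10 more bits: {after_ten_more:.2e}")
    print(f"chance of a blind last-minute guess:   {1 / after_ten_more:.2e}")

    # The deployment criterion as an expected-survival comparison: a design
    # that is 'provably' safe with only 1% probability still beats doing
    # nothing under 99.9%-probable impending doom.
    p_survive_deploy = 0.01
    p_survive_do_nothing = 0.001
    print("deploy the 1% plan:", p_survive_deploy > p_survive_do_nothing)

Ten extra bits shrink the search space by a factor of roughly a thousand, but about a trillion candidates still remain, so the late bits buy essentially nothing for a blind guess. That is Nesov’s point about last-minute “safety measures”.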

Eliezer Yudkowsky endorses this comment; he later writes:

“Vladimir Nesov: Good reply, I read it and wondered “Who’s channeling me?” before I got to the byline.”

Yudkowsky also replies to Yvain, who worries that it’s likely that other AGI researchers will be the first to build a full-fledged AGI, since they aren’t “hindered” by safety concerns:

“@Yvain: To first order and generalizing from one data point, figure that Eliezer_2000 is demonstrably as smart and as knowledgeable as you can possibly get while still being stupid enough to try and charge full steam ahead into Unfriendly AI. Figure that Eliezer_2002 is as high as it gets before you spontaneously stop trying to build low-precision Friendly AI. Both of these are smart enough to be dangerous and not smart enough to be helpful, but they were highly unstable in terms of how long they stayed that way; Eliezer_2002 had less than four months left on his clock when he finished “Levels of Organization in General Intelligence”. I would not be intimidated by either of them into giving up, even though they’re holding themselves to much lower standards. They will charge ahead taking the quick and easy and vague and imprecise and wasteful and excruciatingly frustrating path. That’s going to burn up a lot of their time.

Those AGI wannabes who stay in suicidal states for years, even when I push on them externally, I find even less intimidating than the prospect of going up against an Eliezer_2002 who permanently stayed bounded at the highest suicidal level.

An AGI wannabe could theoretically have a different intellectual makeup that allows them to get farther and be more dangerous than Eliezer_2002, without passing the Schwarzschild bound and collapsing into an FAI programmer; but I see no evidence that this has ever actually happened.

To put it briefly: There really is an upper bound on how smart you can be, and still be that stupid.

So the state of the gameboard is not good, but the day is not already lost. You draw a line with all the sloppy suicides on one side, and those who slow down for precise standards on the other, and you hope that no one sufficiently intelligent + knowledgeable can stay on the wrong side of the line for long.

That’s the last thread on which our doom now hangs.”
