458. Unnatural Categories
The boundaries of categories depend on your own utility function:
I’ve chosen the phrase “unnatural category” to describe a category whose boundary you draw in a way that sensitively depends on the exact values built into your utility function. The most unnatural categories are typically these values themselves! What is “true happiness”? This is entirely a moral question, because what it really means is “What is valuable happiness?” or “What is the most valuable kind of happiness?” Is having your pleasure center permanently stimulated by electrodes, “true happiness”? Your answer to that will tend to center on whether you think this kind of pleasure is a good thing. “Happiness”, then, is a highly unnatural category – there are things that locally bear a strong resemblance to “happiness”, but which are excluded because we judge them as being of low utility, and “happiness” is supposed to be of high utility.
This could be a problem for paperclippers:
I was recently trying to explain to someone why, even if all you wanted to do was fill the universe with paperclips, building a paperclip maximizer would still be a hard problem of FAI theory. Why? Because if you cared about paperclips for their own sake, then you wouldn’t want the AI to fill the universe with things that weren’t really paperclips – as you draw that boundary!
…Soon the AI grows up, kills off you and your species, and begins its quest to transform the universe into paperclips. But wait – now the AI is considering new potential boundary cases of “paperclip” that it didn’t see during its training phase. Boundary cases, in fact, that you never mentioned – let alone showed the AI – because it didn’t occur to you that they were possible. Suppose, for example, that the thought of tiny molecular paperclips had never occurred to you. If it had, you would have agonized for a while – like the way that people agonized over Terry Schiavo – and then finally decided that the tiny molecular paperclip-shapes were not “real” paperclips. But the thought never occurred to you, and you never showed the AI paperclip-shapes of different sizes and told the AI that only one size was correct, during its training phase. So the AI fills the universe with tiny molecular paperclips – but those aren’t real paperclips at all! Alas! There’s no simple experimental test that the AI can perform to find out what you would have decided was or was not a high-utility papercliplike object.
459. Magical Categories
FAI is hard. And proposals like this one by won’t work:
‘We can design intelligent machines so their primary, innate emotion is unconditional love for all humans. First we can build relatively simple machines that learn to recognize happiness and unhappiness in human facial expressions, human voices and human body language. Then we can hard-wire the result of this learning as the innate emotional values of more complex intelligent machines, positively reinforced when we are happy and negatively reinforced when we are unhappy.’
— Bill Hibbard (2001), Super-intelligent machines.
It should be pretty obvious that this proposal is very dangerous. Why? Here are some tips: Heroin, molecular smiley faces, brain-alteration, wireheading….
These then are three fallacies of teleology: Backward causality, anthropomorphism, and teleological capture.
First fallacy is obvious. Explanation for the second:
Teleological reasoning is anthropomorphic – it uses your own brain as a black box to predict external events. Specifically, teleology uses your brain’s planning mechanism as a black box to predict a chain of future events, by planning backward from a distant outcome.
Now we are talking about a highly generalized form of anthropomorphism – and indeed, it is precisely to introduce this generalization that I am talking about teleology! You know what it’s like to feel purposeful. But when someone says, “water runs downhill so that it will be at the bottom”, you don’t necessarily imagine little sentient rivulets alive with quiet determination. Nonetheless, when you ask, “How could the water get to the bottom of the hill?” and plot out a course down the hillside, you’re recruiting your own brain’s planning mechanisms to do it. That’s what the brain’s planner does, after all: it finds a path to a specified destination starting from the present.
The explanation for the third:
Similarly with those who hear of evolutionary psychology and conclude that the meaning of life is to increase reproductive fitness – hasn’t science demonstrated that this is the purpose of all biological organisms, after all?
Likewise with that fellow who concluded that the purpose of the universe is to increase entropy – the universe does so consistently, therefore it must want to do so – and that this must therefore be the meaning of life. Pretty sad purpose, I’d say! But of course the speaker did not seem to realize what it means to want to increase entropy as much as possible – what this goal really implies, that you should go around collapsing stars to black holes. Instead the one focused on a few selected activities that increase entropy, like thinking. You couldn’t ask for a clearer illustration of a fake utility function.
I call this a “teleological capture” – where someone comes to believe that the telos of X is Y, relative to some agent, or optimization process, or maybe just statistical tendency, from which it follows that any human or other agent who does X must have a purpose of Y in mind. The evolutionary reason for motherly love becomes its telos, and seems to “capture” the apparent motives of human mothers. The game-theoretical reason for cooperating on the Iterated Prisoner’s Dilemma becomes the telos of cooperation, and seems to “capture” the apparent motives of human altruists, who are thus revealed as being selfish after all. Charity increases status, which people are known to desire; therefore status is the telos of charity, and “captures” all claims to kinder motives. Etc. etc. through half of all amateur philosophical reasoning about the meaning of life.