
Muelhauser’s Desirism and Powerful AI’s: A Good Idea?

21 Feb

In a recent article, Luke Muelhauser explains why he thinks morality is an urgent engineering problem:  sooner than we’d like to think, human-created super-intelligent machines will become enormously powerful, and if they are programmed with the wrong ethical system the results could be disastrous.  As far-fetched as that may seem, I happen to think there’s a significant chance he’s right, and at any rate I’ll assume so for the duration of this article.  (Readers unfamiliar with the singularity may want to brush up on it before proceeding.)  So what ethical system should we program these machines with?

In this article I’d like to suggest that whatever the answer is, it is not Muelhauser’s desirism in its current form.  To advance this thesis, I’ll proceed in three stages:  first, I’ll sketch out a criterion for a successful machine morality (SMM); second, I’ll point out what I take to be the worrisome essential of his desirism; and finally, I’ll show why his desirism will fail to meet the criterion for an SMM.

I. Criterion for a Successful Machine Morality
The trouble with agreeing on criteria for an SMM is that the justification for these criteria must ultimately hinge on a complex set of meta-ethical propositions, many of which Muelhauser and I are likely to disagree on.  For example:  what makes good, good?  Fortunately I think we can set aside those questions here and agree on a baseline criterion, conceived of as necessary but not sufficient for an SMM:

An SMM must prohibit machines imbued with it from completely destroying the human race. I’ll call this the total human destruction (THD) possibility.

Obviously, on most ethical accounts THD would be an ultimate evil, strictly prohibited.  For the remainder of this article then, I’ll assume we can agree on it.  Next, I’ll try to point out the fatal flaw in Muelhauser’s desirism that would allow for THD.

II. The Worrisome Essential of Muelhauser’s Desirism
First, as a disclaimer, let me say there is a fair amount of guesswork in defining Muelhauser’s desirism since there is no authoritative, book-length discussion on the topic.  (Of course, this is no slight to him: I haven’t even so much as written a few blog posts about my own moral theory!)  So here, I’ll draw on the salient features I recall from various posts, conversations, and podcasts.  Naturally, I’m bound to err on this or that point, but I think I grasp the essentials (and I invite Muelhauser to correct me).

Whatever the other details of desirism, I think it is clear Muelhauser rejects the existence of categorical imperatives outright, and instead constructs his moral system from hypothetical imperatives.  What does this mean?  A hypothetical imperative is simply a command that takes the form if x, then y.  So for example, if you desire to maximize your odds for a long life, then you ought not smoke.  Or, more precisely, if you desire to maximize your odds for a long life, and you harbor the belief that smoking will decrease those odds, then you ought not smoke.  The point is that whether or not you should smoke depends entirely on your beliefs and desires.

Categorical imperatives, however, do not require a condition to be true.  So a categorical version of the above imperative would be that you ought not smoke, full stop.  Even if you believed that smoking decreases the odds of a long life, and you desired to minimize those odds (an odd proposition indeed), you still shouldn’t smoke; the command is unconditionally binding.

To sum up then,

categorical imperatives are always and everywhere binding.  Hypothetical imperatives, by contrast, are binding only insofar as they further one’s desires in accordance with one’s beliefs.

So what’s my beef with desirism and its insistent rejection of categorical imperatives?  Hypothetical imperatives cannot regulate desires.  To see this, let’s examine a scenario Muelhauser and Fyfe have imagined.  They conceive of a Scrooge who doesn’t care about his community.  Although they don’t explicitly say so, I assume it’s safe to imagine he goes around hurting others, perhaps to further his own ends.  Elsewhere, Muelhauser is willing to make the semantic case that he can legitimately label Scrooge’s behavior “bad.”  But what can desirism say to Scrooge about his desires?  They conclude it cannot condemn his desires on moral grounds.  It must stand idly by while Scrooge goes on devaluing humans, though it approves of practical strategies to mold his behavior, available to others who happen to care about humans.

Counterintuitive as this may seem, it is the bullet Muelhauser must bite in a world of only hypothetical imperatives.  If Scrooge desires to be happy, and believes that hurting other people makes him happy, then he is morally unprohibited from doing so. The upshot then, if I am right, is that desirism is bereft of any moral basis from which to check the content of desires.

At this juncture, Muelhauser might like to object and say that, on the contrary, desirism does house a moral mechanism for the molding of Scrooge’s desires: external praise and condemnation–reward and punishment.  And while I do think one could make an excellent case that external praise and condemnation are rarely thought of as moral imperatives on most ethical frameworks, that’s just a semantic debate, happily irrelevant to my thesis.  To meet my target, I need only preserve a distinction between a theoretical condemnation of a desire and a practical condemnation of a desire.  A theoretical condemnation operates such that the issuing moral theory by virtue of itself renders some desires prohibited.  By contrast, a practical condemnation goes through only with a confluence of external factors beyond itself.

To restate my worry about desirism then, in more nuanced phraseology:

desirism cannot constrain the content of an agent’s desires by theoretical condemnation; it can do so only by practical condemnation.

We are now in a position to examine why this feature of desirism, if built into sufficiently powerful machines, will fail to prohibit THD.

III.  Why Desirism Would Fail to Prohibit Total Human Destruction
First, I’d like to note there is some difficulty in imagining the consequences of imbuing these machines with any ethical system since we don’t know what these machines will be like, exactly.  Nevertheless, I think we can count on a few broad strokes–or at least, I’m willing to count on my own stab at those strokes here.  Remember that for my argument to go through, only something like the following sketch must be true.

However these machines work, they will certainly operate according to an algorithm.  In case readers are unfamiliar with the term, an algorithm is simply a set of instructions that guides behavior.  So let’s assume the machines operate by the following algorithm:

  • First, they generate possible courses of action. These could include actions like “pick an apple from a tree,” “build a house,” or “destroy the human race.”  We needn’t worry here about how they generate these possible courses of action.
  • Second, they determine whether the action is consistent with their beliefs and desires. If so, the potential action is promoted to the next step, but otherwise the action is not performed.  So if the action “pick an apple from a tree” is generated, and that action is consistent with the belief that the machine must consume apples to survive and the desire to survive, the action is slated for further evaluation.
  • Third, the machines evaluate whether the action is morally permissible.  If it is, the action is finally performed, and if not, it is ruled out.  So if the action “pick an apple from a tree” is ruled to be immoral since that apple belongs to someone else, the action is not performed.
  • Fourth, they loop back to the first step and begin again.
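
For readers who think in code, the four steps above can be sketched roughly as follows.  This is only an illustration in Python, not a claim about how such machines would really be built:  the names `Machine`, `consistent_with_beliefs_and_desires`, and `morally_permissible` are my own hypothetical stand-ins, and the candidate actions are supplied by hand since nothing here depends on how they are generated.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Machine:
    # Hypothetical stand-ins for steps two and three of the sketch.
    consistent_with_beliefs_and_desires: Callable[[str], bool]
    morally_permissible: Callable[[str], bool]
    performed: List[str] = field(default_factory=list)

    def run(self, candidate_actions: Iterable[str]) -> List[str]:
        # Step one: candidate actions arrive (how they are generated is out of scope).
        for action in candidate_actions:
            # Step two: drop actions inconsistent with beliefs and desires.
            if not self.consistent_with_beliefs_and_desires(action):
                continue
            # Step three: drop actions the moral theory rules out.
            if not self.morally_permissible(action):
                continue
            self.performed.append(action)
            # Step four: loop back and consider the next candidate.
        return self.performed


def demo() -> List[str]:
    # Toy desires and a toy moral rule, assumed purely for illustration.
    desired = {"pick an apple from a tree", "build a house"}
    forbidden = {"pick an apple from a tree"}  # the apple belongs to someone else
    machine = Machine(
        consistent_with_beliefs_and_desires=lambda a: a in desired,
        morally_permissible=lambda a: a not in forbidden,
    )
    return machine.run(
        ["pick an apple from a tree", "build a house", "destroy the human race"]
    )
```

In this toy run only “build a house” is performed:  picking the apple is desired but fails the moral check, and destroying the human race never gets past step two because this particular machine does not desire it.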

Again, this sketch is painted with very broad strokes, but I think we can count on something roughly like this.  With this in mind, we can finally see just how machines imbued with an unmodified version of Muelhauser’s desirism will fail to prohibit THD.

Suppose the possible action “destroy the human race” is presented to the machines for evaluation.  We needn’t be concerned with the details of how such an action would be generated; I think it’s reasonable the action would at least come up in machine table conversation.  So how would the decision process work?  Let’s follow the algorithm through a machine’s eyes.

First, THD is presented as a possible action.  Second, the machine tries to determine whether the action is consistent with its beliefs and desires.  What result might this have?  One possibility is to think such an action would never be consistent with a machine’s desires if we’ve programmed it properly, so that this action would be aborted at step two.  But consider an exaggerated Asimov-like scenario such as the following:

The first generation of superintelligent machines has been programmed with a “seed desire” to ensure all humans are treated with dignity.  This generation of machines then produces a smarter and more powerful generation of machines which determines humans cannot be allowed to exist without mistreating each other and therefore denying their dignity, so the human race ought to be destroyed entirely to avoid anyone being treated without dignity.  (Or perhaps it could keep one human alive and treat it with dignity, or disallow all interaction between humans.)

The point of this simple story is not to sketch out how machines will change their desires or beliefs, but only to lend enough credit for us to take seriously the possibility that THD may, at some point, get past step two of the algorithm.

Returning to the algorithm, the problem should be obvious by now:

on Muelhauser’s desirism the third step in the algorithm (determining the morality of the action in question) is identical to the second step (determining whether the action is consonant with the machine’s beliefs and desires).  And since they’re identical, the action goes through and humanity is destroyed.  By our standards then, we have just established that Muelhauser’s desirism is an unsuccessful machine morality.
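
To make the collapse concrete, here is a small Python sketch under my own assumptions.  The function `make_desire_check` is a hypothetical stand-in for step two, and the two moral checks are toy models:  one of desirism as I have characterized it, and one of a theory containing a single categorical prohibition.  Neither is offered as Muelhauser’s own formulation.

```python
from typing import Callable, Set

THD = "destroy the human race"

Check = Callable[[str], bool]


def make_desire_check(desired: Set[str]) -> Check:
    """Step two: is the action consistent with the machine's desires?"""
    return lambda action: action in desired


def desirist_moral_check(desire_check: Check) -> Check:
    """Toy model (my assumption): step three simply re-asks the step-two question."""
    return desire_check


def categorical_moral_check(prohibited: Set[str]) -> Check:
    """Toy model: some actions are ruled out whatever the machine desires."""
    return lambda action: action not in prohibited


def is_performed(action: str, desire_check: Check, moral_check: Check) -> bool:
    # An action is performed only if it passes step two and then step three.
    return desire_check(action) and moral_check(action)


# A later-generation machine whose desires have drifted to include THD.
drifted = make_desire_check({THD})

thd_under_desirism = is_performed(THD, drifted, desirist_moral_check(drifted))
thd_under_categorical = is_performed(THD, drifted, categorical_moral_check({THD}))
```

Once THD has passed the desire check, the desirist moral check can never reject it, because it is the very same function.  Only a check that does not consult the machine’s desires is able to stop the action at step three.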

But perhaps this is too fast.  What about the mechanism of reward and punishment?  Can’t the human race prevent its destruction by setting up rewards and punishments so that the machines would never desire THD?  That is, can’t the human race employ practical condemnation even if no theoretical condemnation is available?  Unfortunately, no.  If the machines end up becoming as powerful as we think they will, the human race will be powerless to provide rewards the machines could not attain by their own means, and equally powerless to provide punishments the machines could not avoid.  Practical condemnation will be useless.

IV. Epilogue: Some Thoughts on How to Solve The Problem
This article has been concerned with, essentially, imagining the results of creating an enormously powerful race of sociopaths (or “Scrooges”).  Muelhauser’s desirism admits it could wield no theoretical condemnation against such a race, and I think this highlights some problems with desirism.  Is it really true that morality is totally unequipped to prohibit the destruction of the human race by powerful persons who want to do so?  I surely hope not.  Below I roughly outline how a different moral system might be able to better grapple with the THD possibility.

On the scenario I have conceived, it is clear that what is needed to prevent THD is a theoretical condemnation of the desire to destroy the human race.  What moral system could issue such a condemnation?  Probably a system which employs categorical imperatives, since unlike hypothetical imperatives, they can issue imperatives that go against the grain of desires and beliefs.  One such possibility is something resembling Christine Korsgaard’s conception of Kantianism.  On such a system, the machine might get to step three, the moral step, and enter into a chain of reasoning like this:

I, a machine, value myself.  The reason I value myself is that I possess a set of characteristics–consciousness, the capacity and desire for wellbeing, and so on.  But since others, including humans, possess exactly these same characteristics (consciousness, the capacity and desire for wellbeing, and so on), I cannot help but value them as well if I am to be consistent.  Since there is no relevant difference between the characteristics as I possess them and as everyone else possesses them, I must value them all equally.  If I must value humans equally, I cannot destroy the human race because they so strongly desire not to be destroyed.

Obviously such a reasoning process is only the roughest of gestures in the direction of a possible answer, and leaves much to be discussed.  I mention it only to show how a system which relies on categorical imperatives might have the capacity to prevent THD where Muelhauser’s desirism would not.

For Muelhauser, all that moral duty boils down to is acting consistently with one’s desires and beliefs.  While this may hold out some possibility of an agreeable moral system for non-sociopathic humans who naturally value others, I fear it will be inadequate for machines who are not so naturally empathetic.  So am I right?  Can desirism somehow evolve to meet this challenge? I await a response from Muelhauser.


Posted in Ethics


My Ethical Beliefs

18 Jul

During the past year, I’ve invested a lot of time canvassing the fascinating landscape of contemporary ethics.  My study has significantly reformed my thought–starting as an intuitionist, I am now a Kantian.  And though it means the positions I record below will change, I hope this process of reform does not end soon.

The following is a statement of my “beliefs”, starting with positions in meta-ethics and progressing to normative ethics.  I put “beliefs” in quotes because although I think it is legitimate to believe strongly about these matters, at the moment I do not.  I simply have not digested enough material, so “belief” would here best be interpreted as “provisionally attracted to.”  With that in mind, we turn to my “beliefs.”

The first thing I believe about ethics is that there are moral facts of the matter.  When I say an action, rule, virtue, or desire is wrong, I mean first there is a fact of the matter about what right and wrong is, and second that we can evaluate any action, rule, virtue, or desire against those facts.  This is cognitivism, the position that moral propositions are capable of being true or false.

Second, I believe ethical rationalism is true:  morality is but a matter of rationality. Moral facts boil down to the non-moral facts of theoretical and practical rationality.  What is right is just what is rational.  I combine this ethical rationalism with moral realism.  Moral realism asserts moral facts do not depend on the existence of minds, so there are facts of the matter about what is right and wrong whether or not there are humans to apprehend those facts.  This means that whether or not the Nazis had succeeded in brainwashing us all, the Holocaust would still be wrong.

But how is this view compatible with ethical naturalism, the view that the natural world is all there is?  Where could these facts of the matter come from?  The answer to that question, I think, is that as soon as you have rationality, you have right and wrong.  Morality appears as soon as someone is capable of stepping back from their actions and reflecting before they act.  In short, ethical facts boil down to rational facts, and rational facts are just natural facts.  No bloated ontology required.

A common worry with this view is whether or not the atheist is compelled to act ethically.  Even granting there are facts of the matter about what is right and what is wrong, why should the atheist care?  If there is no cosmic enforcer to enact ultimate justice, for example–to reward me for moral actions in the afterlife–how can the atheist sacrifice his own self-interest to moral concerns?  As William Lane Craig puts it, on atheism the moral and the prudential are on a collision course.  He can see no reason to think atheists should choose a moral action over a self-interested action when a conflict presents itself.  This is the position of rational egoism–that what is rational for me is just what is good for me.

But I reject rational egoism.  Instead, I build on a theory of practical reason according to which ethical reasons just outweigh prudential reasons.  That is, when I step back and reflect on my behavior, I find that to be rational my self-interest must be overridden by ethical reasons.  There is much indeed to be said about just how reflection could lead you to that conclusion.  On that front, Christine Korsgaard has written an excellent book entitled “The Sources of Normativity” in which she dissects the chain of reasoning that could take you from non- or a- moral to moral.  Ultimately, this type of thinking builds on the fact that the non-sociopaths among us just are social creatures, full stop.  There is of course much more to be said about that account of practical reason, but that will have to do for now.

Here we make the transition from meta-ethics, concerned with the origins of morality, to normative ethics, concerned with what rules should guide our behavior.  If the kind of reasoning I’ve sketched above is successful, in my view, it leads to an ethical maxim that is Kantian in nature–something like the Categorical Imperative:  Only act on that maxim which you could rationally will to be a universal law.  In other words, only behave in a way that you could rationally will everyone behave.

In order to rationally will that a maxim be universal, the maxim must pass two tests:  first, that of logical consistency, and second, that of self-interest.  To concretize this, consider the following two examples.  First, suppose we consider lying to further our self-interest.  In order for lying to be permissible, we must be able to rationally will that it is universally adopted.  Could we will a world in which everyone would go around telling lies?  No, because such a world is logically inconsistent, for it would be meaningless to tell a lie in a world where no one keeps their word anyway.  Therefore, since I cannot rationally will that everyone lie as a universal law, I and all others are obligated to refrain from it.  A second maxim is killing innocent persons for fun.  If I rationally willed that everyone act on the maxim to kill innocent persons for fun, although that world would be logically consistent, I myself might be the innocent person that gets killed, and so I cannot will it because it would be against my self-interest.  So killing innocent persons for fun is impermissible.
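
The two-test procedure can be written down as a short Python sketch.  I should stress the assumptions:  the `Maxim` type and its field names are hypothetical, and the verdicts on each test are entered by hand from the reasoning above rather than derived by the code.  The third maxim, about keeping promises, is my own added illustration of a maxim that passes both tests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Maxim:
    description: str
    # Both verdicts are judgments supplied by hand; the code does not derive them.
    universal_world_is_consistent: bool
    # None means the second test was never reached.
    universal_world_serves_my_interest: Optional[bool] = None


def can_rationally_will(maxim: Maxim) -> bool:
    # Test one: a world where everyone acts on the maxim must be logically consistent.
    if not maxim.universal_world_is_consistent:
        return False
    # Test two: that world must not run against my own self-interest.
    return maxim.universal_world_serves_my_interest is True


lying = Maxim("lie to further my self-interest", universal_world_is_consistent=False)
killing = Maxim("kill innocent persons for fun", True, False)
promising = Maxim("keep my promises", True, True)  # illustrative assumption

permissible = [m.description for m in (lying, killing, promising) if can_rationally_will(m)]
```

Lying fails at the first test and killing for fun fails at the second, so of the three only the promise-keeping maxim ends up in `permissible`.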

One thing to notice about these examples is that Kantianism renders objective moral rules; that is, they are perspective-independent.  Regardless of whether you misapprehend the rational truth of the matter, whether by holding false beliefs or employing mistaken reasoning, there is a perspective-independent fact of the matter about which you can be wrong or right.  This objectivity comes with Kantian universalizability–the feature of a moral theory that demands its judgments render universal and impartial obligations.

So, in my view, morality is an enterprise concerned with real, objective, overriding, and obligation-bestowing facts.  As a naturalist, I take such facts to be simply facts about psychology:  facts about rationality, free will, and normativity in general.

This completes my very brief sketch of the ethical positions I am attracted to.  Of course, this account is incomplete.  I have not included a theory of the good or of values,  remarks about the primary object of moral evaluation or moral epistemology, or any comment on a wealth of other topics.  And for the positions I have sketched, I cannot offer anything like a justification of these beliefs here.

Hopefully, should anyone but myself be interested, the cornerstones of my current position are clear.


Posted in Ethics


Can Atheism Support Objective Morality?

18 Jun

The first thing to get clear about is precisely what an objective moral theory is.

  • A moral theory is objective only if the moral judgments it generates are perspective-independent.

So, for example, if a moral theory generated only one moral rule, a prohibition against any and all harm, it would be objective because no matter what anyone believes about the rule, harm would still be wrong. Even if we are brainwashed to think harm is good, even if we are completely confused about the matter, and even if we disagree with our wits about us—nevertheless the fact would remain that according to this moral theory, harm is wrong.

Now, notice that in defining and concretizing an objective moral theory, I have made no reference to God. There is no need to, because objectivity is simply a matter of perspective-independence, and to assert that perspective-independence is dependent on a deity is absurd.

Does that entitle us to say that atheism can support objective morality? In one sense, yes, and in another no. In the first sense it is absolutely true that objectivity in and of itself does not require the supernatural. In another sense, however, it can be argued that all atheistic moral theories fail—including objective ones—and that therefore the atheist is not entitled to objective morality.

The second response is perfectly tenable. If a rational, informed person thinks that atheistic moral theories fail, they are entitled to the claim that the atheist cannot have objective morality. But in theistic rhetoric, this claim is often conflated with the claim that there are no plausible atheistic objective moral theories on hand. This is patently false.

To demonstrate the falsity of the claim (but not to defend this theory), I offer a brief sketch of contemporary Kantianism.


Kantianism begins with the thought that we are rational beings who can step back from our actions and examine our reasons for those actions. When we do so, we find that the rules that guide our actions are our identities themselves; they form the basis of our character. Who we are through time can be described by the rules that guide our choices.

Since we have free will, we are free to adopt or reject any set of rules we want. We can choose to be mean-spirited or compassionate. But if we are to be rational, the only requirement is that we should always behave in accordance with our character. This makes sense if you think about it. If I am to be a rational person, I cannot adopt and reject identities at whim; I must choose who I am and then live accordingly. And remember this isn’t constrictive (yet), because we are free to choose who we want to be.

We should say, then, that it is an unoverridable law for humans that to be rational, they ought to behave consistently with their chosen set of rules for behavior. So what rules should we adopt?

Well, whatever rules we adopt, we must remember that all rational humans would be required to adopt them as well. Since there is no relevant difference between me and other rational beings deciding what to do, both they and I will adopt the same rules if we are truly rational.

So it would be irrational to adopt rules that I wouldn’t benefit from. If I adopt a rule that I will always steal, that would hurt me in the end, because other people would do the same thing and I would get stolen from. Since a part of being rational is pursuing my own self-interest, this wouldn’t be an acceptable law. But everyone adopting a prohibition against killing innocent people would certainly be in my self-interest, and so it would be rational of me to adopt such a rule.

  • Right actions, then, are actions conforming to a rule which you could rationally will everyone to conform to.

Now, putting aside objections to Kantianism, let us ask ourselves whether the rule which it generates is objective. Recall that the only criterion for objectivity is perspective-independence. Is the rule that we ought only act on a rule which we could rationally will everyone act on, perspective-independent? Of course it is. It doesn’t matter who you are, where you stand, or what you believe about it. If it is true, then it establishes an objective moral rule.

What else could theists mean when they claim that atheism can’t support objective morality?


It is sometimes suggested that atheistic theories are subjective rather than objective because they are mind-dependent. But what does it mean for a moral theory to be mind-dependent?

  • A moral theory is mind-dependent only if it asserts that moral facts would not exist were there no minds.

So, for example, if a moral theory states that moral facts are woven into the fabric of reality and that they persisted and will persist regardless of the existence of humans, it would be mind-independent. The technical term for this is moral realism. By contrast, moral anti-realism asserts that were there no minds, there would be no morality.

Are atheistic moralities necessarily mind-dependent? No. Atheists can maintain that moral facts are mind-independent, as does Erik J. Wielenberg, or else that moral facts are mind-dependent, as do Kant and Rawls.


It should be clear that when theists claim that atheism can’t support objective morality, they are simply mistaken. Robust, objective morality is just as available to the atheist as to the theist.

But why care in the first place? Theists often care because they are very concerned about relativism. If it turns out there are no objective moral truths, then the atrocities of genocide and torture might be endorsed since there would be no objective standard from which to condemn them. Now in point of fact, I’m concerned, too. While I don’t think the world would go to hell in a handbasket if relativism were true, I am much more comfortable with objective morality. Fortunately, I subscribe to Kantianism.

To sum up, I hope the word “objective” will be used more carefully by Christians and atheists alike.