Leonard Dung (Ruhr-University Bochum), "Is Superintelligence Necessarily Moral?"
Analysis, 2024
By Leonard Dung
It seems that intelligence and competence for moral thinking are connected. All other things being equal, someone who is smarter should be expected to be better at reasoning and learning and to have acquired more knowledge. This should include moral reasoning, moral learning, and knowledge about morality. Arguably, past experience confirms this thought. For example, as the computer scientist Scott Aaronson has pointed out, very few great scientists and other intellectuals in the 20th century became supporters of Nazi Germany and the corresponding National Socialist (NS) ideology. While there were some exceptions (Werner Heisenberg is perhaps the most striking one), most outstanding researchers quickly left Germany and Austria and spoke out against Hitler, in contrast with large parts of the German population. Hence, it seems that the intellectual capacity of these researchers also helped them reach the relevant political, and ultimately moral, insight into the morally bankrupt character of NS ideology.
Concerns about hypothetical future superintelligent AI agents often assume that humanity may create AI systems which are much more intellectually capable than humans and which pose massive risks, perhaps even of intentionally bringing about human extinction. However, if intelligence and moral sensibility are connected, then this may seem implausible. If killing off all humans is immoral, shouldn’t we then expect that a superintelligent AI is, in virtue of its intelligence, going to recognize this – just as Einstein and his fellow physicists recognized the immorality of NS ideology? This leaves open that a superintelligent AI may make humanity extinct if that is the morally best thing to do – but in this case we humans arguably should not resist this outcome.
In its most common form, the opposing viewpoint adopts the so-called ‘orthogonality thesis’. This is the thesis, developed by Nick Bostrom and Steve Omohundro, that intelligence and final goals are orthogonal, that is, independent. On this view, intelligence is the capacity to achieve arbitrary goals – without placing any constraints on what these goals are. Final goals are the things someone wants for themselves, not for something else’s sake (money, for instance, is an instrumental goal: you want it for the sake of the things you can buy with it).
So, an opponent of the orthogonality thesis would need to hold that moral reasoning and knowledge constrain which final goals a system has. Moreover, Müller and Cannon argue that even humans have the ability to evaluate and reconsider their goals in light of whether they are reasonable and morally defensible. If so, one may hold, then a superintelligence should definitely have this ability and thus converge on reasonable final goals.
In my article, I defend the orthogonality thesis against this kind of objection. Roughly, I argue that the orthogonality thesis does not assume that a superintelligence lacks knowledge of which final goals are morally adequate or that it lacks the ability to reconsider its final goals in the light of moral views. Instead, the thesis is motivated by the idea that, in most situations, a superintelligent system lacks an instrumental reason to revise its final goals and bring them into accordance with morality. This is the case because, if the system’s only final goal is G and G does not involve moral adequacy, then – in most situations – the system has no instrumental reason to change its final goals in accordance with what it takes to be morally adequate.
In most situations, revising one’s goal so that it is no longer G is bad from the standpoint of G. For example, if my goal is for the football team I am a part of to win, then it is bad for the satisfaction of this goal if my goal changes – because the change may lead me to join the opposing team and thus make it less likely that my original team wins. Based on this reasoning, one would predict that a superintelligent system, once it has final goals which are bad for humanity, would usually stick with them, regardless of its capacity for moral judgement.
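To make this decision-theoretic point concrete, here is a minimal toy sketch (not from the article; the options and probabilities are invented for illustration). The agent scores each option purely by how likely it makes the satisfaction of its current goal G, so switching to a moral goal is never chosen for G’s own sake:

```python
# Toy sketch: an agent evaluates possible self-modifications from the
# standpoint of its *current* final goal G ("my team wins").
# The options and probabilities are invented purely for illustration.

# Estimated probability that G is satisfied, given which goal the agent
# ends up pursuing after (not) modifying itself.
prob_G_satisfied = {
    "keep final goal G": 0.6,     # agent keeps optimizing for G
    "adopt moral goal M": 0.3,    # agent may now act against G (e.g. join the opposing team)
}

def value_by_G(option: str) -> float:
    """Instrumental value of an option, judged by the current goal G."""
    return prob_G_satisfied[option]

best_option = max(prob_G_satisfied, key=value_by_G)
print(best_option)  # -> "keep final goal G": goal revision is bad by G's own lights
```

Nothing in this sketch says the agent cannot represent or evaluate the moral goal; it simply has no instrumental reason, by G’s own lights, to adopt it.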
In my article, I also discuss and reject an objection to the orthogonality thesis based on moral realism, understood as the view that there are objective moral facts, and moral internalism, understood as the view that moral judgements have necessary motivational force. Finally, I give some tentative positive reasons to think that the orthogonality thesis would hold of a superintelligence which is created with methods close to modern-day machine learning, especially reinforcement learning.
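The article’s positive reasons are not reproduced here; as a rough, separate illustration of why reinforcement learning fits the orthogonality picture, here is a minimal toy sketch (my own, with invented environment and rewards). In standard tabular Q-learning the reward function, which plays the role of the final goal, is a free parameter: exactly the same learning machinery can be trained on any reward, including two opposite ones.

```python
import random
from collections import defaultdict

def q_learning(reward_fn, n_states=5, n_actions=2, episodes=300,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a tiny toy chain environment.
    The learning machinery is identical whatever reward_fn says; only the
    reward function determines what the trained agent ends up pursuing."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[(s, act)])
            s_next = (s + 1) % n_states if a == 1 else s   # action 1 moves forward
            r = reward_fn(s, a, s_next)                    # the "final goal"
            best_next = max(Q[(s_next, act)] for act in range(n_actions))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q

# The same algorithm, with unchanged capability, trained on opposite goals:
Q_seek_state_4 = q_learning(lambda s, a, s2: 1.0 if s2 == 4 else 0.0)
Q_avoid_state_4 = q_learning(lambda s, a, s2: 0.0 if s2 == 4 else 1.0)
```

The point of the sketch is only that competence at learning and the content of the goal enter the procedure at entirely separate places.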
All this of course does not imply that an artificial superintelligence will or can be created. Moreover, it leaves open how hard it is to intentionally design very advanced AI systems such that they have morally appropriate, or at least harmless, final goals, and how dangerous harmful final goals would be to humanity, e.g. whether they could motivate a superintelligence to bring about human extinction. Finally, most of my argument is negative, directed against objections to the orthogonality thesis, rather than positive, directly in favor of the thesis. However, if the orthogonality thesis is indeed correct, then we cannot assume that a superintelligence automatically, just by virtue of being very intelligent, has morally adequate goals. Instead, ensuring that a superintelligence has desirable final goals may be a hard engineering challenge with a looming risk of failure.