Will Fleisher (Georgetown University), "Understanding, Idealization, and Explainable AI"
Artificial Intelligence systems are suddenly everywhere: driving cars, writing poems, painting, and helping to determine who goes to prison. This won’t be news to any reader of this blog. The rapidly expanding use of AI raises a variety of thorny issues. We know these systems can be biased against marginalized groups, require big data collection that undermines our privacy, and can spread misinformation. And these systems are typically controlled by private corporations who face little democratic oversight.
One crucial issue with contemporary AI systems, one that exacerbates all their other problems, is opacity. Our most sophisticated AI systems, like those built using Deep Neural Networks (DNNs), are black boxes. We have tested them for accuracy and reliability, but we don’t understand precisely how the systems work, or why they are reliable at their tasks. Some AI systems are intentionally opaque simply because their developers won’t show us their code. Moreover, even if we can see the code, most AI systems are opaque to people without specialized education. But the deeper problem of opacity is that our most sophisticated AI systems are opaque even to their developers. That is, even those who program and train these systems do not fully understand them. The problem stems from their complexity, and from the fact that they are trained using machine learning (ML) rather than being explicitly programmed.
In my paper, “Understanding, Idealization, and Explainable AI”, I argue that this opacity amounts to a lack of understanding. The goal of opacity alleviation, then, is to understand why AI systems behave as they do. This focus on understanding offers more than a simple rephrasing of the problem. This is because we have an independent grasp on what understanding is, and moreover there is a great deal of philosophical and psychological research about understanding. So, we can employ resources from epistemology, philosophy of science, and psychology to understand the opacity problem and how to solve it.
The first benefit of this understanding-based account is that it helps make sense of post hoc explainable AI (XAI) projects. Moreover, it helps undermine an in-principle objection to these projects that I call the rationalization objection. This objection claims that XAI methods merely provide rationalizations of AI systems rather than genuine explanations. This is because XAI models operate differently than the complex models they are designed to represent. In response, I argue that XAI methods produce idealized models, similar to those used to gain understanding in other areas of science. This helps to assuage the rationalization worry and motivate continued development of XAI methods.
Below, I offer a bit more detail about the problems addressed, and solutions offered, by the paper.
Deep neural networks illustrate the problem of opacity. DNNs are a kind of AI model used in our most sophisticated AI systems, including generative AI and Large Language Models such as GPT-4. DNN models consist of individual units (artificial neurons) arranged into layers, which are connected to units in other layers of the network. Each unit calculates a simple function. It receives input from its connections to other units. How much influence each input connection has on the unit is determined by a “weight” number. If the weighted sum of all its input connections is high enough, the unit activates and sends signals on its output connections. The parts of the model that get updated during training (the parameters of the model) are the weights of the connections. Although composed of simple units, a large, properly arranged DNN can calculate highly complex, non-linear functions. This allows DNNs to track complex patterns in the world that would be hard to explicitly program a system to track.
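To make the description of units and weights concrete, here is a minimal sketch in Python. It is an illustration only, not something taken from the paper: the function names, the weights, and the thresholds are all hand-picked assumptions, and real DNNs learn their weights from data and typically use smoother activation functions than the hard threshold used here. The sketch shows a single unit computing a weighted sum, and three such units arranged in two layers computing "exclusive or", a non-linear function that no single threshold unit of this kind can compute on its own.

```python
def unit_output(inputs, weights, threshold):
    """One artificial neuron: fire (1.0) if the weighted sum is high enough."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 if weighted_sum >= threshold else 0.0


def layer_output(inputs, weight_rows, thresholds):
    """A layer is a list of units that all read the same inputs."""
    return [unit_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


def tiny_network(x1, x2):
    """Two-layer network with hand-picked (not trained) weights.

    Hidden unit 1 fires if at least one input is on (OR).
    Hidden unit 2 fires only if both inputs are on (AND).
    The output unit fires if OR is on and AND is off, i.e. exclusive or.
    """
    hidden = layer_output(
        [x1, x2],
        weight_rows=[[1.0, 1.0], [1.0, 1.0]],
        thresholds=[0.5, 1.5],
    )
    return unit_output(hidden, weights=[1.0, -1.0], threshold=0.5)


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, tiny_network(a, b))
```

With three units we can still read off what each one is doing. The opacity problem arises because a trained DNN has millions or billions of weights, none of which were chosen by a person.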
In a DNN, developers know all the underlying parts of the model, and how they are arranged. They know how the weights get updated by the ML training algorithms. What we typically don’t understand is precisely why a particular model, with particular weights, is reliable at its task. We don’t know what patterns in the world the system is tracking, and we don’t know how the model is tracking them. This is the problem of deep opacity.
The opacity of AI makes dealing with any of its other problems more difficult. For instance, it is hard to evaluate whether a system is operating under discriminatory rules if we don’t know the rules it is using. Moreover, it is hard to give people explanations for why they received a certain judgment, or to provide them recourse if a mistake has been made about them. Discussions of how to alleviate opacity are hampered by disagreement about what alleviating opacity would actually mean. Opacity alleviation research is typically conducted in terms of “transparency”, “interpretability”, or “explainability”. However, the use of these terms is often inconsistent.
I argue that we should take the fundamental goal of opacity alleviation to concern understanding. We can then define transparency, interpretability, and explainability in terms of understanding. For instance, for a system to be interpretable (for a person) is for it to be understandable (by that person). The benefit of using understanding as the fundamental notion is that we have an independent grasp on what it is. In addition to our intuitive grasp of the concept, there is a great deal of research about understanding in epistemology, philosophy of science, and psychology.
On the account of understanding I prefer, it is an attitude that involves grasping an explanation that contains causal pattern information (Potochnik 2017). This causal information need not be a complete causal story. Moreover, the explanations in question can involve idealized models. Idealizations are aspects of a model that misrepresent the world in some way, i.e., that are false (Elgin 2017). Despite this, idealization is ubiquitous in science. This is because idealizations are necessary for explaining the extremely complex phenomena found in nature. Crucially, idealized scientific models can produce genuine understanding, as they contain true causal pattern information. This fact about idealization is important for avoiding the rationalization objection to Explainable AI methods.
XAI methods are applied after the use of complex AI systems like DNNs. When applied to a DNN, they deploy simpler models that are meant to approximate the operation of the DNN in order to help us see what it was doing when it made a particular classification or decision. For instance, an XAI system like LIME (Ribeiro et al. 2016) is designed to approximate how a DNN made a decision about a particular individual. It builds a simple, linear model of the behavior of the DNN. This second, simpler model approximates the DNN’s behavior for individuals similar to the one in question. We can then use the LIME-generated model to see which features of the individual in question were most important in driving the DNN’s decision. LIME offers either a list of important features, or, when applied to image classification systems, a modified image that shows us which parts of the original image were driving the DNN’s decision.
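The general idea of a local linear surrogate can be sketched in a few lines of Python. This is a LIME-style illustration under stated assumptions, not the LIME library and not the algorithm exactly as Ribeiro et al. implement it: the `black_box` function is a hypothetical stand-in for an opaque model, and the Gaussian perturbations, the kernel width, and the plain weighted least-squares fit are simplifying choices of mine. The sketch samples points near one individual, asks the black box for its output at each, weights nearby samples more heavily, and fits a linear model whose coefficients indicate which features matter most locally.

```python
import math
import random


def black_box(x):
    """Hypothetical opaque model: a non-linear score of two features."""
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] * x[0] - 2.0 * x[1])))


def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def local_linear_surrogate(model, instance, n_samples=500, scale=0.3,
                           kernel_width=0.5, seed=0):
    """Fit a proximity-weighted linear model to `model` around `instance`."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        sample = [v + rng.gauss(0.0, scale) for v in instance]
        offsets = [s - v for s, v in zip(sample, instance)]
        dist_sq = sum(o * o for o in offsets)
        weights.append(math.exp(-dist_sq / kernel_width ** 2))  # nearer = heavier
        rows.append([1.0] + offsets)  # leading 1.0 gives the intercept term
        targets.append(model(sample))
    k = len(instance) + 1
    # Weighted normal equations: (X^T W X) coef = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(k)]
    coef = solve_linear_system(xtwx, xtwy)
    return coef[0], coef[1:]


intercept, importances = local_linear_surrogate(black_box, [1.0, 0.5])
print("local intercept:", intercept)
print("local feature coefficients:", importances)
```

The sign and size of each coefficient play the role of the "list of important features": here the first feature pushes the score up near this individual and the second pushes it down, even though the surrogate itself works nothing like the function it approximates.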
The rationalization objection suggests that post hoc XAI methods, such as LIME, do not provide genuine understanding (Babic et al. 2021). This is because XAI methods build models which (a) function differently than the original models, and (b) are imperfectly accurate at approximating those original models. So, taken at face value, they tell us false things about how the original models operate. Hence, the objectors claim, XAI methods like LIME cannot provide us genuine explanations.
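Both halves of the objection can be seen in a toy case. The following Python sketch is my own illustration, with a deliberately trivial stand-in for the "original model" (the function `original_model` and the choice of reference point are assumptions, not anything from the XAI literature). The surrogate is a straight line, so it (a) functions differently from the curved original, and (b) its approximation error is small near the point it was built around and grows as we move away.

```python
def original_model(x):
    """Toy stand-in for a complex model: a curved (quadratic) function."""
    return x * x


def linear_surrogate(x):
    """Straight-line approximation of original_model built around x = 1.

    It matches the original's value and slope at x = 1, nothing more.
    """
    return 1.0 + 2.0 * (x - 1.0)


def approximation_error(x):
    """How far the surrogate's output is from the original's at x."""
    return abs(original_model(x) - linear_surrogate(x))


for x in (1.0, 1.1, 1.5, 2.0):
    print(x, approximation_error(x))
```

Taken as a literal description of the original, the line is false everywhere except at one point. Whether that falsity undermines its explanatory value is exactly what is at issue between the objectors and the view defended here.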
The understanding account provides a response to this objection. On my account, we should treat successful XAI methods as producing idealized models. If XAI models are like idealized scientific models, then it is no surprise that they misrepresent their target AI models in various ways. Moreover, the opacity of AI systems results from their complexity, so it is no surprise that an attempt to understand them will involve idealizations. Idealizations are used in other areas of science for the same purpose: representing complex systems in a way that makes them understandable to humans.
The fact that LIME models misrepresent their targets in various ways is not a reason to doubt their ability to provide explanations. It simply means we need to use the same kind of caution in making model-to-world inferences that we do in other scientific contexts. Of course, XAI is still a developing field, and existing methods like LIME are limited in their ability to explain AI systems. But the understanding account, in combination with recognizing the importance of idealization, blocks the in-principle rationalization objection. This suggests that further research on XAI methods could be fruitful. Moreover, the understanding account gives us a specific goal for explainable AI by which we can evaluate these methods as they develop.
Babic B., Gerke S., Evgeniou T. and Cohen I.G. (2021). ‘Beware Explanations from AI in Health Care.’ Science 373(6552), 284–6.
Elgin C.Z. (2017). True Enough. Cambridge, MA: MIT Press.
Potochnik A. (2017). Idealization and the Aims of Science. Chicago, IL: University of Chicago Press.
Ribeiro M.T., Singh S. and Guestrin C. (2016). ‘“Why Should I Trust You?”: Explaining the Predictions of Any Classifier.’ In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–44.