If an AI wants to do something wrong or bad, the assumption is that we can forecast that action and somehow preclude it or, if our good intentions miss it, we can police it.

What’s missing is the element of agency and the potential for moral AI. At least one researcher is doing something about it.

Sharon Li, an assistant professor at the University of Wisconsin, is a leader in developing something called "out-of-distribution detection" (OOD), which teaches AI models to recognize inputs unlike anything in their training data and to refrain from acting on them.

In other words, don’t do stuff when you don’t really know what you’re doing.
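To make the idea concrete, here is a minimal sketch (my own illustration, not Li's code) of how an abstention rule can sit on top of an ordinary classifier: an "energy"-style familiarity score is computed from the model's outputs, and the model declines to act when the score falls below a threshold. The function names, logits, and threshold here are all made up for illustration.

```python
import numpy as np

def familiarity_score(logits, temperature=1.0):
    """Energy-style score: higher means the input looks more like the training data."""
    logits = np.asarray(logits, dtype=float)
    return temperature * np.log(np.sum(np.exp(logits / temperature)))

def predict_or_abstain(logits, threshold):
    """Return the predicted class, or None when the input looks out-of-distribution."""
    if familiarity_score(logits) < threshold:
        return None  # "I don't know what I'm doing here" -- refuse to act
    return int(np.argmax(logits))

# A confident, familiar-looking input yields a prediction.
print(predict_or_abstain([9.0, 0.5, 0.2], threshold=5.0))   # -> 0
# A flat, unfamiliar-looking input yields an abstention.
print(predict_or_abstain([0.3, 0.2, 0.1], threshold=5.0))   # -> None
```

The point of the design is simply that declining to answer is a valid output, rather than forcing the model to guess.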

This runs counter to the approach for ChatGPT and other LLMs, which are empowered to answer any query, even if doing so includes making stuff up. Letting their reach exceed their grasp is treated as a feature, not a bug, on the assumption that the models will learn from their mistakes and eventually get smarter.

Li’s approach is more focused on designing models that know what they know and do what they’re supposed to do. By definition, this should limit the chances for mistakes or, worse, bad outcomes resulting from foolishly optimistic assumptions.

MIT Technology Review just named Li its 2023 Innovator of the Year for her work.

One could argue that OOD embodies what we might once have called a moral or guiding principle.

Humility and an awareness of one’s limitations used to be considered a positive attribute. Other morals, like doing no harm, respecting others, and taking responsibility for your actions, were similarly assumed to be intrinsic to what it meant to be a citizen and good person.

Morals were qualities that didn’t require a lot of interpretation or theoretical debate. They were truisms of life.

When people decided they had the smarts or authority to modify those truisms, like deciding that a given harm or risk was "acceptable" in service of some aspirational or imaginary good, they were considered immoral.

Imagine if we could build such morals into AI instead of chasing and policing the absence of them.

Isaac Asimov, one of my favorite sci-fi authors, imagined just that with his Three Laws of Robotics back in 1942. He thought robots could be made so that they could do no harm to people (or allow it to happen), obey the orders of humans (as long as they didn't violate the first law), and protect their own existence (as long as doing so didn't violate the other two laws).

They weren't laws, really, and certainly not coded commands, but rather things that robots simply couldn't do, however much the task, the interpretation of the data, or the intentions of their operators might suggest otherwise.

I’d argue that they were morals in every sense but name, just like the moral character of human beings. Asimov believed that they could be made a part of the very fiber of an AI’s being…not the result of reasoned argument but rather the instinctual outcome of an awareness of right and wrong.

The devil’s in the coding details, of course, but I think researchers like Sharon Li are onto something. We can’t hope to manage AI development with good intentions and policing.

If she and others can figure out how to give morals to AI, maybe it’ll help us rediscover them in ourselves.

[This essay appeared originally at Spiritual Telegraph]