Archive for the ‘Understanding AI’ Category
On being a human in the loop
A recent article in The Register notes that Microsoft has tweaked its fine print to warn users not to take its AI seriously. The relevant update to the license terms reads, “AI services are not designed, intended, or to be used as substitutes for professional advice.”
Users ought not to believe everything a software vendor claims, of course, but Microsoft’s disingenuous marketing is also to blame. For example, their website currently claims that “Copilot…is unique in its ability to understand … your business data, and your local context.” The truth is that the large language models (LLMs) that underpin Copilot understand nothing.
Microsoft has sold Copilot to organisations by focusing on the benefits of generative AI while massively downplaying its limitations. Although they recite the obligatory lines about having a human in the loop, they neither emphasise nor explain what this entails.
–x–
As Subbarao Kambhampati notes in this article, LLMs should be treated as approximate retrieval engines. This is worth keeping in mind when examining vendor claims.
For example, one of the common use cases that Microsoft touts in its sales spiel is that generative AI can summarise documents.
Sure, it can.
However, given that LLMs are approximate retrieval engines, how much would you be willing to trust an AI-generated summary? The output of an LLM might be acceptable for a summary that will go no further than your desktop, or perhaps your team, but would you be comfortable with it going to your chief executive without thorough verification?
My guess is that you would want to be a (critically thinking!) human in that loop.
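To make the idea concrete, here is a minimal sketch of what a human-in-the-loop summarisation workflow might look like in code. It is a sketch only, assuming the OpenAI Python client; the model name, prompts and review step are my placeholders, and the point is simply that nothing leaves the loop without a person checking it against the source.

```python
# A minimal sketch of a human-in-the-loop summarisation workflow.
# Assumptions: the `openai` Python package is installed and an API key is
# configured in the environment; the model name below is a placeholder.
from openai import OpenAI

client = OpenAI()

def draft_summary(document: str) -> str:
    """Ask the model for a draft summary. The result is a draft, not a deliverable."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Summarise the user's document in five bullet points."},
            {"role": "user", "content": document},
        ],
    )
    return response.choices[0].message.content

def human_review(draft: str) -> str:
    """The human in the loop: check every claim against the source and edit as needed."""
    print("=== DRAFT SUMMARY (verify against the original document) ===")
    print(draft)
    if input("Approve as-is? [y/N] ").strip().lower() == "y":
        return draft
    return input("Enter the corrected summary: ")

# Usage (illustrative): summary = human_review(draft_summary(open("report.txt").read()))
```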
–x–
In a 1983 paper entitled The Ironies of Automation, Lisanne Bainbridge noted that:
“The classic aim of automation is to replace human manual control, planning and problem solving by automatic devices and computers. However….even highly automated systems, such as electric power networks, need human beings for supervision, adjustment, maintenance, expansion and improvement. Therefore one can draw the paradoxical conclusion that automated systems still are man-machine systems, for which both technical and human factors are important. This paper suggests that the increased interest in human factors among engineers reflects the irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator.”
Bainbridge’s paper presaged many automation-related failures that could have been avoided with human oversight. Examples range from air disasters to debt recovery and software updates.
–x–
Bainbridge’s prescient warning about the importance of the human in the loop has become even more important in today’s age of AI. Indeed, inspired by Bainbridge’s work, Dr. Mica Endsley recently published a paper on the ironies of AI. In the paper, she lists the following five ironies:
- AI is still not that intelligent: despite hype to the contrary, it is still far from clear that current LLMs can reason. See, for example, the papers I have discussed in this piece.
- The more intelligent and adaptive the AI, the less able people are to understand the system: the point here being that the more intelligent/adaptive the system, the more complex it is – and therefore harder to understand.
- The more capable the AI, the poorer people’s self-adaptive behaviours for compensating for shortcomings: as AIs become better at what they do, humans will tend to offload more and more of their thinking to machines. As a result, when things go wrong, humans will find themselves less and less able to take charge and fix issues.
- The more intelligent the AI, the more obscure it is, and the less able people are to determine its limitations and biases and when to use the AI: as AIs become more capable, their shortcomings will become less obvious. There are at least a couple of reasons for this: a) AIs will become better at hiding (or glossing over) their limitations and biases, and b) the complexity of AIs will make it harder for users to understand their workings.
- The more natural the AI communications, the less able people are to understand the trustworthiness of the AI: good communicators are often able to trick people into believing or trusting them. It would be exactly the same for a sweet-talking AI.
In summary: the more capable the AI, the harder it is to be a competent human in the loop, but the more critical it is to be one.
–x–
Over the last three years, I’ve been teaching a foundational business analytics course at a local university. Recently, I’ve been getting an increasing number of questions from students anxious about the implications of generative AI for their future careers. My responses are invariably about the importance of learning how to be a thinking human in the loop.
This brings up the broader issue of how educational practices need to change in response to the increasing ubiquity of generative AI tools. Key questions include:
- How should AI be integrated into tertiary education?
- How can educators create educationally meaningful classroom activities that involve the use of AI?
- How should assessments be modified to encourage the use of AI in ways that enhance learning?
These are early days yet, but some progress has been made on addressing each of the above. For examples, see:
- This paper, based on the experience of introducing AI at a Swiss university, describes how to integrate AI into university curricula: https://link.springer.com/article/10.1186/s41239-024-00448-3
- This paper by Ethan and Lilach Mollick provides some interesting and creative examples of AI use in the classroom: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4391243
- This document, by the Australian Tertiary Education Quality and Standards Agency, provides guidance on how AI can be integrated into university assessments: https://www.teqsa.gov.au/sites/default/files/2023-09/assessment-reform-age-artificial-intelligence-discussion-paper.pdf
However, even expert claims should not be taken at face value. An example might help illustrate why:
In his bestselling book on AI, Ethan Mollick gives an example of two architects learning their craft after graduation. One begins his journey by creating designs using traditional methods supplemented by study of good designs and feedback from an experienced designer. The other uses an AI-driven assistant that highlights errors and inefficiencies in his designs and suggests improvements. Mollick contends that the second architect’s learning would be more effective and rapid than that of the first.
I’m not so sure.
A large part of human learning is about actively reflecting on one’s own ideas and reasoning. The key word here is “actively” – meaning that the thinking is done by the learner. An AI assistant that points out flaws and inefficiencies may save the student time, but it also detracts from learning, because the student is “saved” from the need to reflect on their own thinking.
–x–
I think it is appropriate to end this piece by quoting from a recent critique of AI by the science fiction writer, Ted Chiang:
“The point of writing essays is to strengthen students’ critical-thinking skills; in the same way that lifting weights is useful no matter what sport an athlete plays, writing essays develops skills necessary for whatever job a college student will eventually get. Using ChatGPT to complete assignments is like bringing a forklift into the weight room; you will never improve your cognitive fitness that way.”
So, what role does AI play in your life: assistant or forklift?
Are you sure??
–x–x–
Acknowledgement: This post was inspired by Sandeep Mehta’s excellent article on the human factors challenge posed by AI systems.
Can Large Language Models reason?
There is much debate about whether Large Language Models (LLMs) have reasoning capabilities: on the one hand, vendors and some researchers claim LLMs can reason; on the other, there are researchers who contest these claims. I have discussed several examples of the latter in an earlier article, so I won’t rehash them here. However, the matter is far from settled: the debate will go on because new generations of LLMs will continue to get better at (apparent?) reasoning.
It seems to me that a better way to shed light on this issue would be to ask a broader question: what purpose does language serve?
More to the point: do humans use language to think or do they use it to communicate their thinking?
Recent research suggests that language is primarily a tool for communication, not thought (the paper is paywalled, but a summary is available here). Here’s what one of the authors says about this issue:
“Pretty much everything we’ve tested so far, we don’t see any evidence of the engagement of the language mechanisms [in thinking]…Your language system is basically silent when you do all sorts of thinking.”
This is consistent with studies on people who have lost the ability to process words and frame sentences, due to injury. Many of them are still able to do complex reasoning tasks such as play chess or solve puzzles, even though they cannot describe their reasoning in words. Conversely, the researchers find that intellectual disabilities do not necessarily impair the ability to communicate in words.
–x–
The notion that language is required for communicating but not for thinking is far from new. In an essay published in 2017, Cormac McCarthy noted that:
“Problems in general are often well posed in terms of language and language remains a handy tool for explaining them. But the actual process of thinking—in any discipline—is largely an unconscious affair. Language can be used to sum up some point at which one has arrived—a sort of milepost—so as to gain a fresh starting point. But if you believe that you actually use language in the solving of problems I wish that you would write to me and tell me how you go about it.”
So how and why did language arise? In his epic book, The Symbolic Species, the evolutionary biologist and anthropologist Terrence Deacon suggests that it arose out of the necessity to establish and communicate social norms around behaviours, rights and responsibilities as humans began to band into groups about two million years ago. The problem of coordinating work and ensuring that individuals do not behave in disruptive ways in large groups requires a means of communicating with each other about specific instances of these norms (e.g., establishing a relationship, claiming ownership) and, more importantly, resolving disputes around perceived violations. Deacon’s contention is that language naturally arose out of the need to do this.
Starting with C. S. Peirce’s triad of icons, indexes and symbols, Deacon delves into how humans could have developed the ability to communicate symbolically. Symbolic communication is based on the powerful idea that a symbol can stand for something else – e.g., the word “cat” is not a cat, but stands for the (class of) cats. Deacon’s explanation of how humans developed this capability is – in my opinion – quite convincing, but it is by no means widely accepted. As echoed by McCarthy in his essay, the mystery remains:
“At some point the mind must grammaticize facts and convert them to narratives. The facts of the world do not for the most part come in narrative form. We have to do that.
So what are we saying here? That some unknown thinker sat up one night in his cave and said: Wow. One thing can be another thing. Yes. Of course that’s what we are saying. Except that he didn’t say it because there was no language for him to say it in [yet]….The simple understanding that one thing can be another thing is at the root of all things of our doing. From using colored pebbles for the trading of goats to art and language and on to using symbolic marks to represent pieces of the world too small to see.”
So, how language originated is still an open question. However, once it takes root, it is easily adapted to purposes other than the social imperatives it was invented for. It is a short evolutionary step from rudimentary communication about social norms in hunter-gatherer groups to Shakespeare and Darwin.
Regardless of its origins, however, it seems clear that language is a vehicle for communicating our thinking, but not for thinking or reasoning.
–x–
So, back to LLMs then:
Based, as they are, on a representative corpus of human language, LLMs mimic how humans communicate their thinking, not how humans think. Yes, they can do useful things, even amazing things, but my guess is that these will turn out to have explanations other than intelligence and/or reasoning. For example, in this paper, Ben Prystawski and his colleagues conclude that “we can expect Chain of Thought reasoning to help when a model is tasked with making inferences that span different topics or concepts that do not co-occur often in its training data, but can be connected through topics or concepts that do.” This is very different from human reasoning, which is a) embodied, and thus uses data that is tightly coupled – i.e., relevant – to the problem at hand, and b) uses the power of abstraction (e.g., theoretical models).
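To make the quoted claim a little more concrete, here is a toy illustration of what “chain of thought” prompting amounts to in practice. The question, prompts and wording are my own illustrative assumptions, not examples from the paper.

```python
# A toy illustration of direct prompting vs. chain-of-thought prompting.
# Per the quoted conclusion, spelling out intermediate steps lets a model
# bridge concepts that rarely co-occur in its training data via concepts
# that do. Everything below is illustrative.

question = (
    "Town A's water comes from reservoir R. Reservoir R is fed by river V. "
    "A factory has polluted river V. Is town A's water supply at risk?"
)

# Direct prompt: the model must connect 'town A' and 'the factory' in one hop,
# even if the two rarely co-occur in its training data.
direct_prompt = question + "\nAnswer yes or no."

# Chain-of-thought prompt: the model is nudged to generate the intermediate
# links (factory -> river V -> reservoir R -> town A) before answering.
cot_prompt = (
    question
    + "\nThink step by step: first state what feeds the reservoir, then what "
      "feeds the town, and only then give a yes/no answer."
)

print(direct_prompt)
print("---")
print(cot_prompt)
```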
In time, research aimed at understanding the differences between LLM and human reasoning will help clarify how LLMs do what they do. I suspect it will turn out that LLMs do sophisticated pattern matching and linking at a scale that humans simply cannot, and therefore give the impression of being able to think or reason.
Of course, it is possible I’ll turn out to be wrong, but while the jury is out, we should avoid conflating communication about thinking with thinking itself.
–x–x–
Postscript:
I asked Bing Image Creator to generate an image for this post. The first prompt I gave it was:
An LLM thinking
It responded with the following image:
I was flummoxed at first, but then I realised it had interpreted LLM as “Master of Laws” degree. Obviously, I hadn’t communicated my thinking clearly, which is kind of ironic given the topic of this article. Anyway, I tried again, with the following prompt:
A Large Language Model thinking
To which it responded with a much more appropriate image:
Selling AI ethically – a customer perspective
Artificial intelligence (AI) applications that can communicate in human language seem to capture our attention whilst simultaneously blunting our critical capabilities. Examples of this abound, ranging from claims of AI sentience to apps that are “always here to listen and talk.” Indeed, a key reason for the huge reach of Large Language Models (LLMs) is that humans can interact with them effortlessly. Quite apart from the contested claims that they can reason, the linguistic capabilities of these tools are truly amazing.
Vendors have been quick to exploit our avidity for AI. Through relentless marketing, backed up by over-the-top hype, they have been able to make inroads into organisations. Their sales pitches tend to focus almost entirely on the benefits of these technologies, with little or no consideration of the downsides. To put it bluntly, this is unethical. Doubly so because customers are so dazzled by the capabilities of the technology that they rarely ask the questions they should.
AI ethics frameworks (such as this one) overlook this point almost entirely. Most of them focus on things such as fairness, privacy, reliability, transparency, etc. There is no guidance or advice to vendors on selling AI ethically, by which I mean a) avoiding overblown claims, b) being clear about the limitations of their products, and c) showing customers how they can engage with AI tools meaningfully – i.e., in ways that augment human capabilities rather than replace them.
In this article, I offer some suggestions on how vendors can help their customers develop a balanced perspective on what AI can do for them. To set the scene, I will begin by recounting the public demo of an AI product in the 1950s, one that was accompanied by much media noise and heightened public expectations.
Some things, it seems, do not change.
–x–
The modern history of Natural Language Processing (NLP) – the subfield of computer science that deals with enabling computers to “understand” and communicate in human language – can be traced back to the Georgetown-IBM research experiment that was publicly demonstrated in 1954. The demonstration is trivial by today’s standards. However, as noted by John Hutchins in this paper, “…Although a small-scale experiment of just 250 words and six ‘grammar’ rules it raised expectations of automatic systems capable of high quality translation in the near future…” Here’s how Hutchins describes the hype that followed the public demo:
“On the 8th January 1954, the front page of the New York Times carried a report of a demonstration the previous day at the headquarters of International Business Machines (IBM) in New York under the headline “Russian is turned into English by a fast electronic translator”: A public demonstration of what is believed to be the first successful use of a machine to translate meaningful texts from one language to another took place here yesterday afternoon. This may be the cumulation of centuries of search by scholars for “a mechanical translator.” Similar reports appeared the same day in many other American newspapers (New York Herald Tribune, Christian Science Monitor, Washington Herald Tribune, Los Angeles Times) and in the following months in popular magazines (Newsweek, Time, Science, Science News Letter, Discovery, Chemical Week, Chemical Engineering News, Electrical Engineering, Mechanical World, Computers and Automation, etc.) It was probably the most widespread and influential publicity that MT (Machine Translation – or NLP by another name) has ever received.”
It has taken about 70 years, but here we are: present-day LLMs go well beyond the grail of machine translation. Among other “corporately useful” things, LLM-based AI products such as Microsoft Copilot can draft documents, create presentations, and even analyse data. As these technologies require virtually no training to use, it is unsurprising that they have captured the corporate imagination like never before.
Organisations are avid for AI and vendors are keen to cash in.
Unfortunately, there is a huge information asymmetry around AI that favours vendors: organisations are typically not fully aware of the potential downsides of the technology and vendors tend to exploit this lack of knowledge. In a previous article, I discussed how non-specialists can develop a more balanced perspective by turning to the research literature. However, this requires some effort and unfairly puts the onus entirely on the buyer.
Surely, vendors have a responsibility too.
–x–
I recently sat through a vendor demo of an LLM-based “enterprise” product. As the presentation unfolded, I made some notes on what the vendor could have said or done to help my colleagues and me make a more informed decision about the technology. I summarise them below in the hope that a vendor or two may consider incorporating them in their sales spiel. OK, here we go:
- Draw attention to how LLMs do what they do: vendors should demystify LLM capabilities by giving users an overview of how these models actually generate their output – a toy sketch of next-token generation appears after this list. If users understand how these technologies work, they are less likely to treat their outputs as error-free or oracular truths. Indeed, a recent paper claims that LLM hallucinations (aka erroneous outputs) are inevitable – see this article for a simple overview of the paper.
- Demo examples of LLM failures: the research literature has several examples of LLMs failing at reasoning tasks – see this article for a summary of some. Demonstrating these failures is important, particularly in view of OpenAI’s claim that its new GPT-4o tool can reason. Another point worth highlighting is the bias present in LLM (and, more generally, generative AI) models. For an example, see the image created by Bing Image Creator – the prompt I used was “large language model capturing a user’s attention.”
- Discourage users from outsourcing their thinking: human nature being what it is, many users will be tempted to use these technologies to do their thinking for them. Vendors need to highlight the dangers of doing so. If users do not think a task through before handing it to an LLM, they will not be able to evaluate its output. Thinking a task through includes mapping out the steps and content (where relevant), and having an idea of what a reasonable output should look like.
- Avoid anthropomorphising LLMs: marketing will often attribute agency to LLMs by saying things such as “the AI is thinking” or “it thinks you are asking for…”. Such language suggests that LLMs can think or reason as humans do, and biases users towards attributing agency to these tools.
- Highlight potential dangers of use in enterprise settings: vendors spend a lot of time assuring corporate customers that their organisational data will be held securely. However, exposing organisational data (such as data in corporate OneDrive folders), even within the confines of the corporate network, opens up the possibility of employees being able to query information that they should not have access to. Moreover, formulating such queries is super simple because they can be asked in plain English. Vendors claim that this is not an issue if file permissions are implemented properly in the organisation. However, in my experience, people tend to overshare files within their organisations. Another danger is that the technology opens up the possibility of spying on employees. For example, a manager who wants to know what an employee is up to can ask the LLM which documents the employee has been working on.
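As promised in the first recommendation, here is a toy sketch of what an LLM does at inference time: it repeatedly samples the next token from a probability distribution conditioned on the text so far. The tiny hand-coded “model” below is my own stand-in for a real network, purely for illustration.

```python
# A toy sketch of next-token generation. The hand-coded distribution is a
# stand-in for a trained neural network; real LLMs work over tens of thousands
# of tokens and billions of parameters, but the loop is conceptually the same.
import random

def toy_next_token_distribution(context: list[str]) -> dict[str, float]:
    """Stand-in for a trained model: returns P(next token | context so far)."""
    if context and context[-1] == "large":
        return {"language": 0.85, "dataset": 0.10, "hadron": 0.05}
    return {"the": 0.4, "a": 0.3, "large": 0.3}

def generate(prompt: list[str], n_tokens: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(n_tokens):
        dist = toy_next_token_distribution(tokens)
        next_token = random.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(next_token)
    return tokens

print(" ".join(generate(["a", "large"])))
# Note that nothing in this loop checks the output against reality:
# fluent text is generated with no guarantee that it is true.
```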
Granted, highlighting the above might make some corporate customers wary of rushing in to implement LLM technologies within their organisations. However, I would argue that this is a good thing for vendors in the long run, as it demonstrates a commitment to implementing AI ethically.
–x–
It is appropriate to end this piece by making a final point via another historical note.
The breakthrough that led to the development of LLMs was first reported in a highly cited 2017 paper entitled “Attention is all you need”. The paper describes an architecture (called the transformer) that enables neural networks to accurately learn the multiple contexts in which words occur in a large volume of text. If the volume of text is large enough – say, a representative chunk of the internet – then a big enough neural network, with billions of parameters, can be trained to encode the entire vocabulary of the English language in all possible contexts.
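For readers who want a feel for the mechanism, here is a minimal sketch of the scaled dot-product attention operation at the core of the transformer. The toy sizes and random numbers are my own assumptions, not anything from the paper; the point is only that each position’s output is a weighted blend of the other positions, with the weights computed from the context.

```python
# A minimal sketch of scaled dot-product attention, the core operation of the
# transformer architecture. Toy dimensions only; real models use many attention
# heads and far larger matrices, with Q, K and V produced by learned projections.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted mix of the rows of V; the weights measure
    how strongly each query position 'attends to' each key position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax -> attention weights
    return weights @ V

# Three token positions with four-dimensional embeddings (toy numbers).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x))
```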
The authors’ choice of the “attention” metaphor is inspired because it suggests that the network “learns to attend to” what is important. In the context of humans, however, the word “attention” means much more than just attending to what is important. It also refers to the deep sense of engagement with what we are attending to. The machines we use should help us deepen that engagement, not reduce (let alone eliminate) it. And therein lies the ethical challenge for AI vendors.
–x–x–