Eight to Late

Sensemaking and Analytics for Organizations


Towards a public understanding of AI  – notes from an introductory class for a general audience


There is a dearth of courses on AI for the general public. To address this gap, I ran a three-hour course entitled “AI – A General Introduction” on November 4th at the Workers’ Educational Association (WEA) office in Sydney. My intention was to introduce attendees to modern AI via a historical and cultural route, from its mystic 13th-century origins to the present day. In this article I describe the content of the session, offering some pointers for those who would like to run similar courses.

–x–

First some general remarks about the session:

  • The course was sold out well ahead of time – there is clearly demand for such offerings.
  • Although many attendees had used ChatGPT or similar products, none had much of an idea of how they work or what they are good (and not so good) for.
  • The session was very interactive with lots of questions. Attendees were particularly concerned about social implications of AI (job losses, medical applications, regulation). 
  • Most attendees did not have a technical background. However, judging from their questions and feedback, it seems that most of them were able to follow the material.  

The biggest challenge in  designing a course for the public is the question of prior knowledge – what one can reasonably assume the audience already knows. When asked this question, my contact at WEA said I could safely assume that most attendees would have a reasonable familiarity with western literature, history and even some philosophy, but would be less au fait with science and tech. Based on this, I decided to frame my course as a historical / cultural introduction to AI.

–x–

The screenshot below, taken from my slide pack, shows the outline of the course.

Here is a walkthrough of the agenda, with notes and links to sources:

  • I start with a brief introduction to machine learning, drawing largely on the first section of this post.
  • The historical sections, which are scattered across the agenda (see figure above), are based on this six-part series on the history of natural language processing (NLP), supplemented by material from other sources.
  • The history starts with the Jewish mystic Abraham Abulafia and his quest for divine incantations by combining various sacred words. I also introduce the myth of the golem as a metaphor for the double-edgedness of technology,  mentioning that I will revisit it at the end of the lecture in the context of AI ethics.
  • To introduce the idea of text generation, I use the naïve approach of generating text by randomly combining symbols (alphabet, basic punctuation and spaces), and then demonstrate the futility of this approach using Jorge Luis Borges’ famous short story, The Library of Babel.
  • This naturally leads to more principled approaches based on the actual occurrence of words in text. Here I discuss the pioneering work of Markov and Shannon.
  • I then use a “Shakespeare Quote Game” to illustrate the ideas behind next-word completion using n-grams. I start with the first word of a Shakespeare quote and ask the audience to guess the quote. With just one word, it is difficult to figure out. However, as I give them more and more consecutive words, the quote becomes more apparent. (A small code sketch contrasting random symbol generation with this n-gram approach appears after this list.)
  • I then illustrate the limitations of n-grams by pointing out the problem of meaning – specifically, the issues associated with synonymy and polysemy. This naturally leads on to word2vec, which I describe by illustrating the key idea of how rolling a context window over large corpora leads to a resolution of the problem. The basic idea is that instead of defining the meanings of words, we simply consider the multiple contexts in which they occur. If done over large and diverse corpora, this approach can lead to an implicit “understanding” of the meanings of words.
  • I also explain the ideas behind word embeddings and vectors using a simple 3-d example. Then, using vector arithmetic, I provide illustrative examples of how the word2vec algorithm captures grammatical and semantic relationships (e.g., country/capital pairs and comparative/superlative forms). (A toy numerical version of this example appears after this list.)
  • From word2vec, I segue to transformers and Large Language Models (LLMs) using a hand-wavy discussion. The point here is not so much the algorithm, but how it is (in many ways) a logical extension of the ideas used in early word embedding models. Here I draw heavily on Stephen Wolfram’s excellent essay on ChatGPT.
  • I then provide some examples of “good” uses of LLMs – specifically ideation and explanation – along with a couple of examples of how I use them in my teaching.
  • Any AI course pitched to a general audience must deal with the question of intelligence – what it is, how to determine whether a machine is intelligent etc. To do this, I draw on Alan Turing’s 1950 paper in which he proposes his eponymous test. I illustrate how well LLMs do on the test  and ask the question – does this mean LLMs are intelligent?  This provokes some debate.
  • Instead of answering the question, I suggest that it might be more worthwhile to see if LLMs can display behaviours that one would expect from intelligent beings – for example, the ability to combine different skills or to solve puzzles / math problems using reasoning (see this post for examples of these). Before dealing with the latter, I provide a brief introduction to deductive, inductive and abductive reasoning using examples from popular literature.
  • I leave the question of whether LLMs can reason unanswered, but I point to a number of approaches that are popularly used to demonstrate the reasoning capabilities of LLMs – e.g. chain of thought prompting (a sample prompt is sketched after this list). I also make the point that the response obtained from an LLM is heavily dependent on the prompt, and that in many ways an LLM’s response is a reflection of the intelligence of the user, a point made very nicely by Terrence Sejnowski in this paper.
  • The question of whether LLMs can reason is also related to whether we humans think in language. I point out that recent research suggests that we use language to communicate our thinking, but not to think (more on this point here). This raises an interesting question, one that Cormac McCarthy explored in this article.
  • I follow this with a short history of chatbots, from ELIZA to Tay and then to ChatGPT. This material is a bit out of sequence, but I could not find a better place for it. My discussion draws on the IEEE history mentioned at the start of this section, complemented by other sources.
  • For completeness, I discuss a couple of specialised AIs – specifically AlphaFold and AlphaGo. My discussion of AlphaFold starts with a brief explanation of the protein folding problem and its significance. I then explain how AlphaFold solved this problem. For AlphaGo, I briefly describe the game of Go and why it is considered far more challenging than chess. This is followed by a high-level explanation of how AlphaGo works. I close this section with a recommendation to watch the brilliant AlphaGo documentary.
  • To close, I loop back to where I started – the age-old human fascination with creating machines in our own image. The golem myth has a long history, which some scholars have traced back to the Bible. Today we are a step closer to achieving what medieval mystics yearned for, but it seems we don’t quite know how to deal with the consequences.
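
For readers who would like to experiment with the generation ideas above, here is a minimal Python sketch. It is not part of the course materials, and the alphabet and tiny corpus are purely illustrative; it simply contrasts Library-of-Babel-style random symbol generation with a word-level bigram (Markov) model of the kind discussed in the Markov/Shannon and n-gram items:

```python
# Illustrative sketch only (not the course code): random symbols vs. a bigram model.
import random

random.seed(42)

# 1. "Library of Babel" style: combining symbols at random almost never yields words,
#    let alone meaningful sentences.
alphabet = "abcdefghijklmnopqrstuvwxyz .,"
print("".join(random.choice(alphabet) for _ in range(60)))

# 2. Markov / n-gram style: predict the next word from the current one, using
#    counts gathered from actual text (here, a single Shakespeare line).
corpus = ("to be or not to be that is the question "
          "whether tis nobler in the mind to suffer").split()

bigrams = {}
for current, nxt in zip(corpus, corpus[1:]):
    bigrams.setdefault(current, []).append(nxt)

word = "to"
generated = [word]
for _ in range(10):
    word = random.choice(bigrams.get(word, corpus))  # fall back to any word if unseen
    generated.append(word)
print(" ".join(generated))
```

The random string is gibberish, whereas the bigram output, while hardly Shakespeare, at least strings words together in locally plausible ways – the essential idea that later language models scale up enormously.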
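
To make the word2vec items concrete, here is a toy example with hand-picked 3-d vectors. Real word2vec embeddings have hundreds of dimensions and are learned from large corpora; the numbers below are made up purely to show how the vector arithmetic behind the country/capital analogy works:

```python
# Hand-picked 3-d "word vectors" (purely illustrative -- NOT real word2vec output).
# They are constructed so that the country -> capital offset is roughly the same
# for both pairs, which is the property real embeddings exhibit approximately.
import numpy as np

vectors = {
    "france": np.array([0.9, 0.1, 0.3]),
    "paris":  np.array([0.9, 0.8, 0.3]),
    "japan":  np.array([0.1, 0.1, 0.7]),
    "tokyo":  np.array([0.1, 0.8, 0.7]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def most_similar(query):
    # Return the word whose vector is closest (by cosine similarity) to the query.
    return max(vectors, key=lambda w: cosine(vectors[w], query))

# The classic analogy: paris - france + japan should land near tokyo.
analogy = vectors["paris"] - vectors["france"] + vectors["japan"]
print(most_similar(analogy))  # -> tokyo
```

In a trained model, the same arithmetic works (approximately) for many such relationships, including the comparative/superlative forms mentioned above.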
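
Finally, here is what chain of thought prompting looks like in practice. The prompts below are illustrative sketches rather than prescriptions, and no particular vendor’s API is assumed; the point is simply the difference between asking for an answer directly and nudging the model to spell out intermediate steps:

```python
# Chain of thought prompting, sketched as plain strings (no API call is made here;
# the example question and wording are illustrative).

direct_prompt = (
    "A train travels 180 km in 2 hours. "
    "How far does it travel in 5 hours at the same speed?"
)

cot_prompt = (
    "A train travels 180 km in 2 hours. "
    "How far does it travel in 5 hours at the same speed? "
    "Let's think step by step: first work out the speed, then use it to find the distance."
)

# The second prompt nudges the model to produce intermediate steps
# (speed = 180 / 2 = 90 km/h, distance = 90 * 5 = 450 km) before the final answer,
# which often improves accuracy on multi-step problems.
print(direct_prompt)
print(cot_prompt)
```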

I sign off with some open questions around ethics and regulation – areas that are still wide open – as well as some speculations on what comes next, emphasising that I have no crystal ball!

–x–

A couple of points to close this piece.

There was a particularly interesting question from an attendee about whether AI can help solve some of the thorny dilemmas humankind faces today.  My response was that these challenges are socially complex  – i.e.,  different groups see these problems differently –  so they lack a commonly agreed framing.  Consequently, AI is unlikely to be of much help in addressing these challenges. For more on dealing with socially complex or ambiguous problems, see this post and the paper it is based on.  

Finally, some readers may be wondering how well this approach worked. Here are some excerpts from emails I received from a few of the attendees:

“What a fascinating three hours with you that was last week. I very much look forward to attending your future talks.”

“I would like to particularly thank you for the AI course that you presented. I found it very informative and very well presented. Look forward to hearing about any further courses in the future.”

“Thank you very much for a stimulating – albeit whirlwind – introduction to the dimensions of artificial intelligence.”

“I very much enjoyed the course. Well worthwhile for me.”

So, yes, I think it worked well and am looking forward to offering it again in 2025.

–x–x–

Copyright Notice: the course design described in this article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/)


Written by K

November 19, 2024 at 4:00 am

Can Large Language Models reason?


There is much debate about whether Large Language Models (LLMs) have reasoning capabilities: on the one hand, vendors and some researchers claim LLMs can reason; on the other, there are those who contest these claims. I have discussed several examples of the latter in an earlier article, so I won’t rehash them here. However, the matter is far from settled: the debate will go on because new generations of LLMs will continue to get better at (apparent?) reasoning.

It seems to me that a better way to shed light on this issue would be to ask a broader question: what purpose does language serve?

More to the point: do humans use language to think, or do they use it to communicate their thinking?

Recent research suggests that language is primarily a tool for communication, not thought (the paper is paywalled, but a summary is available here). Here’s what one of the authors says about this issue:

“Pretty much everything we’ve tested so far, we don’t see any evidence of the engagement of the language mechanisms [in thinking]… Your language system is basically silent when you do all sorts of thinking.”

This is consistent with studies of people who have lost the ability to process words and frame sentences due to injury. Many of them are still able to perform complex reasoning tasks, such as playing chess or solving puzzles, even though they cannot describe their reasoning in words. Conversely, the researchers find that intellectual disabilities do not necessarily impair the ability to communicate in words.

–x–

The notion that language is required for communicating but not for thinking is far from new. In an essay published in 2017, Cormac McCarthy noted:

“Problems in general are often well posed in terms of language and language remains a handy tool for explaining them. But the actual process of thinking—in any discipline—is largely an unconscious affair. Language can be used to sum up some point at which one has arrived—a sort of milepost—so as to gain a fresh starting point. But if you believe that you actually use language in the solving of problems I wish that you would write to me and tell me how you go about it.”

So how and why did language arise? In his epic book, The Symbolic Species, the evolutionary biologist and anthropologist Terrence Deacon suggests that it arose out of the necessity to establish and communicate social norms around behaviours, rights and responsibilities as humans began to band into groups about two million years ago. The problem of coordinating work and ensuring that individuals do not behave in disruptive ways in large groups requires a means of communicating with each other about specific instances of these norms (e.g., establishing a relationship, claiming ownership) and, more importantly, resolving disputes around perceived violations.   Deacon’s contention is that language naturally arose out of the need to do this.

Starting with C. S. Peirce’s triad of icons, indexes and symbols, Deacon delves into how humans could have developed the ability to communicate symbolically. Symbolic communication is based on the powerful idea that a symbol can stand for something else – e.g., the word “cat” is not a cat, but stands for the (class of) cats. Deacon’s explanation of how humans developed this capability is – in my opinion – quite convincing, but it is by no means widely accepted. As echoed by McCarthy in his essay, the mystery remains:

“At some point the mind must grammaticize facts and convert them to narratives. The facts of the world do not for the most part come in narrative form. We have to do that.

“So what are we saying here? That some unknown thinker sat up one night in his cave and said: Wow. One thing can be another thing. Yes. Of course that’s what we are saying. Except that he didn’t say it because there was no language for him to say it in [yet]….The simple understanding that one thing can be another thing is at the root of all things of our doing. From using colored pebbles for the trading of goats to art and language and on to using symbolic marks to represent pieces of the world too small to see.”

So, how language originated is still an open question. However, once it takes root, it is easily adapted to purposes other than the social imperatives it was invented for. It is a short evolutionary step from rudimentary communication about social norms in hunter-gatherer groups to Shakespeare and Darwin.

Regardless of its origins, however, it seems clear that language is a vehicle for communicating our thinking, but not for thinking or reasoning.

–x–

So, back to LLMs then:

Based, as they are, on a representative corpus of human language, LLMs mimic how humans communicate their thinking, not how humans think. Yes, they can do useful things, even amazing things, but my guess is that these will turn out to have explanations other than intelligence and/or reasoning. For example, in this paper, Ben Prystawski and his colleagues conclude that “we can expect Chain of Thought reasoning to help when a model is tasked with making inferences that span different topics or concepts that do not co-occur often in its training data, but can be connected through topics or concepts that do.” This is very different from human reasoning, which a) is embodied, and thus uses data that is tightly coupled to (i.e., relevant to) the problem at hand, and b) uses the power of abstraction (e.g., theoretical models).

In time, research aimed at understanding the differences between LLM and human reasoning will help clarify how LLMs do what they do. I suspect it will turn out that LLMs do sophisticated pattern matching and linking at a scale that humans simply cannot, and therefore give the impression of being able to think or reason.

Of course, it is possible I’ll turn out to be wrong, but while the jury is out, we should avoid conflating communication about thinking with thinking itself.

–x–x–

Postscript:

I asked Bing Image Creator to generate an image for this post. The first prompt I gave it was:

An LLM thinking

It responded with the following image:

I was flummoxed at first, but then I realised it had interpreted LLM as a “Master of Laws” degree. Obviously, I hadn’t communicated my thinking clearly, which is kind of ironic given the topic of this article. Anyway, I tried again with the following prompt:

A Large Language Model thinking

To which it responded with a much more appropriate image:

Written by K

July 16, 2024 at 5:51 am