More than stochastic parrots: understanding and reasoning in LLMs
The public release of ChatGPT last year has spawned a flood of articles and books on how best to “leverage” the new technology. The majority of these provide little or no explanation of how Large Language Models (LLMs), such as ChatGPT, actually do what they do.
In this article, I discuss some of the surprising things that LLMs can do, with the intent of developing a rudimentary sense for how they “understand” meaning. I begin with Alan Turing’s classic paper, which will lead us on to a few demos of ChatGPT capabilities that might be considered unusual because they combine two or more very different skills. Following that, I will touch upon some recent research on how LLMs are able to develop an understanding of meaning.
A caveat before we proceed: the question of how LLMs develop representations of concepts and meaning is still not well understood, so much of what I say in the latter part of this piece is tentative. Nevertheless, I offer it here for two reasons. First, to show that it is not hard to understand the essence, if not the detail, of what is going on at the frontier of this exciting field. Second, to show that the mental models that come from such an understanding can help you use these tools in interesting and creative ways.
–x–
In his classic paper, Turing posed the question: can machines think?
Noting that the question, as posed, invites endless arguments about what constitutes a “machine” or what it means to “think,” he reframed it in terms of what he called an imitation game:
“[the game] is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two [and cannot see either]. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either “X is A and Y is B” or “X is B and Y is A.” The interrogator is allowed to put questions to A and B thus:
C: Will X please tell me the length of his or her hair?
Now suppose X is actually A, then A must answer. It is A’s object in the game to try and cause C to make the wrong identification. His answer might therefore be:
“My hair is shingled, and the longest strands are about nine inches long.”
In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten.
…The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as “I am the woman, don’t listen to him!” to her answers, but it will avail nothing as the man can make similar remarks.
We now ask the question, “What will happen when a machine takes the part of A in this game?” Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, “Can machines think?”
This is the basis of the well-known Turing Test.
In the paper, which is well worth a read, Turing proposes some specimen questions that one might ask the machine, along with responses from a hypothetical thinking machine:
Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 seconds and then give as answer) 105621.
….
Declining to attempt a difficult language task and making an error in simple arithmetic (was that intentional??) are very human reactions. Nevertheless, I think readers would agree with Turing about the suitability of the Q&A format as a probe of the knowledge and reasoning capabilities of machines.
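For the record, here is the sum worked out; the specimen answer above is off by exactly one hundred:
\[
34957 + 70764 = 105721 \neq 105621
\]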
–x–
I thought it would be interesting to pose Turing’s questions to ChatGPT (v3.5). Here is how it responded to the first question:
User
Please write me a sonnet on the subject of the Forth bridge.
ChatGPT
Amidst the rolling waves, a structure bold,
The Forth Bridge emerges, a sight to hold.
Its iron arms outstretched, a feat of might,
Connecting lands, spanning the waters’ fight.
A symphony of steel, a dance of grace,
A testament to human skill and space.
High above the tides, it proudly stands,
A link between two shores, crafted by hands.
Against the winds that howl with fierce delight,
It stands unwavering through day and night.
Its crimson hue against the Scottish sky,
A beacon of connection soaring high.
Oh, Forth Bridge, a marvel of design,
A timeless bond ‘twixt land and sea’s confines.
That is surprisingly good! Unnaturally so, because it spat out a response in a few seconds, far more quickly than any human could produce a sonnet on demand.
I also learnt something that I did not know – that the bridge is red in colour. That gave me an angle to query ChatGPT further:
User
In the eleventh line of your sonnet which reads “Its crimson hue against the Scottish sky,” would not “a scarlet hue” do as well or better?
ChatGPT
Certainly, “a scarlet hue” would indeed work well in that line. Here’s the revised version of the sonnet:
…and it printed out the entire sonnet again with “crimson” replaced with “scarlet.” It is interesting that ChatGPT did not argue the point about the suitability of the hue. It appears to have been trained to be agreeable, something that others have commented on.
I pushed the point further, asking for its opinion on scarlet vs crimson:
User
Do you think that works better than the original version?
ChatGPT
Both versions have their own merits, and the choice between “crimson hue” and “scarlet hue” is largely a matter of personal preference and the imagery you want to convey. “Scarlet hue” does indeed evoke a vivid and intense image, emphasizing the boldness of the bridge’s color against the sky. On the other hand, “crimson hue” also paints a vibrant and rich picture, aligning with traditional descriptions of the bridge’s color. Ultimately, both options are valid, and the decision depends on the tone and imagery you want to convey in your sonnet.
That’s pretty good: it gave me a reason for its choice, but also left me feeling that my suggestion was not entirely stupid. Very agreeable indeed!
From the above, one might say that ChatGPT3.5 aces Turing’s question. Ironically, what gives it away as a non-human is a) its speed, b) its extreme agreeability and c) the comprehensiveness of its responses. No human would respond so quickly, politely, and comprehensively to such a difficult query.
Note that ChatGPT4 does much better than the 3.x version on most tasks, so you may want to try the above if you have access to the 4.x version.
–x–
At this point you may be asking: what is ChatGPT doing…and why does it work? This question is partially answered in a highly readable article by Stephen Wolfram (also available as a book). I’ll leave you to read that article later, but I’d like to pick up on a point he makes in it: LLMs excel in linguistic tasks (like composing poetry) but are less proficient in dealing with questions that involve complex reasoning (such as logic puzzles or mathematical reasoning). That is true, but it should be noted that proficiency in the former also implies some facility with the latter, and I’ll make this point via a couple of examples towards the end of this piece.
LLMs’ proficiency in language makes sense: these models are trained on a huge quantity of diverse texts – ranging from fiction to poetry to entire encyclopedias – so they know about different styles of writing and can draw upon a significant portion of human knowledge when synthesizing responses.
However, can one claim that they have “learnt” the meaning of what they “talk” about?
In an award-winning paper published in 2020, Emily Bender and Alexander Koller argued that “system[s] trained only on form [have] a priori no way to learn meaning” – in other words, models that are trained on vast quantities of text cannot have any “understanding” of meaning. In the conclusion they note that, “…large language models such as BERT do not learn “meaning”; they learn some reflection of meaning into the linguistic form which is very useful in applications.”
In a follow-up paper, in which the now-famous term stochastic parrot was coined, Bender and her colleagues made the important point that, “Text generated by an LLM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that.” This is an important point to keep in mind as one converses with ChatGPT: it cannot read intent as a human might; all it has to go by is the prompt it is given.
A few lines later, they note that, “Contrary to how it may seem when we observe its output, a [Language Model] is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.”
However, the responses that I got from ChatGPT seemed more coherent than “haphazardly stitched together” sequences of forms it has observed in the past. It seems to be working off some kind of conceptual model.
Which raises the question: how does it do that?
–x–
To shed some light on this question, I will now discuss two papers presented at a recent (August 2023) meeting on LLMs held at UC Berkeley.
In a talk presented at the meeting, Steven Piantadosi argued that LLMs “understand” the meaning of words in much the same way that humans do: through the different concepts that words are used to express. This is called conceptual role semantics, and it rests on the following two claims:
- The meanings of linguistic expressions are determined by the contents of the concepts and thoughts they can be used to express.
- The contents of concepts and thoughts are determined by their functional role in a person’s psychology.
Piantadosi’s argument (as I understand it) is that LLMs develop internal representations of the varied meanings of words and phrases in a manner analogous to humans – i.e., through repeated exposure to words / phrases used to express (and reason about) different concepts. To be sure, the representations in silico are very different from those in our wetware. However, the claim that the models might learn such representations in an emergent manner, as they are exposed to more and more text, is not unreasonable.
Those interested in exploring this idea further may want to check out Piantadosi’s paper.
At the same meeting, Sanjeev Arora gave a talk on a theory of skill emergence in LLMs. I did not follow the details of his arguments, so I won’t comment on them here. However, his main point is that LLMs learn to combine multiple skills in ways that they are unlikely to have been exposed to in training – and that they are able to do so because the vast corpus they have been trained on contains many examples in which multiple skills are mixed (such as parsing questions, essay writing and reasoning). Through continued exposure to many such diverse examples, a model “learns” how to combine skills inductively. If this does not sound plausible, recall that transfer learning is essentially a demonstration of the kind of inductive reasoning that Arora refers to.
A surprising prediction of Arora’s theory is that proficiency in different combinations of multiple skills arises in the same manner that proficiencies in single skills are developed. As he notes in the conclusion of the paper on which his talk was based:
“We have proposed a theoretical framework for understanding emergence of skills when language models are scaled up. A key insight is that reduction in excess cross entropy loss drives skill acquisition, together with the assumption that normal language —down to short paragraph level— already utilizes multiple skills, mixed up randomly. Need for mechanistic insight is sidestepped using Scaling Law, which quantifies a powerful inductive bias in pre-trained models…a concrete example of this inductive bias is that in our framework proficiency in combinations of skills arises just as naturally as proficiency in the individual skills themselves, and need not require seeing examples of all (or even most) of these combinations in the training set. This has relevance to the ongoing debate about the extent of “understanding” that current models have, and their ability to address novel settings.”
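To give a rough sense of the quantity Arora refers to: a language model is trained to minimise the cross-entropy of its next-token predictions, and the “excess” loss is the part of that number sitting above the irreducible entropy of language itself. A sketch in standard notation (mine, not Arora’s):
\[
L(\theta) = -\,\mathbb{E}\!\left[\frac{1}{T}\sum_{t=1}^{T}\log p_{\theta}(x_t \mid x_{<t})\right],
\qquad
L_{\mathrm{excess}}(\theta) = L(\theta) - H_{\mathrm{language}}
\]
In this framework, it is the steady reduction of the excess term as models are scaled up that drives the acquisition of individual skills and, just as naturally, of their combinations.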
The emergence of proficiency in multiple skills is interesting, and sort of “explains” ChatGPT’s surprisingly good Forth Bridge Sonnet. So, I thought it would be interesting to end this piece with a test that combines yet another set of skills, which I hope you will find even more surprising: composing verse and mathematical reasoning.
–x–
User
Can you give me a proof that the square root of two is irrational in sonnet form?
ChatGPT
Amidst the realm of numbers, set and firm,
Lies sqrt(2), a mystery to discern.
Suppose, with hope, a fraction could express,
Its form a/b, where a and b impress.
In seeking simplest terms, we strive to find,
GCD, their common divisor kind.
Yet let us dream that a and b stand free,
No common factors bind, with certainty.
Now, let’s assume it’s true, and so proceed,
With algebraic steps, our wits to feed.
Squared a/b, and two, a match we chase,
Equation thus: a^2 = 2 * b^2 in space.
Now a must be even, so we glean,
For if odd, a^2, odd too, is seen.
Then b must follow suit, an even mate,
Else b^2 odd, a contradiction’s gate.
But wait, contradiction further unfurls,
2 * b^2 even, as math whirls.
Thus a^2 even, a even, we derive,
Contradicting a’s oddness, math does strive.
This contradiction shows our first intent,
A fraction’s claim, by logic, is rent.
The square root of two, irrational true,
Its proof in sonnet form, now stands to view.
Which is a reasonably accurate, though admittedly confusing, description of Euclid’s proof of the irrationality of the square root of 2 in (not so good!) verse.
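For readers who find the verse hard to follow, here is the argument it is reaching for, written out plainly (the standard proof by contradiction, in my own words rather than ChatGPT’s):
\[
\begin{aligned}
&\text{Suppose } \sqrt{2} = a/b \text{ in lowest terms, so that } a \text{ and } b \text{ share no common factor.}\\
&\text{Then } a^2 = 2b^2, \text{ so } a^2 \text{ is even, hence } a \text{ is even: write } a = 2k.\\
&\text{Substituting gives } 4k^2 = 2b^2, \text{ i.e. } b^2 = 2k^2, \text{ so } b \text{ is even as well.}\\
&\text{But then } a \text{ and } b \text{ share the factor } 2, \text{ contradicting the lowest-terms assumption.}\\
&\text{Hence no such fraction exists, and } \sqrt{2} \text{ is irrational.}
\end{aligned}
\]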
–x–
Finally, to close the loop, I asked ChatGPT a couple of math problems, starting with Turing’s elementary arithmetic question:
User
Add 34957 to 70764.
ChatGPT
The sum of 34957 and 70764 is 105721.
Unsurprisingly, it gets that right.
What about more complex mathematical reasoning? I pulled out a word problem from a Year 6 maths book:
User
Zach buys two hot dogs and three drinks for $14. Drinks cost $2 each. How much does one hot dog cost?
ChatGPT
Let’s denote the cost of one hot dog as “H” dollars.
Given that Zach buys two hot dogs and three drinks for $14, we can set up the equation:
2H + 3(2) = 14
Simplify the equation:
2H + 6 = 14
Subtract 6 from both sides:
2H = 8
Now, divide both sides by 2:
H = 4
Therefore, one hot dog costs $4.
That blew me away. Not only did it get the answer right, it presented the reasoning behind it.
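Incidentally, the algebra is easy to verify with a symbolic solver. Here is a minimal sketch in Python using the sympy library (the variable name and the check are mine, purely for illustration):
from sympy import Eq, solve, symbols

H = symbols("H")                  # cost of one hot dog, in dollars
equation = Eq(2 * H + 3 * 2, 14)  # two hot dogs plus three $2 drinks total $14
print(solve(equation, H))         # prints [4], i.e. one hot dog costs $4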
Understanding reasoning capabilities of LLMs is a new but very active area of research and I’ll just point readers to two recent papers in this area:
In this paper, Nikunj Saunshi and Vedant Gaur explore the capabilities of LLMs to do symbolic reasoning – i.e., to convert math word problems to equations and solve them, much as in the example above.
In this mammoth paper (which I’m yet to go through in detail), Sebastien Bubeck and his colleagues investigate GPT-4’s ability to solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting.
I should mention that we have little knowledge at present as to how LLMs reason. However, in view of the examples above (and many more in the referenced papers), it seems reasonable to conclude that state-of-the-art LLMs are more than stochastic parrots.
–x–
In this article I have touched upon the question of how these models “understand meaning,” moving from Turing’s original imitation game to a couple of examples drawn from current research. I think he would have been impressed by the capabilities of current LLMs…and keep in mind that state-of-the-art models are even better than the ones I have used in my demos.
It seems appropriate to close this piece with some words on the implications of machines that “understand meaning.” The rapid improvement in LLMs’ ability to comprehend is disconcerting, and it provokes existential questions such as: how much better (than us) might these tools become? What implications does that have for us? Or, closer to home: what does it mean for me and my job? These are excellent questions that have no good answers at the moment. The only thing that is clear is that the genie is out of the bottle and our world has changed forever. How can we thrive in a world where machines understand meaning better than we do? I hope to explore this question in future articles by sketching out some thoughts on how these tools can be used in ways that enhance rather than diminish our own capabilities.
Sherlock Holmes and the case of the Agile rituals
As readers of these chronicles will know, the ebb in demand for the services of private detectives had forced Mr. Holmes to deploy his considerable intellectual powers in the service of large multinationals. Yet, despite many successes, he remained ambivalent about the change.
As he once remarked to me, “the criminal mind is easier to analyse than that of the average executive: the former displays a clarity of thought that tends to be largely absent in the latter.”
Harsh words, perhaps, but judging from the muddle-headed thinking I have witnessed in the corporate world, I think his assessment is fair.
The matter of the Agile rituals is a case in point.
–x–
It was a crisp autumn morning. I had risen at the crack of dawn and gone for my customary constitutional. As I opened the door to 221B on my return, I was greeted by the unmistakable aroma of cooked eggs and bacon. Holmes, who was usually very late in the mornings, save upon those not infrequent occasions when he was up all night, was seated at the table, tucking into an English Breakfast.
“Should you be eating that?” I queried, as I hung my coat on the rack.
“You underestimate the Classic English Breakfast,” he said, waving a fork in my direction. “Yes, its nutritional benefits are overstated, but the true value of the repast lies in the ritual of cooking and consuming it.”
“The ritual?? I was under the impression that rituals have more to do with the church than the kitchen.”
“Ah, Watson, you are mistaken. Practically every aspect of human life has its rituals, whether it be dressing up for work or even doing work. Any activity that follows a set sequence of steps can become a ritual – by which I mean something that can be done without conscious thought.”
“But, is not that an invitation for disaster? If one does not think about what one is doing, one will almost certainly make errors.”
“Precisely,” he replied, “and that is the paradox of ritualisation. There are certain activities that are safe to ritualise, so to speak. Preparing and partaking of this spread, for example. It offers the comfort of doing something familiar and enjoyable – and there is no downside, barring a clogged capillary or two. The problem is that there are other activities that, when ritualised, turn out to be downright dangerous.”
“For example?” I was intrigued.
He smiled. “For that I shall ask you to wait until we go to BigCorp’s offices later today.”
“BigCorp??”
“So many questions, Watson. Come with me and all will be clear.”
–x–
We were seated in Jarvis’ office. He was BigCorp’s Head of Technology.
“Mr. Holmes, I was surprised to get your call this morning,” said Jarvis. “You were on site this Monday so I thought it would be at least a few more weeks before we heard from you. Doesn’t it take time to analyse all the information you collected from our teams? From what I was told you took away reams of transcripts and project plans.”
“Not all data is information, most of it is noise,” replied Holmes. “It is of the highest importance, therefore, not to have useless facts elbowing out the useful ones.”
“I see,” said Jarvis. The look on his face said he clearly didn’t.
“Let me get straight to the point,” said Holmes. “BigCorp implemented the XXXX Agile framework across the organisation a year ago with the expectation that it would improve customer satisfaction with the projects delivered. In particular, the intent was to follow Agile Principles. So, you adapted and implemented Agile practices from XXXX that would enable you to operationalise the principles. Would that be a fair summary?” (Editor’s note: the name of the framework has been redacted at Mr. Holmes’ request)
“Yes, that is correct,” nodded Jarvis.
“My conversations with your staff make it clear that the practices and processes have indeed been adapted and implemented. And this has been confirmed by the many project documents I checked,” said Holmes.
“So, where’s the problem then?” queried Jarvis.
“Adopting a practice or process is no guarantee that it will be implemented correctly.” said Holmes.
“I’m not sure I understand.”
“It seems that your staff follow Agile practices ritualistically with no thought about the intent behind them. For example, stand up meetings are treated as forums to enforce specific points of view rather than debate them. Instead of surfacing issues and dealing with them in a way that works for all parties, the meetings invariably end up with winners and losers. This is totally counter to the Agile philosophy. As an observer, it seemed to me that there was no sense of ‘being in it together’ or wanting to get the best outcome for the customer.”
“How many meetings did you attend, Mr. Holmes?”
“Three.”
“Surely that is too small a sample to generalise from,” said Jarvis.
“Not if you know what to look for. You know my method. It is founded upon the observation of trifles. Things such as attitude, tone of voice, engagement, empathy etc. Believe me, I saw enough to tell me what is wrong. Your people have implemented practices but not the intent behind them.”
“I’m not sure I understand.”
“OK, an example might help. One of the Agile principles states that changing requirements are welcomed, even late in development. In one of the meetings I attended, the PM shut down a discussion about changes that the customer wanted by offering technical excuses. The conversation was used to enforce a viewpoint that is centred on BigCorp’s interests rather than those of the customer. Is that not counter to Agile principles? Surely, even if the customer is asking for something unreasonable, it is incumbent on your team to work towards mutual agreement rather than shutting them down summarily.”
“Hmm, I see. So, what do you recommend, Mr. Holmes?”
“You are not going to like what I say, Mr. Jarvis, but the fault lies with you and your management team. You have created an environment that is not conducive to the mindset and dispositions required to be truly Agile. As a result, what you have are its rituals, followed mindlessly. Only you can change that.”
“How?”
“By creating an environment that encourages your staff to develop an Agile mindset without fear of failure. I can recommend a reference or two.”
“That would be helpful Mr. Holmes,” said Jarvis.
Holmes elaborated on the reference and what Jarvis and his team needed to do. The conversation then moved on to other matters that are not relevant to my tale.
–x–
“That was excellent, Holmes,” I remarked, as we made our way out of the BigCorp Office.
“No, it’s elementary,” he replied with a smile, “it is simply that many practitioners prefer not to think about what it means to be Agile. Blindly enacting its rituals is so much easier.”
–x–
Notes:
- There are three quotes taken from Sherlock Holmes stories in the above piece. Can you spot them? (Hint: they are not in the last section.)
- See this post for more on rituals in information system design and development.
- And finally, for a detailed discussion of an approach that privileges intent over process, check out my book on data science strategy.
What is Emergent Design?
Last week I had the opportunity to talk to a data science team about the problems associated with building a modern data capability in a corporate environment. The organisation’s Head of Data was particularly keen to hear about how the Emergent Design approach proposed in my recent book might help with some of the challenges they are facing. To help me frame my talk, he sent me a bunch of questions, which included the following two:
- Can you walk us through the basic steps of emergent design in data science?
- How does emergent design work with other data science methodologies, such as CRISP-DM or Agile?
On reading these questions, I realized I had a challenge on my hands: Emergent Design is about cultivating a mindset or disposition with which to approach problematic situations, rather than implementing a canned strategic framework or design methodology. Comparing it to other methodologies would be a category error – like comparing pizza to philosophy. (Incidentally, I think this is the same error that some (many?) Agile practitioners make when they mistake the rituals of Agile for the mindset required to do it right…but that’s another story.)
The thing is this: the initial task of strategy or design work is about a) understanding what ought to be done, considering the context of the organisation and b) constructing an appropriate approach to doing it. In other words, it is about framing the (strategic or any other) problem and then figuring out what to do about it, all the while keeping in mind the specific context of the organization. Emergent Design is a principled approach to doing this.
–x–
So, what is Emergent Design?
A good place to start is with the following passage from the PhD thesis of David Cavallo, the man who coined the term and developed the ideas behind it:
“The central thrust of this thesis is the presentation of a new strategy for educational intervention. The approach I describe here resembles that of architecture, not only in the diversity of the sources of knowledge it uses but in another aspect as well – the practice of letting the design emerge from an interaction with the client. The outcome is determined by the interplay between understanding the goals of the client; the expertise, experience, and aesthetics of the architect; and the environmental and situational constraints of the design space. Unlike architecture where the outcome is complete with the artifact, the design of [such initiatives] interventions is strengthened when it is applied iteratively. The basis for action and outcome is through the construction of understanding by the participants. I call this process Emergent Design.”
Applied to the context of building a data capability, Emergent Design involves:
- Having conversations across the organization to understand the kinds of problems people are grappling with. The problems may or may not involve data. The ones that do not involve data give you valuable information about the things people worry about. In general the conversations will cover a lot of territory and that is OK – the aim is to get a general feel for the issues that matter.
- Following the above, framing a handful – two or three – of concrete problems that you can solve using skills and resources available on hand. These proof-of-concept projects will help you gain allies and supporters for your broader strategic efforts.
While doing the above, you will notice that people across the organization have wildly different perspectives on what needs to be done and how. Moreover, they will have varying views on which technologies should be used. As a technology strategist, your key challenge is to reconcile and synthesise these varied viewpoints. This is what makes strategy a wicked problem. For more on the wicked elements of building data capabilities, check out the first chapter of my book, which is available for free here.
Put simply then, Emergent Design is about:
- Determining a direction rather than constructing a roadmap
- Letting the immediate next steps be determined by what adds the greatest value (for your stakeholders)
With that said for the “what,” I will now say a few words about the “how.”
–x–
In the book we set out eight guidelines or principles for practicing Emergent Design. I describe them briefly below:
Be a midwife rather than an expert: In practical terms, this involves having conversations with key people in business units across the organisation, with the aim of understanding their pressing problems and how data might help in solving them (we elaborate on this at length in Chapter 4 of the book). The objective at this early stage is to find out what kind of data science function your organisation needs. Rather than deep expertise in data science, this requires an ability to listen to experts in other fields, and translate what they say into meaningful problems that can potentially be solved by data science. In other words, this requires the strategist to be a midwife rather than an expert.
Use conversations to gain commitment: In their groundbreaking book on computers and cognition, Winograd and Flores observed that “organisations are networks of commitments” between the people who comprise them. It is through conversations that commitments between different groups of stakeholders are established and subsequently acted on. In Chapter 3 of the book, we offer some tips on how to have such conversations. The basic idea is to encourage people to say what they really think, rather than what they think you want them to say. It is crucial to keep in mind that people may be unwilling to engage with you because they do not understand the implications of the proposed changes and are fearful of what they might mean for them.
Understand and address concerns of stakeholders who are wary of the proposed change: In our book, The Heretic’s Guide to Management, Paul Culmsee and I offer advice on how to do this in specific contexts using what we call “management teddy bears”. These involve offering reassurance, advice, or opportunities that reduce anxiety, very much akin to how one might calm anxious children by offering them teddy bears or security blankets. Here are a few examples of such teddy bears:
- A common fear that people have is that the new capability might reduce the importance of their current roles. A good way to handle this is to offer these people a clear and workable path to be a part of the change. For example, one could demonstrate how the new capability (a) enriches their current role, (b) offers opportunities to learn new skills, or (c) enhances their effectiveness. We could call this the “co-opt teddy bear.” In Chapter 7 of our data science strategy book, we offer concrete ways to involve the business in data science projects in ways that make the projects theirs.
- It may also happen that some stakeholder groups are opposed to the change for political reasons. In this case, one can buy time by playing down the significance of the new capability. For example, one could frame the initiative as a “pilot” project run by the current data and reporting function. We could call this the “pilot teddy bear.” See the case study towards the end of Chapter 3 of the data science strategy book for an example of a situation in which I used this teddy bear.
Frame the current situation as an enabling constraint: In strategy development, it is usual to think of the current situation in negative terms, a situation that is undesirable and one that must be changed as soon as practicable. However, one can flip this around and look at the situation from the perspective of finding specific things that you can change with minimal political or financial cost. In other words, you reframe the current situation as an enabling constraint (see this paper by Kauffman and Garre for more on this). The current situation is well defined, but there are an infinite number of possible next steps. Although the actual next step cannot be predicted, one can make a good enough next step by thinking about the current situation creatively in order to explore what Kauffman calls the adjacent possible – the possible future states that are within reach, given the current state of the organisation (see this paper by Kauffman). You may have to test a few of the adjacent possible states before you figure out which one is the best. This is best done via small, safe-to-fail proof-of-concept projects (discussed at length in Chapter 4 of our data science strategy book).
Consider long-term and hidden consequences: It is a fact of life that when choosing between different approaches, people will tend to focus on short-term gains rather than long-term consequences. Indeed, one does not have to look far to see examples that have global implications (e.g., the financial crisis of 2008 and climate change). Valuing long-term results is difficult because the distant future is less salient than the present or the immediate future. A good way to look beyond immediate concerns (such as cost) is to use the “solution after next principle” proposed by Gerald Nadler and Shozo Hibino in their book entitled Breakthrough Thinking. The basic idea behind the principle is to get people to focus on the goals that lie beyond the immediate goal. The process of thinking about and articulating longer term goals can often provide insights into potential problems with the current goals and/or how they are being achieved. We discuss this principle and another approach to surfacing hidden issues in Chapters 3 and 4 of the data science strategy book.
Create an environment that encourages learning: Emergent Design is a process of experimentation and learning. However, all learning other than that of the most trivial kind involves the possibility of error. So, for it to work, one needs to create an environment of psychological safety – i.e., an environment in which employees feel safe to take risks by trialling new ideas and processes, with the possibility of failure. A key feature of learning organisations is that when things go wrong, the focus is not on fixing blame but on fixing the underlying issue and, more importantly, learning from it so that one reduces the chances of recurrence. It is interesting to note that this focus on the system rather than the individual is also a feature of high reliability organisations such as emergency response agencies.
Beware of platitudinous goals: Strategies are often littered with buzzwords and platitudes – clichéd phrases that sound impressive but are devoid of meaning. For example, two in-vogue platitudes at the time our book was being written were “digital transformation” and “artificial intelligence.” They are platitudes because they tell you little about what exactly they mean in the specific context of the organisation.
The best way to deconstruct a platitude is via an oblique approach that is best illustrated through an example. Say someone tells you that they want to implement “artificial intelligence” (or achieve a “digital transformation” or any other platitude!) in their organisation. How would you go about finding out what exactly they want? Asking them what they mean by “artificial intelligence” is not likely to be helpful because the answer you will get is likely to be couched in generalities such as data-driven decision making or automation, phrases that are world-class platitudes in their own right! Instead, it is better to ask them how artificial intelligence would make a difference to the organisation. This can help you steer the discussion towards a concrete business problem, thereby bringing the conversation down from platitude-land to concrete, measurable outcomes.
Act so as to increase your choices: This is perhaps the most important point in this list because it encapsulates all the other points. We have adapted it from Heinz von Foerster’s ethical imperative which states that one should always act so as to increase the number of choices in the future (see this lecture by von Foerster). Keeping this in mind as you design your data science strategy will help you avoid technology or intellectual lock-in. As an example of the former, when you choose a product from a particular vendor, they will want you to use their offerings for the other components of your data stack. Designing each layer of the stack in a way that can work with other technologies ensures interoperability, an important feature of a robust data technology stack (discussed in detail in Chapter 6 of the strategy book). As an example of the latter, when hiring data scientists, hire not just for what they know now but also for evidence of their curiosity and proclivity to learn new things – a point we elaborate on in Chapter 5 of the data strategy book.
You might be wondering why von Foerster called this the “ethical imperative.” An important aspect of this principle is that your actions should not constrain the choices of others. Since the predictions of your analytical models could well affect the choices of others (e.g., whether or not they are approved for a loan or screened out of a job), you should also cast an ethical eye on your work. We discuss ethics and privacy at length in Chapter 8 of the data strategy book.
–x–
These principles do not tell you what to do in specific situations. Rather, they are about cultivating a disposition that looks at technology through the multiple, and often conflicting, perspectives of those whose work is affected by it. A technical capability is never purely technical; within the context of an organisation it becomes a sociotechnical capability.
In closing: Emergent Design encourages practitioners to recognise that building and embedding a data capability (or any other technical capability such as program evaluation) is a process of organisational change that is best achieved in an evolutionary manner. This entails starting from where people are – in terms of capability, culture, technology, and governance – and enabling them to undertake a collective journey in an agreed direction.
…and, yes, all the while keeping in mind that it is the journey that matters, not the destination.