On the interpretation of probabilities in project management
Introduction
Managers have to make decisions based on an imperfect and incomplete knowledge of future events. One approach to improving managerial decision-making is to quantify uncertainties using probability. But what does it mean to assign a numerical probability to an event? For example, what do we mean when we say that the probability of finishing a particular task in 5 days is 0.75? How is this number to be interpreted? As it turns out, there are several ways of interpreting probabilities. In this post I’ll look at three of these via an example drawn from project estimation.
Although the question raised above may seem somewhat philosophical, it is actually of great practical importance because of the increasing use of probabilistic techniques (such as Monte Carlo methods) in decision making. Those who advocate the use of these methods generally assume that probabilities are magically “given” and that their interpretation is unambiguous. Of course, neither is true – and hence the importance of clarifying what a numerical probability really means.
The example
Assume there’s a task that needs doing – this may be a project task or some other job that a manager is overseeing. Let’s further assume that we know the task can take anywhere from 2 to 8 days to finish, and that we (magically!) have numerical probabilities associated with completion on each of the days (as shown in the table below). I’ll say a teeny bit more about how these probabilities might be estimated shortly.
| Task finishes on | Probability |
|------------------|-------------|
| Day 2 | 0.05 |
| Day 3 | 0.15 |
| Day 4 | 0.30 |
| Day 5 | 0.25 |
| Day 6 | 0.15 |
| Day 7 | 0.075 |
| Day 8 | 0.025 |
This table is a simple example of what’s technically called a probability distribution. Distributions express probabilities as a function of some variable. In our case the variable is time.
How are these probabilities obtained? There is no set method for doing this, but commonly used techniques are:
- By using historical data for similar tasks.
- By asking experts in the field.
Estimating probabilities is a hard problem. However, my aim in this article is to discuss what probabilities mean, not how they are obtained. So I’ll take the probabilities mentioned above as given and move on.
The rules of probability
Before we discuss the possible interpretations of probability, it is necessary to mention some of the mathematical properties we expect probabilities to possess. Rather than present these in a formal way, I’ll discuss them in the context of our example.
Here they are:
- All probabilities listed are numbers that lie between 0 (impossible) and 1 (absolute certainty).
- It is absolutely certain that the task will finish on one of the listed days. That is, the sum of all probabilities equals 1.
- It is impossible for the task not to finish on one of the listed days. In other words, the probability of the task finishing on a day not listed in the table is 0.
- The probability of finishing on any one of several days is given by the sum of the probabilities for those days. For example, the probability of finishing on day 2 or day 3 is 0.20 (i.e. 0.05 + 0.15). This holds because the two events are mutually exclusive – that is, the occurrence of one event precludes the occurrence of the other. Specifically, if we finish on day 2 we cannot finish on day 3 (or any other day) and vice-versa.
These statements illustrate the mathematical assumptions (or axioms) of probability. I won’t write them out in their full mathematical splendour; those interested should head off to the Wikipedia article on the axioms of probability.
Another useful concept is that of cumulative probability which, in our example, is the probability that the task will be completed by a particular day. For example, the probability that the task will be completed by day 5 is 0.75 (the sum of probabilities for days 2 through 5). In general, the cumulative probability of finishing on any particular day is the sum of probabilities of completion for all days up to and including that day.
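For those who like to see the arithmetic spelt out, here’s a minimal sketch in Python (using the probabilities from the table above) that computes the cumulative probabilities:

```python
from itertools import accumulate

# Completion-day probabilities from the table above
days = [2, 3, 4, 5, 6, 7, 8]
probs = [0.05, 0.15, 0.30, 0.25, 0.15, 0.075, 0.025]

# Cumulative probability of finishing on or before each day
for day, cum_prob in zip(days, accumulate(probs)):
    print(f"P(finish by day {day}) = {cum_prob:.3f}")
```

Running this confirms, for instance, that the cumulative probability of finishing by day 5 is 0.75.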
Interpretations of probability
With that background out of the way, we can get to the main point of this article, which is:
What do these probabilities mean?
We’ll explore this question using the cumulative probability example mentioned above, and by drawing on a paper by Glenn Shafer entitled, What is Probability?
OK, so what is meant by the statement, “There is a 75% chance that the task will finish in 5 days”?
It could mean that:
- If this task is done many times over, it will be completed within 5 days in 75% of the cases. Following Shafer, we’ll call this the frequency interpretation.
- It is believed that there is a 75% chance of finishing this task in 5 days. Note that belief can be tested by seeing if the person who holds the belief is willing to place a bet on task completion with odds that are equivalent to the believed probability. Shafer calls this the belief interpretation.
- Based on a comparison to similar tasks this particular task has a 75% chance of finishing in 5 days. Shafer refers to this as the support interpretation.
(Aside: The belief and support interpretations involve subjective and objective states of knowledge about the events of interest respectively. These are often referred to as subjective and objective Bayesian interpretations because knowledge about these events can be refined using Bayes’ Theorem, provided one has relevant data regarding the occurrence of events.)
The interesting thing is that all the above interpretations can be shown to satisfy the axioms of probability discussed earlier (see Shafer’s paper for details). However, it is clear from the above that each of these interpretations has a very different meaning. We’ll take a closer look at this next.
More about the interpretations and their limitations
The frequency interpretation appears to be the most rational one because it interprets probabilities in terms of the results of experiments – i.e. it treats probabilities as experimental facts, not beliefs. In Shafer’s words:
According to the frequency interpretation, the probability of an event is the long-run frequency with which the event occurs in a certain experimental setup or in a certain population. This frequency is a fact about the experimental setup or the population, a fact independent of any person’s beliefs.
However, there is a big problem here: it assumes that such an experiment can actually be carried out. This definitely isn’t possible in our example: tasks cannot be repeated in exactly the same way – there will always be differences, however small.
There are other problems with the frequency interpretation. Some of these include:
- There are questions about whether a sequence of trials will converge to a well-defined probability.
- What if the event cannot be repeated?
- How does one decide on what makes up the population of all events? This is sometimes called the reference class problem.
See Shafer’s article for more on these.
The belief interpretation treats probabilities as betting odds. In this interpretation a 75% probability of finishing in 5 days means that we’re willing to put up 75 cents to win a dollar if the task finishes in 5 days (or equivalently 25 cents to win a dollar if it doesn’t). Note that this says nothing about how the bettor arrives at his or her odds. These are subjective (personal) beliefs. However, they are experimentally determinable – one can determine people’s subjective odds by finding out how they actually place bets.
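To see why these particular odds correspond to a 75% degree of belief, here’s a small sketch (my own illustration, not Shafer’s) showing that the bet has zero expected profit – i.e. is fair – at exactly those odds:

```python
# A bet: stake 75 cents, receive a dollar if the task finishes in 5 days
# (a 25 cent profit), lose the stake otherwise.
belief = 0.75   # bettor's degree of belief in on-time completion
stake = 0.75    # amount put up
payout = 1.00   # amount received on a win

expected_profit = belief * (payout - stake) - (1 - belief) * stake
print(expected_profit)  # 0.0 -- the bet is fair at these odds
```

A bettor who believed the probability was higher than 0.75 would see the same bet as profitable; one who believed it lower would refuse it.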
There is a good deal of debate about whether the belief interpretation is normative or descriptive: that is, do the rules of probability tell us what people’s beliefs should be or do they tell us what they actually are? Most people trained in statistics would claim the former – that the rules impose conditions that beliefs should satisfy. In contrast, in management and behavioural science, probabilities based on subjective beliefs are often assumed to describe how the world actually is. However, the wealth of literature on cognitive biases suggests that people’s actual beliefs, as reflected in their decisions, do not conform to the rules of probability. The latter observation seems to favour the normative option, but arguments can be made in support (or refutation) of either position.
The problem mentioned in the previous paragraph is a perfect segue into the support interpretation, according to which the probability of an event occurring is the degree to which we should believe that it will occur (based on available evidence). This seems fine until we realize that evidence can come in many “shapes and sizes.” For example, compare the statements “the last time we did something similar we finished in 5 days, based on which we reckon there’s a 70-80% chance we’ll finish in 5 days” and “based on historical data gathered for 50 projects, we believe that we have a 75% chance of finishing in 5 days.” The two pieces of evidence offer very different levels of support. Therefore, although the support interpretation appears to be more objective than the belief interpretation, it isn’t actually so because it is difficult to determine which evidence one should use. So, unlike the case of subjective beliefs (where one only has to ask people about their personal odds), it is not straightforward to determine these probabilities empirically.
So we’re left with a situation in which we have three interpretations, each of which addresses specific aspects of probability but also has major shortcomings.
Is there any way to break the impasse?
A resolution?
Shafer suggests that the three interpretations of probability are best viewed as highlighting different aspects of a single situation: that of an idealized case where we have a sequence of experiments with known probabilities. Let’s see how this statement (which is essentially the frequency interpretation) can be related to the other two interpretations.
Consider my belief that the task has a 75% chance of finishing in 5 days. This is analogous to saying that if the task is done several times over, I believe it would finish in 5 days in 75% of the cases. My belief can be objectively confirmed by testing my willingness to put up 75 cents to win a dollar if the task finishes in five days. Now, when I place this bet I have my (personal) reasons for doing so. However, these reasons ought to relate to knowledge of the fair odds involved in the said bet. Such fair odds can only be derived from knowledge of what would happen in a (possibly hypothetical) sequence of experiments.
The key assumption in the above argument is that my personal odds aren’t arbitrary – I should be able to justify them to another (rational) person.
Let’s look at the support interpretation. In this case I have hard evidence for stating that there’s a 75% chance of finishing in 5 days. I can take this hard evidence as my personal degree of belief (remember, as stated in the previous paragraph, any personal degree of belief should have some such rationale behind it). However, since it is based on hard evidence, it should be rationally justifiable and hence can be associated with a sequence of experiments.
So what?
The main point from the above is the following: probabilities may be interpreted in different ways, but they have an underlying unity. That is, when we state that there is a 75% probability of finishing a task in 5 days, we are implying all the following statements (with no preference for any particular one):
- If we were to do the task several times over, it would finish within five days in three-fourths of the cases. Of course, this holds only if the task is done a sufficiently large number of times (which may not be practical in most cases) – see the simulation sketch after this list.
- We are willing to place a bet at 3:1 odds of completion within five days.
- We have some hard evidence to back up statement (1) and our betting belief (2).
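As a sanity check on statement (1), here’s a minimal simulation sketch (in Python, using the probability table from earlier) of doing the task many times over:

```python
import random

days = [2, 3, 4, 5, 6, 7, 8]
probs = [0.05, 0.15, 0.30, 0.25, 0.15, 0.075, 0.025]

# Do the task 100,000 times over, each trial drawing a completion
# day from the distribution in the table
trials = 100_000
outcomes = random.choices(days, weights=probs, k=trials)
on_time = sum(1 for d in outcomes if d <= 5)
print(on_time / trials)  # ~0.75, the long-run frequency
```

The long-run frequency converges to the stated probability – which is, of course, exactly what the frequency interpretation asserts.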
In reality, however, we tend to latch on to one particular interpretation depending on the situation. One is unlikely to think in terms of hard evidence when buying a lottery ticket, but hard evidence is a must when estimating a project. When tossing a coin one might instinctively use the frequency interpretation, but when estimating a task that hasn’t been done before one might use personal belief. Nevertheless, it is worth remembering that regardless of the interpretation we choose, all three are implied. So the next time someone gives you a probabilistic estimate, by all means ask them if they have the evidence to back it up, but don’t forget to ask if they’d be willing to accept a bet based on their own stated odds. 🙂
The reference class problem and its implications for project management
Introduction
Managers make decisions based on incomplete information, so it is no surprise that the tools of probability and statistics have made their way into management practice. This trend has accelerated somewhat over the last few years, particularly with the availability of software tools that simplify much of the grunt-work of using probabilistic techniques such as Monte Carlo methods or Bayesian networks. Purveyors of tools and methodologies assume probabilities (or more correctly, probability distributions) to be known, or exhort users to determine probabilities using relevant historical data. The word relevant is important: it emphasises that the data used to calculate probabilities (or distributions) should be from situations that are similar to the one at hand. This innocuous statement papers over a fundamental problem in the foundations of probability: the reference class problem. This post is a brief introduction to the reference class problem and its implications for project management.
I’ll begin with some background and then, after defining the problem, I’ll present a couple of illustrations of the problem drawn from project management.
Background and the Problem
The most commonly held interpretation of probability is that it is a measure of the frequency with which an event of interest occurs. In this frequentist view, as it is called, probability is defined as the ratio of the number of times the event of interest occurs to the total number of events. An example might help clarify what this means: the probability that a specific project will finish on time is given by the ratio of the number of similar projects that have finished on time to the total number of similar projects undertaken (including both on-time and not-on-time projects).
At first sight the frequentist approach seems a reasonable one. However, in this straightforward definition of probability lurks a problem: how do we determine which events are similar to the one at hand? In terms of the example: what are the criteria by which we can determine the projects that resemble the one we’re interested in? Do we look at projects with similar scope, or do we use size (in terms of budget, resources or some other measure), or technology or….? There is a range of criteria one could use, and one never knows with certainty which of them are the right ones. Why is this an issue? It’s an issue because the probability changes depending on the classification criteria used. This is the reference class problem.
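Here’s a toy illustration of the point (the data is entirely made up, purely to show the mechanics). The same project gets a different on-time probability depending on which classification criterion is used to build the reference class:

```python
# Hypothetical records of completed projects: (size, technology, on_time)
history = [
    ("large", "java",   True),  ("large", "java",   False),
    ("large", "python", False), ("large", "python", False),
    ("small", "java",   True),  ("small", "java",   True),
    ("small", "python", True),  ("small", "python", False),
]

def on_time_probability(criterion):
    """Frequentist probability relative to a chosen reference class."""
    matches = [on_time for size, tech, on_time in history if criterion(size, tech)]
    return sum(matches) / len(matches)

# Our project is large and uses Java. Which reference class applies?
print(on_time_probability(lambda size, tech: size == "large"))  # 0.25
print(on_time_probability(lambda size, tech: tech == "java"))   # 0.75
```

Both classifications are defensible, yet they yield wildly different probabilities for the same project.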
In a paper entitled The Reference Class Problem is Your Problem Too, the philosopher Alan Hajek sums it up as follows:
The reference class problem arises when we want to assign a probability to a proposition (or sentence, or event) X, which may be classified in various ways, yet its probability can change depending on how it is classified.
Incidentally, in another paper entitled Conditional Probability is the Very Guide of Life, Hajek discusses how the reference class problem afflicts all major interpretations of probability, not just the frequentist approach. We’ll stick with the frequentist interpretation since it is the one used in project management practice and research… and virtually all the social and natural sciences to boot.
The reference class problem in project management
Let’s look at a couple of project management-related illustrations of the reference class problem.
First up, consider the technique of reference class forecasting, which I’ve discussed in this post. Note that the reference class forecasting technique is distinct from the reference class problem although, as we shall see in less than a minute, the technique is fatally afflicted by the problem.
What’s reference class forecasting? To quote from the post referenced earlier (the steps are sketched in code below), the technique involves:
…creating a probability distribution of estimates based on data for completed projects that are similar to the one of interest, and then comparing the said project with the distribution in order to get a most likely outcome. Basically, [it] consists of the following steps:
- Collecting data for a number of similar past projects – these projects form the reference class. The reference class must encompass a sufficient number of projects to produce a meaningful statistical distribution, but individual projects must be similar to the project of interest.
- Establishing a probability distribution based on (reliable!) data for the reference class. The challenge here is to get good data for a sufficient number of reference class projects.
- Predicting most likely outcomes for the project of interest based on comparisons with the reference class distribution.
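To make these steps concrete, here’s a minimal sketch (with hypothetical duration data; a real implementation would be far more careful about sample size and data quality):

```python
import statistics

# Step 1: durations (in days) of past projects judged "similar" --
# these form the reference class
reference_class = [12, 15, 11, 14, 18, 13, 16, 14, 20, 15]

# Step 2: treat the reference class as an empirical distribution
def percentile_duration(p):
    """Duration below which a fraction p of reference projects finished."""
    data = sorted(reference_class)
    index = min(int(p * len(data)), len(data) - 1)
    return data[index]

# Step 3: predict outcomes for the project of interest from the distribution
print("Most likely (median) duration:", statistics.median(reference_class))
print("80% confident of finishing by:", percentile_duration(0.8))
```

Everything here hinges on the innocent-looking comment in step 1: projects judged “similar”.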
Now, the key assumption in reference class forecasting is that it is possible to identify a number of completed projects that are similar to the one at hand. But what does “similar” mean? Clearly the composition of the reference class depends on the similarity criteria used, and consequently so does the resulting distribution. Reference class forecasting is a victim of the reference class problem!
The reference class problem will affect any technique that uses arbitrary criteria to determine the set of all possible events. As another example, the probability distributions used in Monte Carlo simulations (of project cost, duration or whatever) are determined using historical data. Again, typically one selects projects (or tasks, if one is doing a task-level simulation) that are similar to the one at hand. Defining “similar” is left to common sense or expert judgement or some other subjective approach. Yet, by the most commonly used definition, a project is a “temporary endeavor, having a defined beginning and end, undertaken to meet unique goals and objectives”. By definition, therefore, we never do the same project twice – at best we do the same project differently (and the same applies to tasks). So, despite one’s best intentions and efforts, historical data can never be totally congruent to the situation at hand. There will always be differences, and one cannot tell with certainty that those differences do not matter.
Truth be told, most organizations do not retain data on completed projects – except superficial stuff that isn’t much use. The reference class problem seems to justify the position of this slack majority. After all, why bother keeping data when one isn’t able to use it to predict project performance? This argument is wrong-headed: although one cannot use it to calculate probabilities, historical data is useful because it keeps us from repeating our errors. Just don’t expect the data to yield reliable quantitative information on probabilities.
Before I close this piece, I should clarify that there are areas in which the reference class problem is not an issue. In physics, for example, the discipline of statistical mechanics is founded on the principle that the average motion of large collections of molecules can be treated statistically. There is no problem here: molecules of a substance are indistinguishable from each other, so a particular molecule of carbon dioxide (in a container of the gas, say) unambiguously belongs to the reference class of all carbon dioxide molecules in that container. In general this is true of any situation where one is dealing with a large collection of identical (or very similar) entities.
Conclusion
The reference class problem affects most probabilistic methods in project management and other areas of the social sciences. It is a problem because it is often impossible to know beforehand which attributes of the objects or events of interest are the most significant ones. Consequently it is impossible to determine with certainty whether or not a particular object or event belongs to a defined reference class.
I’ll end with an anecdote to illustrate my point:
Some time ago I was asked to provide estimates for design work that was superficially similar to something I’d done before. “You’ve done this before,” a manager said, “so you should be able to estimate this quite accurately.”
As many best practices and methodologies recommend, I used a mix of historical data and “expert” judgement (and added in a dash of padding) to arrive at (what I thought was) a robust estimate. To all you agilists out there, an incremental approach was not an option in this case.
I got it wrong – badly wrong. It turned out that the unique features of the project, which weren’t apparent at first, made a mockery of my estimates. I didn’t know it then, but I’d fallen victim to the reference class problem.
Finally, it should be clear that although my examples are project management focused, the arguments are quite general. They apply to all areas of management theory and practice, and indeed to most areas of inquiry that use probabilistic techniques. To use the words of Alan Hajek: the reference class problem is your problem too.
The Flaw of Averages – a book review
Introduction
I’ll begin with an example. Assume you’re having a dishwasher installed in your kitchen. This (simple?) task requires the services of a plumber and an electrician, and both of them need to be present to complete the job. You’ve asked them to come in at 7:30 am. Going by previous experience, these guys are punctual 50% of the time. What’s the probability that work will begin at 7:30 am?
At first sight, it seems there’s a 50% chance of starting on time. However, this is incorrect – the chance of starting on time is actually 25%, the product of the individual probabilities for each of the tradesmen (assuming, of course, that they arrive independently of each other). This simple example illustrates the central theme of a book by Sam Savage entitled, The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. This post is a detailed review of the book.
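Incidentally, a quick simulation confirms the dishwasher arithmetic (a sketch assuming each tradesman is independently punctual with probability 0.5):

```python
import random

trials = 100_000
# Count trials in which the plumber AND the electrician both arrive on time
both_on_time = sum(
    1 for _ in range(trials)
    if random.random() < 0.5 and random.random() < 0.5
)
print(both_on_time / trials)  # ~0.25, not 0.5
```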
The key message that Savage conveys is that uncertain quantities cannot be represented by single numbers; rather, they are ranges of numbers, each with a different probability of occurrence. Hence such quantities cannot be manipulated using standard arithmetic operations. The example in the previous paragraph illustrates this point. This is well known to those who work with uncertain numbers (actuaries, for instance), but is not so well understood by business managers and decision makers. Hence the executive who asks his long-suffering subordinate for a projected sales figure for next month, with the quoted number then being taken as the 100% certain figure. Sadly, such stories are more the norm than the exception, so it is clear that there is a need for a better understanding of how uncertain quantities should be interpreted. The main aim of the book is to help those with little or no statistical training achieve that understanding.
Developing an intuition for uncertainty
Early in the book, Savage presents five tools that can be used to develop a feel for uncertainty. He refers to these tools as mindles – or mind handles. His five mindles for uncertainty are:
- Risk is in the eye of the beholder, uncertainty isn’t. Basically this implies that uncertainty does not equate to risk. An uncertain event is a risk only if there is a potential loss or gain involved. See my review of Douglas Hubbard’s book on the failure of risk management for more on risk vs. uncertainty.
- An uncertain quantity is a shape (or a distribution of numbers) rather than a single number. The broadness of the shape is a measure of the degree of uncertainty. See my post on the inherent uncertainty of project task estimates for an intuitive discussion of how a task estimate is a shape rather than a number.
- A combination of several uncertain numbers is also a shape, but the combined shape is very different from those of the individual uncertainties. Specifically, if the uncertain quantities are independent, the combined shape can be narrower (i.e. less uncertain) than the individual shapes. This provides the justification for portfolio diversification, which tells us not to put all our money on one horse, or all our eggs in one basket, etc. See my introductory post on Monte Carlo simulations for an example of how multiple uncertain quantities can combine in different ways.
- If the individual uncertain quantities (discussed in the previous point) aren’t independent, the overall uncertainty can increase or decrease depending on whether the quantities are positively or negatively related. The nature of the relationship (positive or negative) can be determined from a scatter plot of the quantities. See my post on simulation of correlated project tasks for examples of scatter plots. The post also discusses how positive relationships (or correlations) can increase uncertainty.
- Plans based on average numbers are incorrect on average. Using average numbers in plans usually entails manipulating them algebraically and/or plugging them into functions. Savage explains how the form of the function can lead to an overestimation or underestimation of the planned value. Although this sounds somewhat abstruse, the basic idea is simple: manipulating an average number using mathematical operations can amplify the error caused by the flaw of averages.
Savage explains the above concepts using simple arithmetic supplemented with examples drawn from a range of real-life business problems.
The two forms of the flaw of averages
The book makes a distinction between two forms of the flaw of averages. In its first avatar, the flaw states that the combined average of two uncertain quantities equals the sum of their individual averages, but the shape of the combined uncertainty can be very different from the sum of the individual shapes (Recall that an uncertain number is a shape, but its average is a number). Savage calls this the weak form of the flaw of averages. The weak form applies when one deals with uncertain quantities directly. An example of this is when one adds up probabilistic estimates for two independent project tasks with no lead or lag between them. In this case the average completion time is the sum of the average completion times for the individual tasks, but the shape of the distribution of the combined tasks does not resemble the shape of the individual distributions. The fact that the shape is different is a consequence of the fact that probabilities cannot be “added up” like simple numbers. See the first example in my post on Monte Carlo simulation of project tasks for an illustration of this point.
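Here’s a small sketch of the weak form (assuming, purely for illustration, two independent tasks each uniformly distributed between 1 and 5 days):

```python
import random

trials = 100_000
task_a = [random.uniform(1, 5) for _ in range(trials)]
task_b = [random.uniform(1, 5) for _ in range(trials)]
combined = [a + b for a, b in zip(task_a, task_b)]

# The averages add up as expected...
print(sum(combined) / trials)  # ~6.0, i.e. 3.0 + 3.0
# ...but the combined shape is triangular, not uniform: durations near
# 6 days are far more likely than durations near 2 or 10 days.
```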
In contrast, when one deals with functions of uncertain quantities, the combined average of the functions does not equal the sum of the individual averages. This happens because functions “weight” random variables in a non-uniform manner, thereby amplifying certain values of the variable. An example of this is where we have two sequential tasks with an earliest possible start time for the second. The earliest possible start time for the second task introduces a nonlinearity in cases where the first task finishes early (essentially because there is a lag between the finish of the first task and the start of the second in this situation). The constraint causes the average of the combined tasks to be greater than the sum of the individual averages. Savage calls this the strong form of the flaw of averages. It applies whenever one deals with nonlinear functions of uncertain variables. See the second example in my post on Monte Carlo simulation of multiple project tasks for an illustration of this point.
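And a sketch of the strong form, using the sequential-task example above (hypothetical numbers: task A is uniform between 1 and 5 days, task B takes exactly 2 days and cannot start before day 4):

```python
import random

trials = 100_000
earliest_start_b = 4.0  # task B cannot start before day 4
duration_b = 2.0

completions = []
for _ in range(trials):
    finish_a = random.uniform(1, 5)            # task A: average 3 days
    start_b = max(finish_a, earliest_start_b)  # the nonlinearity: a lag when A finishes early
    completions.append(start_b + duration_b)

print(sum(completions) / trials)  # ~6.125 days
# The plan based on averages says max(3, 4) + 2 = 6 days.
# The true average is higher: the flaw of averages in action.
```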
Much of the book presents real-life illustrations of the two forms of the flaw in risk assessment, drawn from areas ranging from finance and the film industry to petroleum and pharmaceutical supply chains. Savage also covers the average-based abuse of statistics in discussions of topical “hot-button” issues such as climate change and health care.
De-jargonising statistics
A layperson-friendly feature of the book is that it explains statistical terms in plain English. As an example, Savage spends an entire chapter demystifying the term correlation using scatter plots. Another term that he explains is the Central Limit Theorem (CLT), which states that the sum of a large number of independent random variables resembles the Normal (or bell-shaped) distribution. A consequence of the CLT is that one can reduce investment risk by diversifying one’s investments – i.e. making several (small) independent investments rather than a single (large) one – this is essentially mindle #3 discussed earlier.
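A quick sketch of diversification at work (using a made-up investment that gains 20% or loses 10% with equal probability):

```python
import random
import statistics

trials = 20_000

def investment_return():
    # A risky investment: equally likely to gain 20% or lose 10%
    return random.choice([0.20, -0.10])

# One big bet vs. the same money spread across 25 independent bets
single = [investment_return() for _ in range(trials)]
diversified = [
    sum(investment_return() for _ in range(25)) / 25 for _ in range(trials)
]

print(statistics.mean(single), statistics.stdev(single))            # ~0.05, ~0.15
print(statistics.mean(diversified), statistics.stdev(diversified))  # ~0.05, ~0.03
```

The average return is unchanged, but the spread (i.e. the uncertainty) shrinks by a factor of five – mindle #3 in action.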
Decisions, decisions
Towards the middle of the book, Savage makes a foray into decision theory, focusing on the concept of the value of information. Since decisions are (or should be) made on the basis of information, one needs to gather pertinent information prior to making a decision. Now, information gathering costs money (and time, which translates to money). This brings up the question of how much one should spend on collecting information relevant to a particular decision. It turns out that in many cases one can use decision theory to put a dollar value on a particular piece of information. Surprisingly, it turns out that organisations often over-spend on gathering irrelevant information. Savage spends a few chapters discussing how one can compute the value of information using simple techniques of decision theory. As interesting as this section is, however, I think it is somewhat disconnected from the rest of the book.
Curing the flaw: SIPs, SLURPS and Probability Management
The last part of the book is dedicated to outlining a solution (or as Savage calls it, a cure) to average-based – or flawed – statistical thinking. The central idea is to use pre-generated libraries of simulation trials for variables of interest. Savage calls such a packaged set of simulation trials a Stochastic Information Packet (SIP). Here’s an example of how it might work in practice:
Most business organisations worry about next year’s sales. Different divisions in the organisation might forecast sales using different techniques. Further, they may use these forecasts as the basis for other calculations (such as profit and expenses, for example). The forecasted numbers cannot be compared with each other because each calculation is based on different simulations or, worse, different probability distributions. The upshot of this is that forecasted sales results can’t be combined or even compared. The problem can be avoided if everyone in the organisation uses the same SIP for forecasted sales. The results of calculations can then be compared, and even combined, because they are based on the same simulation.
Calculations that are based on the same SIP (or set of SIPs) form a set of simulations that can be combined and manipulated using arithmetic operations. Savage calls such sets of simulations Scenario Library Units with Relationships Preserved (or SLURPS). The name reflects the fact that each of the calculations is based on the same set of sales scenarios (or results of simulation trials). Regarding the terminology: I’m not a fan of laboured acronyms, but concede that they can serve as good mnemonics.
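Here’s a minimal sketch of the idea (my own toy illustration, not from the book): a SIP is just a shared set of simulation trials, and downstream calculations work trial-by-trial from the same set:

```python
import random
import statistics

# The SIP: one shared, published set of trials for next year's sales
random.seed(42)  # everyone works from the same trials
sales_sip = [random.gauss(1000, 200) for _ in range(10_000)]

# Two divisions compute different quantities from the SAME trials
profit = [0.3 * s - 100 for s in sales_sip]          # finance's calculation
shipping_cost = [0.05 * s + 20 for s in sales_sip]   # logistics' calculation

# Because both are based on the same scenarios, they can be
# combined trial-by-trial -- relationships are preserved
net = [p - c for p, c in zip(profit, shipping_cost)]
print(statistics.mean(net))
```

Had the two divisions run their own independent simulations (or used different distributions), the trial-by-trial combination in the last step would be meaningless.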
The proposed approach ensures that the results of the combined calculations will avoid the flaw of averages, and exhibit the correct statistical behaviour. However, it assumes that there is an organisation-wide authority responsible for generating and maintaining appropriate SIPs. This authority – the probability manager – will be responsible for a “database” of SIPs that covers all uncertain quantities of interest to the business, and will make these available to everyone in the organisation who needs to use them. To quote from the book, probability management involves:
…a data management system in which the entities being managed are not numbers, but uncertainties, that is, probability distributions. The central database is a Scenario Library containing thousands of potential future values of uncertain business parameters. The library exchanges information with desktop distribution processors that do for probability distributions what word processors did for words and what spreadsheets did for numbers.
Savage sees probability management as a key step towards managing uncertainty and risk in a coherent manner across organisations. He mentions that some organizations (Shell and Merck, for instance) have already started down this route. The book can thus also be seen as a manifesto for the new discipline of probability management.
Conclusion
I have come across the flaw of averages in various walks of organizational life, ranging from project scheduling to operational risk analysis. Most often, the folks responsible for analysing uncertainty are aware of the flaw and have the requisite knowledge of statistics to deal with it. However, such analyses can be hard to explain to those who lack this knowledge. Hence the managers who demand a single number. Yes, such attitudes betray a lack of understanding of what uncertain numbers are and how they can be combined, but that’s the way it is in most organizations. The book is directed largely at that audience.
To sum up: the book is an entertaining and informative read on some common misunderstandings of statistics. Along the way the author translates many statistical principles and terms from “jargonese” to plain English. The book deserves to be read widely, especially by those who need it the most: managers and other decision-makers who need to understand the arithmetic of uncertainty.

