Posts Tagged ‘Data Warehousing’
A data warehousing tragedy in five limericks
It started with a presentation,
a proforma regurgitation:
a tired old story,
of a repository
for all data in an organization.
The business was duly seduced
by promises of costs reduced.
But the data warehouse,
so glibly espoused,
was not so simply produced.
For the team was soon in distress,
‘cos the data landscape was a mess:
data duplication,
dodgy information
in databases and files countless.
And politics had them bogged down;
in circles they went round and round.
Logic paralysed,
totally traumatised,
in a sea of data they drowned.
In the light of the following morn,
the truth upon them did dawn.
An enterprise data store
is IT lore
as elusive as the unicorn.
Out damn’d SPOT: an essay on data, information and truth in organisations
Introduction
Jack: My report tells me that we are on track to make budget this year.
Jill: That’s strange, my report tells me otherwise
Jack: That can’t be. Have you used the right filters?
Jill: Yes – the ones you sent me yesterday.
Jack: There must be something else…my figures must be right, they come from the ERP system.
Jill: Oh, that must be it then…mine are from the reporting system.
Conversations such as the one above occur quite often in organisation-land. It is one of the reasons why organisations chase the holy grail of a single point of truth (SPOT): an organisation-wide repository that holds the officially endorsed true version of data, regardless of where it originates from. Such a repository is often known as an Enterprise Data Warehouse (EDW).
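Jack and Jill's predicament is easy to reproduce: the same underlying transactions, aggregated under different filter rules by two systems, yield two different "truths". Here is a minimal sketch (the figures, order statuses and filter rules are all hypothetical):

```python
# Hypothetical sales transactions shared by both systems.
sales = [
    {"order": 1, "amount": 100.0, "status": "invoiced"},
    {"order": 2, "amount": 250.0, "status": "pending"},
    {"order": 3, "amount": 175.0, "status": "cancelled"},
]

# Jack's "ERP" figure: only invoiced orders count as revenue.
erp_total = sum(s["amount"] for s in sales if s["status"] == "invoiced")

# Jill's "reporting system" figure: anything not cancelled counts.
reporting_total = sum(s["amount"] for s in sales if s["status"] != "cancelled")

print(erp_total)        # 100.0
print(reporting_total)  # 350.0
```

Each figure is internally consistent and defensible; the disagreement lies not in the data but in the (usually undocumented) rules applied to it.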
Like all holy grails, however, the EDW is a mythical object that exists only in the pages of textbooks (and vendor brochures…). It is at best an ideal to strive towards. But, like chasing the end of a rainbow, it is a pursuit that may prove exhausting and, ultimately, futile.
Regardless of whether or not organisations can get to that mythical end of the rainbow – and there are those who claim to have got there – there is a deeper issue with the standard view of data and information that holds sway in organisation-land. In this post I examine these standard conceptions of data, information and truth, drawing largely on this paper by Bernd Carsten Stahl and a number of secondary sources.
Some truths about data and information
As Stahl observes in his introduction:
Many assume that information is central to managerial decision making and that more and higher quality information will lead to better outcomes. This assumption persists even though Russell Ackoff argued over 40 years ago that it is misleading…
The reason for the remarkable persistence of this incorrect assumption is that there is a lack of clarity as to what data and information actually are.
To begin with let’s take a look at what these terms mean in the sense in which they are commonly used in organisations. Data typically refers to raw, unprocessed facts or the results of measurements. Information is data that is imbued with meaning and relevance because it is referred to in a context of interest. For example, a piece of numerical data by itself has no meaning – it is just a number. However, its meaning becomes clear once we are provided a context – for example, that the number is the price of a particular product.
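The distinction can be made concrete in a few lines of code (the product and price are, of course, invented for illustration): the bare number is data; attaching a context to it is what turns it into information.

```python
datum = 49.95  # raw data: a number with no meaning on its own

# The same number becomes information once context is attached.
information = {
    "value": datum,
    "unit": "USD",
    "meaning": "unit price",
    "product": "Widget X",  # hypothetical product
}

print(f"{information['product']} costs {information['value']} {information['unit']}")
```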
The above seems straightforward enough and embodies the standard view of data and information in organisations. However, a closer look reveals some serious problems. For example, what we call raw data is not unprocessed – the data collector always makes a choice as to what data will be collected and what will not. So in this sense, data already has meaning imposed on it. Further, there is no guarantee that what has been excluded is irrelevant. As another example, decision makers will often use data (relevant or not) just because it is available. This is a particularly common practice when defining business KPIs – people often use data that can be obtained easily rather than attempting to measure metrics that are relevant.
Four perspectives on truth
One of the tacit assumptions that managers make about the information available to them is that it is true. But what exactly does this mean? Let’s answer this question by taking a whirlwind tour of some theories of truth.
The most commonly accepted notion of truth is that of correspondence, that a statement is true if it describes something as it actually is. This is pretty much how truth is perceived in business intelligence: data/information is true or valid if it describes something – a customer, an order or whatever – as it actually is.
More generally, the term correspondence theory of truth refers to a family of theories that trace their origins back to antiquity. According to Wikipedia:
Correspondence theories claim that true beliefs and true statements correspond to the actual state of affairs. This type of theory attempts to posit a relationship between thoughts or statements on one hand, and things or facts on the other. It is a traditional model which goes back at least to some of the classical Greek philosophers such as Socrates, Plato, and Aristotle. This class of theories holds that the truth or the falsity of a representation is determined solely by how it relates to a reality; that is, by whether it accurately describes that reality.
One of the problems with correspondence theories is that they require the existence of an objective reality that can be perceived in the same way by everyone. This assumption is clearly problematic, especially for issues that have a social dimension. Such issues are perceived differently by different stakeholders, and each of these will legitimately seek data that supports their point of view. The problem is that there is often no way to determine which data is “objectively right.” More to the point, in such situations the very notion of “objective rightness” can be legitimately questioned.
Another issue with correspondence theories is that a piece of data can at best be an abstraction of a real-world object or event. This is particularly serious in the context of organisational data. For example, when a sales rep records a customer call, he or she notes down only what is required by the customer management system. Other data that may well be more important is not captured, or is relegated to a "Notes" or "Comments" field that is rarely, if ever, searched or accessed.
Another perspective is offered by the so-called consensus theories of truth, which assert that true statements are those that are agreed to by the relevant group of people. This is often the way truth is established in organisations. For example, managers may choose to calculate Key Performance Indicators (KPIs) using certain pieces of data that are deemed to be true. The problem with this is that consensus can be achieved by means that are not necessarily democratic. For example, a KPI definition chosen by a manager may be hotly contested by an employee. Nevertheless, the employee has to accept it because organisations are typically not democratic. A more significant issue is that the notion of "relevant group" is problematic because there is no clear criterion by which to define relevance.
Pragmatic theories of truth assert that truth is a function of utility – i.e. a statement is true if it is useful to believe it is so. In other words, the truth of a statement is to be judged by the payoff obtained by believing it to be true. One of the problems with these theories is that it may be useful for some people to believe a particular statement while it is useful for others to disbelieve it. A good example of such a statement is: there is an objective reality. Scientists may find it useful to believe this whereas social constructionists may not. Closer to home, it may be useful for a manager to believe that a particular customer is a good prospect (based on market intelligence, say), but a sales rep who knows the customer is unlikely to switch brands may think it useful to believe otherwise.
Finally, coherence theories of truth tell us that statements that are true must be consistent with a wider set of beliefs. In organisational terms, a piece of information or data is true only if it does not contradict things that others in the organisation believe to be true. Coherence theories emphasise that the truth of statements cannot be established in isolation but must be evaluated as part of a larger system of statements (or beliefs). For example, managers may believe certain KPIs to be true because they fit in with other things they know about their business.
…And so to conclude
The truth is a slippery beast: what is true and what is not depends on what exactly one means by the truth and, as we have seen, there are several different conceptions of truth.
One may well ask if this matters from a practical point of view. To put it plainly: should executives, middle managers and frontline employees (not to mention business intelligence analysts and data warehouse designers) worry about philosophical theories of truth? My contention is that they should, if only to understand that the criteria they use for determining the validity of their data and information are little more than conventions that are easily overturned by taking other, equally legitimate, points of view.
On the limitations of business intelligence systems
Introduction
One of the main uses of business intelligence (BI) systems is to support decision making in organisations. Indeed, the old term Decision Support Systems is more descriptive of such applications than the term BI systems (although the latter does have more pizzazz). However, as Tim Van Gelder pointed out in an insightful post, most BI tools available in the market do not offer a means to clarify the rationale behind decisions. As he stated, “[what] business intelligence suites (and knowledge management systems) seem to lack is any way to make the thinking behind core decision processes more explicit.”
Van Gelder is absolutely right: BI tools do not support the process of decision-making directly, all they do is present data or information on which a decision can be based. But there is more: BI systems are based on the view that data should be the primary consideration when making decisions. In this post I explore some of the (largely tacit) assumptions that flow from such a data-centric view. My discussion builds on some points made by Terry Winograd and Fernando Flores in their wonderful book, Understanding Computers and Cognition.
As we will see, the assumptions regarding the centrality of data are questionable, particularly when dealing with complex decisions. Moreover, since these assumptions are implicit in all BI systems, they highlight the limitations of using BI systems for making business decisions.
An example
To keep the discussion grounded, I’ll use a scenario to illustrate how assumptions of data-centrism can sneak into decision making. Consider a sales manager who creates sales action plans for representatives based on reports extracted from his organisation’s BI system. In doing this, he makes a number of tacit assumptions. They are:
- The sales action plans should be based on the data provided by the BI system.
- The data available in the system is relevant to the sales action plan.
- The information provided by the system is objectively correct.
- The side-effects of basing decisions (primarily) on data are negligible.
The assumptions and why they are incorrect
Below I state some of the key assumptions of the data-centric paradigm of BI and discuss their limitations using the example of the previous section.
Decisions should be based on data alone: BI systems promote the view that decisions can be made based on data alone. The danger in such a view is that it overlooks social, emotional, intuitive and qualitative factors that can and should influence decisions. For example, a sales representative may have qualitative information regarding sales prospects that cannot be inferred from the data. Such information should be factored into the sales action plan providing the representative can justify it or is willing to stand by it.
The available data is relevant to the decision being made: Another tacit assumption made by users of BI systems is that the information provided is relevant to the decisions they have to make. However, most BI systems are designed to answer specific, predetermined questions. In general these cannot cover all possible questions that managers may ask in the future.
More important is the fact that the data itself may be based on assumptions that are not known to users. For example, our sales manager may be tempted to incorporate market forecasts simply because they are available in the BI system. However, if he chooses to use the forecasts, he will likely not take the trouble to check the assumptions behind the models that generated the forecasts.
The available data is objectively correct: Users of BI systems tend to look upon them as a source of objective truth. One of the reasons for this is that quantitative data tends to be viewed as being more reliable than qualitative data. However, consider the following:
- In many cases it is impossible to establish the veracity of quantitative data, let alone its accuracy. In extreme cases, data can be deliberately distorted or fabricated (over the last few years there have been some high profile cases of this that need no elaboration…).
- The imposition of arbitrary quantitative scales on qualitative data can lead to meaningless numerical measures. See my post on the limitations of scoring methods in risk analysis for a deeper discussion of this point.
- The information that a BI system holds is based on the subjective choices (and biases) of its designers.
In short, the data in a BI system does not represent an objective truth. It is based on subjective choices of users and designers, and thus may not be an accurate reflection of the reality it allegedly represents. (Note added on 16 Feb 2013: See my essay on data, information and truth in organisations for more on this point).
Side-effects of data-based decisions are negligible: When basing decisions on data, side-effects are often ignored. Although this point is closely related to the first one, it is worth making separately. For example, judging a sales representative’s performance on sales figures alone may motivate the representative to push sales at the cost of building sustainable relationships with customers. Another example of such behaviour is observed in call centres, where employees are measured by the number of calls handled rather than call quality (which is much harder to measure). The former metric incentivises employees to complete calls rather than resolve the issues raised in them. See my post entitled measuring the unmeasurable for a more detailed discussion of this point.
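The call-centre example can be sketched in a few lines. In this deliberately contrived scenario (the agents, call logs and figures are all hypothetical), the easy-to-measure metric and the harder-to-measure one rank the same two agents in opposite order:

```python
# Hypothetical call logs: (agent, duration in minutes, issue resolved?)
calls = [
    ("Asha", 4, False), ("Asha", 3, False), ("Asha", 5, True), ("Asha", 4, False),
    ("Ben", 12, True), ("Ben", 15, True),
]

def calls_handled(agent):
    """The easy-to-measure metric: raw call count."""
    return sum(1 for a, _, _ in calls if a == agent)

def resolution_rate(agent):
    """The harder-to-measure metric: fraction of calls resolved."""
    outcomes = [resolved for a, _, resolved in calls if a == agent]
    return sum(outcomes) / len(outcomes)

# Measured by volume, Asha looks better...
print(calls_handled("Asha"), calls_handled("Ben"))      # 4 2
# ...measured by resolution, Ben does.
print(resolution_rate("Asha"), resolution_rate("Ben"))  # 0.25 1.0
```

An organisation that rewards only the first metric is, in effect, paying its agents to end calls quickly rather than to resolve them.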
Although I have used a scenario to highlight problems of the above assumptions, they are independent of the specifics of any particular decision or system. In short, they are inherent in BI systems that are based on data – which includes most systems in operation.
Programmable and non-programmable decisions
Of course, BI systems are perfectly adequate – even indispensable – for certain situations. Examples include financial reporting (when done right!) and other operational reporting (inventory, logistics etc.). These generally tend to be routine situations with clear-cut decision criteria and well-defined processes. Simply put, they are the kinds of decisions that can be programmed.
On the other hand, many decisions cannot be programmed: they have to be made based on incomplete and/or ambiguous information that can be interpreted in a variety of ways. Examples include issues such as what an organization should do in response to increased competition or formulating a sales action plan in a rapidly changing business environment. These issues are wicked: among other things, there is a diversity of viewpoints on how they should be resolved. A business manager and a sales representative are likely to have different views on how sales action plans should be adjusted in response to a changing business environment. The shortcomings of BI systems become particularly obvious when dealing with such problems.
Some may argue that it is naïve to expect BI systems to be able to handle such problems. I agree entirely. However, it is easy to overlook the limitations of these systems, particularly when called upon to make snap decisions on complex matters. Moreover, any critical reflection regarding what BI ought to be is drowned in a deluge of vendor propaganda and advertisements masquerading as independent advice in the pages of BI trade journals.
Conclusion
In this article I have argued that BI systems have some inherent limitations as decision support tools because they focus attention on data to the exclusion of other, equally important factors. Although the data-centric paradigm promoted by these systems is adequate for routine matters, it falls short when applied to complex decision problems.

