misLeading Indicators: How to reliably measure your business


Blog based on the ideas in the book misLeading Indicators

Alberta election results: 19 times out of 20 (if they don’t change their minds)

Posted on | April 25, 2012 | No Comments

The election result this week in Alberta, Canada confounded a lot of people and cast the art of polling in a bad light. The polls predicted that the Wildrose Party would win, but it was trounced by the Progressive Conservative (PC) Party. Two days before the election, one poll gave the Wildrose 38% versus the PCs’ 36%. For the previous week the polls averaged 41% for the Wildrose and 33% for the PCs, a lead of 8 percentage points. In the actual election results the winning PC Party had 44% to the Wildrose’s 34%, a loss for the Wildrose by about 10 percentage points.

Usually when a reputable poll is reported it goes along with the caveat “this poll is accurate to plus or minus 3 percentage points (or some similar small number) 19 times out of 20.” In 24 polls before the election, not a single one came within 3 points of the PCs’ 44% final result, and only three came within 3 points of the Wildrose’s 34% final result. The blue band in the diagram shows a rough ±3% band around the PCs’ final result, and the green band around the Wildrose’s result.

Opinion polls in 2012 Alberta election. Source: Election Almanac.

What happened to the famous “19 times out of 20”? Is this poll the 20th time? How could it be, when out of 24 polls, not one came within 3 points of the PCs’ election result?

The outcome of this election is more surprising than the 1948 US presidential election, when all the pollsters predicted that Thomas E. Dewey would defeat Harry S. Truman. But Truman beat Dewey by 5%, much less than the margin by which the PCs routed the Wildrose. In 1948, there was still a lot to learn about political opinion polling. Pollsters had had 64 years to hone their skills by the time of the Alberta election.

So what does the “19 times out of 20” mean? It means that in some future survey of the same population that uses the same survey method, there is a 95% probability (i.e. 19 out of 20) that the estimate from the survey will be within plus or minus three percent of the value in the whole population. Alternatively, it means that if you do a large number of surveys, and in each one you calculate the estimate, and then the average of these estimates, then there is a 95% probability that this average will be within plus or minus 3% of the value in the population.
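As a rough illustration of where a figure like “plus or minus 3%” can come from, the sketch below uses the standard textbook formula for the margin of error of a proportion estimated from a simple random sample. The sample size of 1,067 and the function name are assumptions chosen for illustration; the sample sizes and methods of the Alberta polls are not given here.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of n respondents (z = 1.96 for 95%)."""
    return z * math.sqrt(p * (1 - p) / n)

# Assumed example: a poll of 1,067 people, worst case p = 0.5.
moe = margin_of_error(0.5, 1067)
print(f"{moe:.3f}")  # prints 0.030
```

Under these assumptions, a sample of roughly a thousand people is enough to produce the familiar three-point figure.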

But there are two snags. The first is that once the survey has been completed and the data collected and analysed, we cannot say whether the estimate from this specific sample is within 3% of the value in the population. The reason is that you rarely get a sample that is representative of the population as a whole. A particular sample might be one percent off, or six percent off. It is usually very hard, if not impossible, to know.

Many people explain, irrelevantly, the plus or minus number as follows: “if we did this same survey a gazillion  times, and nobody changed their mind while we did it, about 5% of the time (one time out of 20) the proportion in the population and the proportion in the sample could differ by more than 3%.”

Of course nobody ever repeats a survey a gazillion times, or even intends to, so this explanation of the “plus or minus” number is mostly nonsense.  And even if you could repeat it, answers to survey questions are not constants—and that’s the second snag.

In an election people change their minds. That is the whole purpose of a campaign. The 19-times-out-of-20 rule works very well if you are sampling marbles from a jar and want to know what percentage are blue or green. But voters are not marbles (although politicians often talk about them as if they had lost their marbles). Voters constantly change.
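The difference between marbles and voters can be sketched with a small simulation. It assumes simple random samples of 1,000 people, which is an assumption and not how any particular pollster works. The 41% and 34% figures are the Wildrose poll average and election result quoted above; the function name and all other parameters are hypothetical.

```python
import random

def coverage(p_poll, p_final, n=1000, trials=1000, tol=0.03, seed=1):
    """Share of simulated polls whose estimate lands within tol of p_final.

    p_poll  : true support at the time the poll is taken
    p_final : true support when the votes (or marbles) are counted
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n respondents; each supports the party with probability p_poll.
        supporters = sum(rng.random() < p_poll for _ in range(n))
        if abs(supporters / n - p_final) <= tol:
            hits += 1
    return hits / trials

stable = coverage(0.34, 0.34)   # marbles: nothing changes after sampling
shifted = coverage(0.41, 0.34)  # voters: support moves after the poll
print(stable, shifted)
```

In the stable case roughly 19 polls out of 20 land within three points. In the shifted case almost none do, even though every simulated poll was an accurate sample of opinion on the day it was taken.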

So the “19 times out of 20” caveat at the end of every poll should really be “this poll is accurate to within plus or minus 3%  19 times out of 20 if the voters don’t change their minds. But they usually do.”

© 2012 Greenbridge Management Inc.

misLeading pant sizes-why women aren’t as thin as they think

Posted on | April 11, 2012 | 2 Comments

The Economist magazine is taking a jab at another sort of inflation. Women’s pant sizes, while nominally the same, have actually been increasing in girth. The British news magazine estimates that an average size 14 pair of women’s pants “is now more than four inches wider at the waist than it was in the 1970s.”

Some fashion industry bloggers explain this away with such nonsense as “sizes evolve, just like people do,”  or “we have to change our sizes based on average [people] sizes.” In other words, people are getting fatter, so the clothing sizes have to get fatter with them.

If a measurement instrument changes when the thing it measures changes, the resulting “measurement”  is not a measurement at all.

Photo: ©iStockphoto.com/gregory horler

© 2012 Greenbridge Management Inc.

Facebook’s new check-in measure: accuracy vs practicality

Posted on | March 31, 2012 | No Comments

Facebook sent out a message the other day on its new way of counting “check-ins” to improve accuracy. The result of the increased “accuracy” will be a drop in the number of check-ins.  The message said:

“We are revising check-in numbers on Facebook Pages to give you a more accurate picture of how people are visiting your business. Among these changes, previously, if an individual checked into your business multiple times, each check-in was counted into your Page’s total check-in number. Now, if someone checks into your business multiple times within a 12-hour period, that action will be counted as one unique check-in.”

A check-in is when a Facebook user uses their mobile phone to tell their Facebook friends what business or place they are at, using Global Positioning System (GPS) information.

To make an accurate count you need a clear definition. Try this simple experiment: ask a group of friends or colleagues to count the number of light bulbs in a room. They’ll immediately get hung up trying to decide whether to include fluorescent tubes, compact fluorescent bulbs, halogen bulbs, light-emitting diodes and the dozens of other modern devices that emit light. If you don’t provide a definition and ask them to count anyway, you’ll get almost as many counts as there are people counting. The same applies to just about anything else, whether customers, inventory or complaints.

You can’t come up with a definition though, unless you have enough background information. If the reason for counting light bulbs is to keep enough spare bulbs and tubes on hand (and other devices that serve to illuminate the room), the definition—and thus the count—will  probably be different than if the reason for the count is to determine whether incandescent bulbs have been removed to save energy.

Two types of bulbs.

Count the light bulbs. One or two? Photos istockphoto/benimage. Zoran Vukamanov Simokov/shutterstock.com

With the new measure, a Facebook check-in is now something that can only be done once every 12 hours. Why not 24 hours or six hours or two hours?
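To see how much the count depends on the chosen window, here is a minimal sketch. The counting rule is one plausible reading of the message quoted above, not Facebook’s actual algorithm, and the check-in times are made up for illustration.

```python
from datetime import datetime, timedelta

def count_checkins(times, window_hours):
    """Count one person's check-ins, collapsing repeats inside the window.

    Assumed rule: a check-in counts only if it comes at least
    window_hours after the last check-in that was counted.
    """
    window = timedelta(hours=window_hours)
    counted = 0
    last_counted = None
    for t in sorted(times):
        if last_counted is None or t - last_counted >= window:
            counted += 1
            last_counted = t
    return counted

# Hypothetical check-ins by one customer at one business.
visits = [
    datetime(2012, 3, 30, 8, 0),
    datetime(2012, 3, 30, 12, 30),
    datetime(2012, 3, 30, 19, 0),
    datetime(2012, 3, 31, 7, 45),
]

counts = {hours: count_checkins(visits, hours) for hours in (0, 2, 6, 12, 24)}
print(counts)  # {0: 4, 2: 4, 6: 3, 12: 2, 24: 1}
```

The same four events give a count of four, three, two or one depending only on the definition.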

The challenge with a count such as this is that you can only understand what it means if you know the definition. Such indicators are called instrumentalist indicators: they are not concerned with underlying truth or falsehood (such as the number of bulbs in a room) but merely with practical value. Accuracy means something is close to the underlying “truth.” But what is the “truth” concerning something as vague as a “check-in”? In that sense Facebook’s new check-in measure may not be any more accurate, but it may be more practical than the old one when people check in more frequently than once every twelve hours.

© 2012 Greenbridge Management Inc.

How do you measure the performance of a business book?

Posted on | February 29, 2012 | Comments Off

Today* stores start selling our book—misLeading Indicators: How to Reliably Measure your Business. It has been a while coming. It’s natural for us as authors to wonder how it will “perform”.

Many of our acquaintances wonder too, and ask us how we expect it to perform. Their indicators are usually sales or royalties.  Those are obvious and natural indicators. But they are not what drove us to write it.

We have seen a lot of problems with business performance measurement—problems that sometimes caused business failures, huge losses and even serious accidents.  These problems were frustrating to us and we wanted to play some small part in providing a positive influence to correct them.

Photo: Lurii Davydov/Shutterstock.com

At the same time, Green was impressed by how many of his clients—on several continents—had the Balanced Scorecard and similar books on their book shelves.  Obviously they were interested in business measurement, and some had done a lot to implement performance measurement systems. But these books do not tell their readers how to determine whether they can trust their measurements and indicators—hence our book.

A relevant performance indicator for the book (and for us) would have to have something to do with the influence of the book. So perhaps:

  • Number of managers who have misLeading Indicators on their book shelf.

The problem with that indicator is that some people—and it’s hard to tell how many—buy books but do not read them.  So here is another attempt:

  • Number of people who read misLeading Indicators

Well, that one is pretty hard to measure. And it’s pretty certain there will be some (uncountable) number of people who read the book and do little about it. Perhaps it would be better to measure:

  • Number of companies that implement the ideas in misLeading Indicators

But how do you adequately define, let alone measure, “implementation”? Hmmmm…

One key point of our book is that you have to strike the right balance between the measurable and the un-measurable. It is a mistake to rely too heavily on quantification and overlook vital, but unmeasurable, information.

Perhaps the most important indicator to us of the performance of the book will be the stories people tell us  (and their friends and colleagues) about how the book influenced something important in the way they run their business, and how that helped.

But for that to happen, it has to sell, and people have to read it, and think about it, and do something about it…starting with YOU!

*February 29, 2012

© 2012 Greenbridge Management Inc.

Brilliant propaganda, but lousy indicator of rate of growth of US debt.

Posted on | February 6, 2012 | Comments Off

There is a little graphic being circulated around the Internet intended to show that President Obama has recklessly doubled the total US debt accumulated since President Washington.

As a piece of political propaganda it is brilliant. It implies that President Obama’s administration is worse than all previous administrations when it comes to debt. Is it? As a performance “indicator” it is not so good.

The US debt is growing exponentially, and has been for 40 years under all presidents. Think of a pond with exponential growth of lily pads. Every week the number of lily pads growing in it doubles. How full was the pond one week before it was completely full of lily pads? Half full, and it took many weeks to get there, just like the US debt took many presidents to get to the point where it had half as much debt as today. A better indicator of exponential growth is not how long ago there was half as much, but the length of the doubling period.

To account for increasing population, it’s better to measure how profligately US governments have been going into debt by calculating debt per capita rather than gross debt. In September 2011, US debt per capita was $47,652. It took 8 years, from 2003 to 2011, for debt per capita to double from $23,826. So the graphic is superficially correct. (It would also be better to account for inflation and use constant dollars. The trouble is: what indicator of inflation should be used? See here and here and here and here.)

The chart below shows the terrifying picture of US debt/capita since 1791.

The huge debt figures at the end of the 20th century completely mask what was going on in the 19th. Debt per capita actually declined most years, with the exception of the period during the Civil War (see below).

The 20th century is a different picture altogether, as seen in the chart below. The scale shows powers of two, to make it easier to see how long it takes to double debt per capita. There were two jumps, in World War I and World War II. Between and immediately after the wars debt per capita was flat or declined. Then around 1970, debt per capita took off.

The chart below works backwards, starting with the current debt/capita of $47,652, and then downwards to show the year at which the debt per capita was one half this amount, one quarter and so on. Each time the debt/capita line crosses the dotted horizontal lines, debt has doubled (going left to right) or halved (going right to left).

From 1976 to 2011, debt per capita grew 16 times—that means it has doubled 4 times (from $2,978 to $47,652). That’s a compound average growth of 8.2% in debt per capita.

At 8% growth, debt per capita will keep doubling every eight or nine years or so. That’s how long it took to double it to President Obama’s number from George W. Bush’s number in 2003. All he is doing is holding the shovel steady as the US inexorably digs a deeper hole.
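The growth rate and doubling period quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the two debt-per-capita figures given in the text; the function names are illustrative.

```python
import math

def annual_growth_rate(start, end, years):
    """Compound average annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

def doubling_time(rate):
    """Years needed to double at a constant compound annual rate."""
    return math.log(2) / math.log(1 + rate)

# Debt per capita: $2,978 in 1976 and $47,652 in 2011 (figures from the text).
rate = annual_growth_rate(2978, 47652, 2011 - 1976)
years_to_double = doubling_time(rate)
print(f"{rate:.1%} a year, doubling every {years_to_double:.2f} years")
```

The result is a little over 8% a year and a doubling period between eight and nine years, consistent with the four doublings in 35 years.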

(Note: to calculate debt per capita I linearly interpolated US population between censuses, which are taken every ten years. Source of debt data: http://www.treasurydirect.gov).

© 2012 Greenbridge Management Inc.

A leading indicator of business cycles that have already happened

Posted on | January 30, 2012 | Comments Off

A “leading indicator” is supposed to forecast, or at least to help the person using it to make a forecast.

The Conference Board recently announced that it was changing its Leading Economic Indicator to address structural changes that have occurred in the US economy in the last few decades.  The new indicator was released on January 26, 2012.

In its press release,  the  Conference Board says “Revised figures show that adding the new Leading Credit Index™, in conjunction with other changes, makes the LEI a more accurate predictor of U.S. business cycles since 1990.”

A better predictor of business cycles since 1990? They have already happened.  That’s an interesting twist on economist Paul Samuelson’s famous 1966 quip that “Wall Street indexes predicted nine out of the last five recessions.” Now economists predict them after they happen. As physicist Niels Bohr said “prediction is very difficult, especially about the future.” It would be more convincing (to put it mildly) if they had revised the LEI before the last recession and made a prediction that turned out to be true. That would show that the indicator had some ability to “lead”.

The Leading Economic Indicator (LEI) is an index that combines ten indicators, including average weekly hours in manufacturing, average weekly initial claims for unemployment insurance, new orders, building permits, stock prices and others.

In the new index there are changes such as removing the inflation-adjusted money supply and replacing it with a new Leading Credit Index. According to Kathleen Bostjancic, director of macro analysis at the Conference Board, the old index was being “skewed” by the money supply.

This blog post by Doug Short has a series of charts that compare the old and new indices from 1959 to today.  I’ve shown one of them below (the red line is the old LEI and the blue line is the new one). By superimposing the two indicators on the same chart, Short showed that the old and new indicators crossed paths in 1994, 2001 and 2008. The rest of the time they moved roughly in parallel, except in recessions. The difference between the two indicators is most dramatic before, during and after the 2008-2009 recession. The new LEI shows a much more dramatic drop in economic activity, and a much slower recovery.

Leading Economic Indicators. Source: Doug Short

Only time and experience will show whether the new LEI will be a better tool for forecasting economic activity than the previous one.  In the meantime, it is worth remembering an old lesson from some of the greatest scientists—just because you can obtain numbers from measuring does not mean the thing you think you are measuring actually exists.

And just because it is called a Leading Indicator does not mean it is.

Hoaxing, forging, trimming, faking, cooking and falsifying measurements

Posted on | January 23, 2012 | Comments Off

There have been several recent high-profile cases of scientists hoaxing, forging, trimming, cooking or falsifying their data. These cases occurred in spite of the fact that scientific publications must be peer-reviewed before publication, and that scientific findings must be replicated before they are confirmed.

Would the safeguards in place in most businesses catch similar acts?

Photo: olly/shutterstock.com

In 2007 Harvard University seized computers, notes, videotapes and other materials from the lab of Marc Hauser, a professor of psychology. The university found him guilty of scientific misconduct.  In 2011, Tilburg University in the Netherlands, after an extensive investigation,  found that Diederik Stapel, a professor of social psychology, had fabricated data for 15 to 20 years.

In Hauser’s case, the fraud came to light when researchers in his lab viewed video tapes Hauser had taken of monkeys and found that Hauser’s coding of the results was not consistent with their own observations. Whistleblowers also alerted the authorities to Stapel’s fraud.

The failure to replicate scientific results does not in itself point to fraud. When scientists make observations in nature, rather than the lab, they can be very hard to replicate because the precise conditions under which they made the observations may be unique. Sometimes tremendous scientific resources are spent collecting data, making it hard for others to repeat an experiment. Sometimes the data take so long to collect, as in long-term epidemiological studies, that it would take too long to replicate them. Scientists often use lab animals, but there may be genetic or environmental differences between populations from one lab to the next that make replication difficult.

There are fortunes to be made on any drug that extends life. There have been many studies on the compound resveratrol, which comes from red wine. It is thought that it may prolong life. Some scientists have found that it extends life in simple creatures such as yeast and worms, and others have found that it does not. Does this mean there is fraud involved? Not according to one of the scientists, Heidi Tissenbaum from the University of Massachusetts Medical School, who said “nobody’s falsifying; people are just getting different results.”

All of these difficulties in detecting fake measurements exist in business. Procedures such as calibration check that instruments are working, but don’t catch fabrication or cooking of data.  Many business measurements, such as marketing studies or experiments in factories, are like the field observations of scientists in nature: the exact conditions are hard to replicate. People regularly “trim” out measurements that do not look right, even though they may be indicating a problem.

There are no simple tricks for catching such problems with data.  But a good starting point is to develop a healthy suspicion for all numbers.

© 2011 Greenbridge Management Inc.

How many people visit your website?

Posted on | December 23, 2011 | Comments Off

Whether you are writing a blog like this one, or run a major e-commerce website, it’s natural to want to know how many people visit.

Photo: auremar/shutterstock.com

There are lots of tools—generally known as web analytics—to measure web site traffic. The problem is, “People don’t visit websites. Their computers do.”  But you cannot identify how many people are hiding behind a single computer. Every person in the Ford Corporation has the same IP (Internet Protocol) address. And most home users are given a different IP address every time they get on the web.  So it could look like you are getting more “visits” from a single home user than from a dozen Ford employees. And, as in the photo above, you don’t know how many people are looking at the same computer screen.

Measuring website visits depends on the definition of a “visit.”  The Joint Industry Committee for Web Standards says a single visit is a series of page requests with no gaps greater than 30 minutes.  So if you click on the link for a page 31 minutes after you first landed on the  site, it’s counted as a new visit, even if you spend most of your first 30 minutes getting a coffee and reading the page. (Click here for their stuff on measuring website traffic).
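The 30-minute rule can be sketched in a few lines. This is a minimal illustration of the definition as described above, not the code of any particular web analytics tool; the function name and the request times are hypothetical.

```python
from datetime import datetime, timedelta

def count_visits(request_times, max_gap_minutes=30):
    """Count visits from one visitor's page requests.

    A new visit starts whenever the gap since the previous page
    request is greater than max_gap_minutes.
    """
    gap = timedelta(minutes=max_gap_minutes)
    visits = 0
    previous = None
    for t in sorted(request_times):
        if previous is None or t - previous > gap:
            visits += 1
        previous = t
    return visits

# Hypothetical page requests from a single browser.
requests = [
    datetime(2011, 12, 23, 9, 0),
    datetime(2011, 12, 23, 9, 31),  # 31 minutes later: starts a new visit
    datetime(2011, 12, 23, 9, 50),
]

print(count_visits(requests))                      # prints 2
print(count_visits(requests, max_gap_minutes=60))  # prints 1
```

Changing the gap from 30 to 60 minutes halves the number of “visits” without any change in what the person actually did.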

This article does a nice job of explaining the challenges of measuring web traffic.

© 2011 Greenbridge Management Inc.

Are Americans getting wealthier or poorer? It depends on how you measure “wealth.”

Posted on | December 23, 2011 | Comments Off

The standard measure of wealth is GDP per capita. The chart below shows that Americans have been getting continually wealthier for decades, with a few blips here and there (source of data). This measure of wealth, Gross Domestic Product, is based on the dollar value of economic transactions.

Such a measure depends crucially on the definition of wealth.  People have a bad habit of confusing financial assets with real ones because they are confused about the nature of wealth.

This striking newsletter from Stansberry’s Investment Advisory claims Americans have been getting poorer since 1965. The problem with using the dollar as a measure of wealth is that “we don’t have a sound currency with which to measure GDP through time.” Instead of using dollars, Stansberry’s used a basket of commodities. They used real assets instead of financial ones.

Their trend line goes in exactly the opposite direction from the one in the chart above.  Stansberry’s says Americans are “faking” wealth by going deeply into debt.

© 2011 Greenbridge Management Inc.

Dead right.

Posted on | December 5, 2011 | Comments Off

I snapped this picture at a crosswalk in Manhattan near Central Park over the weekend. What does it say? Stop or walk? You could walk and not, technically, be jaywalking. You stand a good chance of getting smacked by a car, though, if you do. You’d be technically right. And maybe even dead right.

Misleading indeed, but at least in this case it is easy to tell it is misleading. It is not usually so evident when your indicator is a measured quantity.

Misleading walk signal in New York

Studies show studies don’t show

Posted on | November 29, 2011 | Comments Off

It’s used in medical research, engineering research, and just about every other form of research. It determines whether research is “significant” or destined for the garbage bin. And it is one of the most misleading indicators ever developed.  It is known as the “p-value.”

The p-value is the number that comes out of most orthodox statistical tests such as t-tests, analysis of variance, regression and many others. It is supposed to tell you the probability that you are mistakenly concluding that you have found an effect when actually there is not one (a “false positive”). So of course researchers seek small p-values such as 5% or 1%.

Photo: wavebreakmedia ltd/shutterstock.com

Although that’s what people think that probability is, it is not. I’ll get to that. But even if it were, the way researchers misapply it means that they draw the wrong conclusions.

A landmark article by John Ioannidis claimed that most published research findings are false.  The reason is (basically) that there are too many scientists chasing too many significant results in areas where there are small effects, that researchers tinker with experimental designs and mine data in search of significance,  and that there are financial or other incentives to get significant results published.  Ioannidis also said that “many research findings are simply accurate measures of prevailing bias.” Ioannidis showed mathematically how this occurs.

Other studies have shown similar results. A recent analysis of studies in neuroscience found that about half of them used the wrong statistical procedure. Suppose a researcher finds that 30% of mutant nerve cells respond to a chemical, and gets a significant p-value. Then the researcher finds that only 15% of normal cells respond, and this result does not generate a significant p-value. It is tempting to conclude that the chemical affects mutant cells more than normal ones, but the two results on their own do not show that. To be able to say that the chemical has a significantly greater effect on mutant cells, you would have to show that the 15 percentage point difference between mutant and normal cells was itself significant.

What is the meaning of the p-value? What the p-value gives is the probability of getting data at least as extreme as those observed, by chance, given that there is no effect. But that is not the probability we should be interested in. The probability of getting the data when there is no effect is not very relevant. The probability we need to know is the probability that there is an effect, given the actual data we just got. Those are very different probabilities, but most researchers (knowingly or unknowingly) present them as though they were the same.
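A small worked example shows how far apart the two probabilities can be. It applies Bayes’ rule and treats a “significant” result as one whose p-value falls below a threshold. All three inputs (the share of tested hypotheses that are really true, the power of the study and the 5% threshold) are assumptions chosen for illustration, not figures from any study cited here.

```python
def prob_effect_given_significant(prior, power, alpha):
    """Probability that an effect is real, given a significant result.

    prior : assumed share of tested hypotheses where an effect really exists
    power : probability of a significant result when the effect exists
    alpha : probability of a significant result when there is no effect
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)

# Assumed values: 1 in 10 hypotheses true, 80% power, 5% threshold.
posterior = prob_effect_given_significant(prior=0.10, power=0.80, alpha=0.05)
print(f"{posterior:.2f}")  # prints 0.64
```

Under these assumptions, more than a third of the “significant” findings would be false positives, even though every one of them passed the 5% test.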

© 2011 Greenbridge Management Inc.

misLeading Indicator suggests world exporting to aliens

Posted on | November 29, 2011 | Comments Off

When I do calculations on a performance indicator, I usually do the calculation more than one way. This gives me a good check on my method, and gives me assurance that the indicator is meaningful if both calculations match. In some cases this task is simplified if there is some constraint that must be met, such as all the numbers being forced to add up to zero or 100%.

So it should be with world current account deficits. They should add up to zero.

The Economist Magazine added up the current account deficits published by the International Monetary Fund and got $331 billion, causing it to quip that the aliens must be “buying Louis Vuitton handbags.” (See here).

Source: Shutterstock

The root of the problem is mis-measurement of the current account deficit. The Economist reports that these measurement errors have jumped.

© 2011 Greenbridge Management Inc.

Why your customers may not see things the way you do

Posted on | October 28, 2011 | Comments Off

Are your customers experiencing something completely different from what your indicators tell you they are experiencing? Probably. Suppose a retail company with branches of different sizes measured some aspect of customer service in each of its branches, say wait times.  Will the average wait time be a good representation of the average customer’s experience?

Take a company with three branches. The average wait times from each branch are 8, 5 and 2 minutes, for an overall average of 5 minutes. This is the number the company might publicize. It might even offer a service guarantee of five minutes.

But let’s look at it from the perspective of the customers waiting in line. The first branch serves 1,000 customers every day, the second serves 250, and the third serves 100. There thus are 1,000 people who experience an average 8 minute wait, 250 who experience a 5 minute wait and 100 who experience a 2 minute wait. If we weight the average wait times in each branch by the number of customers, the overall average jumps from 5 minutes to 7. This more accurately reflects the experience of customers.
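The two averages can be reproduced directly from the branch figures, using wait times of 8, 5 and 2 minutes and daily volumes of 1,000, 250 and 100 customers. The variable names are illustrative.

```python
# (average wait in minutes, customers served per day) for each branch
branches = [(8, 1000), (5, 250), (2, 100)]

# Average of the branch averages: every branch counts equally.
simple_average = sum(wait for wait, _ in branches) / len(branches)

# Customer-weighted average: every customer counts equally.
total_customers = sum(customers for _, customers in branches)
weighted_average = sum(wait * customers for wait, customers in branches) / total_customers

print(simple_average)    # prints 5.0
print(weighted_average)  # prints 7.0
```

The first number describes the typical branch; the second describes the typical customer.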

That could be an expensive service guarantee! Be careful of averages that do not represent the experiences of individuals.

© 2011 Greenbridge Management Inc.

Moneyball shows that you should challenge thinking about KPIs

Posted on | October 14, 2011 | Comments Off

Business writers often speak of “measures that drive future performance.” The trick is figuring out what measures those are.

Doctors have been measuring their patients since the Italian doctor Santorio Santorio invented a scale for the Galileo thermometer in 1612.  But patients can (and do) die soon after they get a clean bill of health, even when all the “drivers of future performance” indicate that they are healthy. Despite huge efforts, medical science has still not figured out all the drivers of future performance.

The same is true in many fields, including business.  People too easily fall into the trap of assuming that some indicator is a driver of performance, or a “leading indicator,” out of habit, or wishful thinking.

The movie Moneyball shows how easy it is to fall in love with such presumed drivers of performance, and why it is worth challenging them, because they may be misleading indicators. Billy Beane, General Manager of the Oakland Athletics,  “discounted what scouts have done for 150 years.”   He rejected previous baseball key performance indicators such as stolen bases and batting averages in favor of others such as on-base percentage and slugging percentage.

Using such measures Beane was able to find players at far cheaper salaries. His 2002 draft put the new techniques to the test. Did they work? It depends on your measure of success. You can judge the results for yourself in the chart below, which uses wins versus losses as a measure of success. At the very least, the A’s were competitive with teams that had triple the payroll.

This suggests they may also have been successful on another important measure of success: their profitability.

What polls, police radars and the Hawthorne experiments have in common

Posted on | September 29, 2011 | Comments Off

There is an election campaign in my home province of Ontario. As often happens during elections, somebody has raised the question of whether public opinion polls influence the outcomes of elections (such as in this article).

If only measurements involving people could produce objective information. People and the way they react to measurements make this very hard to achieve.

When you see a police car sitting at the side of the highway, with a radar gun pointing at you, you will most likely brake or ease off the gas. The awareness of being measured affects your velocity and position. The observer affects the observed when people measure people. Even the fear of a hidden policeman measuring your speed will cause most drivers to moderate their speed, and increase their stress.

Radar

Photo: VladKol/Shutterstock.com

The Hawthorne experiments showed that measurement in organizational or social settings is itself a source of perturbation. In addition to providing feedback, the act of measuring also changes people’s behavior by stating the intentions and priorities of the measurer. It thus implies the measurer’s values. When a CEO rigorously measures variance from budget, people understand that sticking to budget is important, and act accordingly.

The Hawthorne effect is named after a series of experiments conducted between 1924 and 1932 at the Hawthorne plant, near Chicago, owned by the Western Electric Company. The plant manufactured relays for the Bell Telephone Company.

Researchers noticed improvements in worker productivity when the women working in the plant were given a variety of different working conditions, such as changed lighting, and their output was measured. The folklore that developed from these experiments, and that persists to this day, is that the performance of people who are singled out in a study of any kind improves, not because of the parameters or variables under study, but because they are pleased with the attention they receive.

H. McIlvaine Parsons showed this folklore to be bunk in a study in Science magazine in 1974.  What the Hawthorne experiments actually showed was that the productivity did not change because the lights were made brighter or dimmer. The productivity changed when the subjects of the study received measurements of their own productivity.

In social situations, measurement itself causes change in behavior and performance. It cannot be just a means to objectively measure performance.

Do opinion polls affect the outcomes of elections? You betcha.

© 2011 Greenbridge Management Inc.

The Operational Dashboard that had no Key Performance Indicators

Posted on | September 25, 2011 | Comments Off

The “dashboard” is often used as a metaphor by business writers and consultants to explain to managers that they need performance measurement and key performance indicators for their business, just like they do for their cars. Otherwise they’ll drive their business off the road, they (falsely) argue.

The problem is, the dashboards on cars don’t help you keep your car on the road. Yesterday a couple came down my street driving a 1931 Model A Ford. I snapped this picture of the dashboard.

Model A Ford dashboard

Photo by Philip Green

It has three instruments: an Amp meter to tell the charge on the battery; a gas gauge; and a speedometer. That’s it.  That’s all. They are indicators to help you manage the risks of running out of gas, running out of battery charge, or going too fast. You could get by without them.

The dashboard on the earlier Model T Fords had no instruments. It was a board. The Amp meter was the first to be added to the board when starter cranks were replaced with starter motors powered by electric batteries.

What you need to keep your car on the road is a good pair of eyes. Measurement cannot tell you everything you need to know. There is a clear and vital role for non-measurable information. If you just relied on the indicators on a dashboard to drive a car you would go off the road.

The same lesson applies to running a business. Non-measurable information is vital.

The importance of structural information

Posted on | September 20, 2011 | Comments Off

Business writers and consultants today typically advise managers to calculate indices and percentages to measure the state of their enterprises. For example, Kaplan and Norton, in the “Balanced Scorecard”, suggest measuring “strategic information availability” with the “percentage of processes with real-time quality, cycle time, and cost feedback available” and “percentage of customer-facing employees having on-line access to information about customers.”  Meyer and Ross, authors in Peter Senge’s collaboration “The Dance of Change,” advise readers to develop similar measures, to strive for simplicity, and to display them on an “Operational Dashboard.”

Many management writers advise their readers to limit the number of things they measure to some arbitrarily small number. Meyer and Ross, for example, state “If you could only track six or eight things, what would they be? The point is to avoid devising so many numbers and gauges that your dashboard looks like a 747 cockpit. Otherwise you’ll spend all your time looking at the dashboard and forget to ‘look out the windshield’—to actually implement the work.”

I doubt many would disagree with their advice to “look out of the windshield,” but it is not very helpful. As for the rest, you should measure what you need to measure to prompt timely and specific action, while preserving the structural integrity of the data. Every instrument on the 747 is there to help fly the plane. The instruments prompt the pilots to take specific actions. There is virtue in simplicity, but the number of things you should measure is dictated by the structure of the information and the actions that need to be taken, not by simplicity for simplicity’s sake.

Simplicity is of course a desirable attribute of an information display. Many  “operational dashboards”, however, use a lot of ink and space to display very little. (Go to Google images and search for “business operational dashboard” for many examples.)

Many of the displays in these dashboards can be almost totally useless. They only give broad measures of progress because they throw out structural information. Without structural information, it is hard to know what corrective action to take. Furthermore, displaying the information in the form of dials makes the displays unnecessarily complicated.

Edward Tufte said that graphical displays should “show the data; present many numbers in a small space; avoid distorting what the data have to say; induce the viewer to think about substance; reveal the data at several levels of detail, from a broad overview to the fine structure.”  He says data displays should reveal the complex.

Unfortunately, many dashboards don’t reveal it but conceal it, because they crunch numbers into single indices.

For example, a performance indicator of inventory turns for a consumer products manufacturer may suggest to management that there is too much inventory. “Twelve inventory turns per year” is how it is often reported. This gives the false impression that all inventory turns at the same rate. It ignores the structural information. There is not an equal amount of inventory for every product. Nor is there an equal proportion of inventory for every unit of sales.  Capturing, displaying and acting on this structural information takes more effort than calculating overall averages and observing the overall direction they indicate.   It also leads to greater improvements.

Both a logistics manager and a sales manager should know, for example, that the two percent of products that accounted for the top twenty percent of sales last year are currently taking up twenty-eight percent of the inventory, and are thus turning relatively quickly. They might also find it useful to know that more than half their current product offerings account for less than 20% of sales, taking up 10% of inventory space.    This information can prompt several specific actions: changing product mix, getting rid of slow-moving products, sourcing raw material differently or changing manufacturing strategy based on the relative speeds at which various products turn.
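To see how a single turns figure can hide this kind of structure, here is a minimal sketch in Python. The product groups and dollar figures are hypothetical, invented only for illustration. It assumes turns are calculated as annual sales divided by average inventory, both valued at cost, which is one common convention.

```python
# Hypothetical product groups: (annual sales at cost, average inventory at cost).
# All figures are invented for illustration; they are not from a real company.
groups = {
    "fast movers": (6_000_000, 200_000),
    "steady sellers": (4_500_000, 300_000),
    "slow movers": (1_500_000, 500_000),
}


def turns(sales, inventory):
    """Inventory turns: annual sales divided by average inventory."""
    return sales / inventory


total_sales = sum(s for s, _ in groups.values())
total_inventory = sum(i for _, i in groups.values())

# The single index that a dashboard would typically report.
overall_turns = turns(total_sales, total_inventory)

# The structural view: turns for each group, with its share of sales and inventory.
by_group = {name: turns(s, i) for name, (s, i) in groups.items()}

print(f"Overall: {overall_turns:.1f} turns per year")
for name, (s, i) in groups.items():
    print(
        f"{name}: {by_group[name]:.1f} turns, "
        f"{s / total_sales:.0%} of sales, {i / total_inventory:.0%} of inventory"
    )
```

In this made-up case the overall figure is 12 turns per year, yet the individual groups turn anywhere from 3 to 30 times. The overall number says nothing about which group needs attention.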

You may be able to calculate indices in a way that makes them mathematically valid, but by throwing out the structure of the data, they are not “structurally valid.” Peter Drucker, in his classic book “Management” wrote in 1973 “to enable controls to give the right vision and to become the grounds for effective action, the measurement must also be appropriate. Thus, it must present the events measured in structurally true form. Formal validity is not enough.”

Yet the advice has not been heeded, and to a large extent, our creative and data crunching ability with computers has made it worse.

Can you make sense of on-line media metrics?

Posted on | August 10, 2011 | Comments Off

On-line advertising has grown immensely over the last few years. One estimate pegged growth from $55 million in 1995 to $54 billion in 2009, an annual growth rate of 63%. It’s not surprising then that there is some interest in trying to measure on-line advertising.
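The annual growth rate can be checked from the two endpoints. This is a minimal sketch in Python that assumes steady compound growth over the 14 years from 1995 to 2009. The function name is my own.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate implied by a start value and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


# $55 million in 1995 growing to $54 billion in 2009.
rate = annual_growth_rate(55e6, 54e9, 2009 - 1995)
print(f"{rate:.1%}")
```

The result falls between 63% and 64% per year, consistent with the estimate quoted.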

If an advertiser pays for a thousand ad “impressions” with Google, is it getting the same exposure as if it pays for a thousand impressions with Yahoo or MSN? It’s a much trickier problem than measuring print or TV advertising.   For example, in TV advertising there are ways to estimate the number of viewers, and ads are usually bought for defined spots, such as 15 seconds or 30 seconds. On the internet, what the server coughs up for a viewer to see in his browser isn’t necessarily what the viewer sees, and the number of ways to display an ad on a web page is limited only by the imagination.

To deal with the problem, several advertising organizations (Interactive Advertising Bureau, the Association of National Advertisers and the 4A’s) joined forces to find a way to “develop digital metrics and cross platform measurement solutions.” Their initiative is called “Make Measurement Make Sense.”

The challenge they are going to have is to define what they are measuring and what the metrics are used for. Their website highlights five principles. There should be a defined standard for a “viewable impression.” Impressions should not be just what the server serves, but what the audience sees.  Ad units should be standardized. Metrics should be relevant for brand marketers. And digital media measurement should be comparable with other media.

I have a little experiment I do in my management seminars. I ask the participants to count the number of light bulbs in the room. Never, in nearly twenty years, has any single group of participants agreed on how many bulbs are in the room.  There are two reasons. They don’t have enough background information to understand the relevance of the number and, because of that, they don’t all use the same definition of a light bulb. Some people count round incandescent bulbs, neon tubes and even TVs as light bulbs. Others count everything that emits light, even computer screens, and some count only incandescent bulbs.

If I say “I need to know the number of incandescent bulbs in the room so I can replace them with compact fluorescent bulbs” I would get more consistent answers than if I ask “how many light bulbs are in the room?”

There are probably a lot more ways to define an on-line ad and an impression of that ad than there are ways to illuminate a room with something called a bulb. The advertisers have their work cut out for them. Whether they can make sense of it all remains to be seen.

Investing in measurement usually pays off

Posted on | July 31, 2011 | Comments Off

A headline in the British journal “The Engineer” caught my eye a few weeks ago. It said that the British government has announced a £240 million (US $393 million) investment in measurement, with the aim of improving measurement techniques and technology to stimulate innovation in products, processes and services.

The heart of the scientific process is measurement, for without a way to measure the property or phenomenon you are studying you cannot learn much about it.  Scientists spend a tremendous amount of time and effort developing and testing new ways to measure and refining old ones. Often their efforts lead to tremendous breakthroughs and whole new industries.

For example, in the late 19th century, British and German scientists and politicians argued over two competing methods for calibrating instruments for measuring electrical resistance. Precise measurements, and thus calibrations, of electrical resistance were necessary to find faults in telegraph cables, especially undersea cables. A single fault can render a cable that is thousands of miles long completely useless. The manufacture and laying of cables was a new and growing business at the time. The protagonists included Lord Kelvin and the German engineer Werner von Siemens (1816-1892). In 1847 Siemens and Johann Halske founded Telegraphen-Bauanstalt von Siemens & Halske, which grew into the modern multinational Siemens.

Siemens had invented a device that used mercury to calibrate electrical resistance measurements. In his autobiography he wrote that “the scales of resistance with the mercury unit, prepared by my firm, proved extremely useful in laying the cable from Suez to Aden, and for the first time made reliable determinations of fault possible.” He was able to pinpoint faults within meters, and then fish up the cable so that it could be repaired and transmit telegraphs. Siemens won contracts to lay thousands of miles of telegraph cables. Siemens’ method, with some modifications, became the standard in the 1884 International Telegraph Conference.

The success of Siemens in its early days was in no small part due to Werner von Siemens’ instrument for measuring electrical resistance.

Businesses can also reap benefits of investing in measurement. Measurement is at the heart of many decisions, and is essential for controlling or managing processes that can range from manufacturing to website traffic.  It is often worth investing in new and refined ways to measure. I have usually seen the benefits far outweigh the costs of the effort.

Gas pumps inaccurate, study shows. So are all measurements, experience shows.

Posted on | June 30, 2011 | Comments Off

As Canadians and Americans fill up their tanks for the July 1st and 4th long weekend, they can contemplate this piece by CBC News that tells us that inaccurate gas pumps are short-changing Canadians.

It is impossible to measure anything in a way that is consistently accurate. There are only ways to measure that are accurate to within acceptable tolerances.

Gas Pump

dean bertoncelj /Shutterstock.com

I tried to find out what the tolerances are for gas pumps in Canada.  Usually tolerances are the same internationally to facilitate trade.  This study (page 105) quotes the US National Institute of Standards and Technology tolerances of 0.5% for the type of pumps consumers use.

This means that multiple measurements must each be within 0.5% of the value that appears on the pump.  Furthermore, the difference between the largest and smallest of these measurements must be less than 40% of the tolerance value, or 0.2%. The first condition makes sure all measurements are within a tolerable value of the displayed value, and the second makes sure they are close to each other.
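The two conditions can be expressed as a simple check. This is a minimal sketch in Python, not the regulator's actual test procedure. The function name, the 20-litre test draw and the measured volumes are hypothetical. The 0.5% tolerance comes from the study quoted above, and the limit on the spread is taken as 40% of that tolerance, as described.

```python
def pump_passes(displayed, measured, tolerance=0.005, repeat_fraction=0.40):
    """Check a set of test draws against the two conditions.

    displayed: the volume shown on the pump for each draw
    measured: the volumes actually delivered, in the same units
    tolerance: allowed relative error (0.005 means 0.5%)
    repeat_fraction: allowed spread, as a fraction of the tolerance
    """
    # Relative error of each draw against the displayed volume.
    errors = [(m - displayed) / displayed for m in measured]

    # Condition 1: every draw is within the tolerance.
    within_tolerance = all(abs(e) <= tolerance for e in errors)

    # Condition 2: the largest and smallest errors are close to each other.
    spread = max(errors) - min(errors)
    repeatable = spread < repeat_fraction * tolerance

    return within_tolerance and repeatable


# Hypothetical test draws of a displayed 20 litres.
print(pump_passes(20.0, [19.95, 19.96, 19.94]))  # consistent and within tolerance
print(pump_passes(20.0, [19.91, 20.05, 19.99]))  # within tolerance, but erratic
print(pump_passes(20.0, [19.85, 19.86, 19.85]))  # consistent, but outside tolerance
```

Only the first set of draws passes. The second fails because the draws are too far apart, and the third fails because every draw is short by more than the tolerance.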

This means you could have a pump that legally shortchanged consumers by, say, an average of 0.4%.  If it costs $75 to fill your tank, that’s $0.30 every time you fill it. If you fill it once a week, that’s about $16 per year.   It’s up to you to decide if you can tolerate this legal shortchanging. You could also have another gas pump that legally gives you 0.4%  free gas. It’s up to you to decide if you can tolerate this legal giveaway.
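The arithmetic behind those figures, as a short Python sketch. The variable names are mine, and the inputs are the ones used in the example: a $75 fill-up, a 0.4% average shortfall and one fill per week.

```python
fill_cost = 75.00     # dollars per fill-up
shortfall = 0.004     # 0.4% average shortfall, inside the 0.5% tolerance
fills_per_year = 52   # one fill-up per week

per_fill = fill_cost * shortfall
per_year = per_fill * fills_per_year

print(f"${per_fill:.2f} per fill, ${per_year:.2f} per year")
# prints "$0.30 per fill, $15.60 per year"
```

The same calculation with the sign reversed gives the value of the free gas from a pump that errs by 0.4% in the customer's favor.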

This balancing act between shortchanging, giveaways and legal tolerances gives companies a strong incentive to use measurement devices that are very accurate. It would be easy to avoid being penalized for shortchanging by erring on the side of giving away. But that costs money. Better to make your pump more accurate.

The Automobile Protection Association quotes a Measurement Canada study that says that between 1999 and 2007, 4.9% of all gas pumps exceeded the 0.5% tolerance and provided less fuel than the volume that appears on the pump. What it does not say is by how much.  Nor does it say how many gas pumps exceeded the 0.5% tolerance and provided more fuel than the volume that appears on the pump. The APA has filed a class action lawsuit because of this.

I’d be interested to see the study that looks at the accuracy of the measurement that measures the accuracy of the gas pumps.

It’s misleading to talk about “accurate” versus “inaccurate” measurements. Accuracy is a quantifiable characteristic of any measurement system. It is essential to know what it is before using that measurement system. The opinion polls quote the accuracy of their polling method when they say “plus or minus 3%, 19 times out of 20.”

Perhaps a better way to regulate gas pumps would be to require owners to get an accuracy test on their pumps every few years and then post a sticker that reads something like “this pump is accurate to within plus or minus 0.5%, 19 times out of 20.” Another pump might read “this pump is accurate to within plus or minus 0.2%, 19 times out of 20.”  Then leave it to the consumer to decide.
