Blog based on the ideas in the book misLeading Indicators
Brilliant propaganda, but lousy indicator of rate of growth of US debt.
Posted on February 6, 2012
There is a little graphic being circulated around the Internet intended to show that President Obama has recklessly doubled the total US debt accumulated since President Washington.
As a piece of political propaganda it is brilliant. It implies that President Obama’s administration is worse than all previous administrations when it comes to debt. Is it? As a performance “indicator” it is not so good.
The US debt is growing exponentially, and has been for 40 years under all presidents. Think of a pond with exponential growth of lily pads. Every week the number of lily pads growing in it doubles. How full was the pond one week before it was completely full of lily pads? Half full, and it took many weeks to get there, just like the US debt took many presidents to get to the point where it had half as much debt as today. A better indicator of exponential growth is not how long ago there was half as much, but the length of the doubling period.
To account for increasing population, it’s better to measure how profligately US governments have been going into debt by calculating debt per capita rather than gross debt. In September 2011, US debt per capita was $47,652. It took 8 years, from 2003 to 2011, for debt/capita to double from $23,826. So the graphic is superficially correct. (It would also be better to account for inflation and use constant dollars. The trouble is: which indicator of inflation should be used? See here and here and here and here.)
The chart below shows the terrifying picture of US debt/capita since 1791.
The huge debt figures at the end of the 20th century completely mask what was going on in the 19th. Debt per capita actually declined most years, with the exception of the period during the Civil War (see below).
The 20th century is a different picture altogether, as seen in the chart below. The scale shows powers of two, to make it easier to see how long debt/capita takes to double. There were two jumps, in World Wars I and II. Between and immediately after the wars, debt per capita was flat or declined. Then, around 1970, debt per capita took off.
The chart below works backwards, starting with the current debt/capita of $47,652, and then downwards to show the year at which the debt per capita was one half this amount, one quarter and so on. Each time the debt/capita line crosses the dotted horizontal lines, debt has doubled (going left to right) or halved (going right to left).
From 1976 to 2011, debt per capita grew 16 times—that means it has doubled 4 times (from $2,978 to $47,652). That’s a compound average growth of 8.2% in debt per capita.
At 8% growth, debt per capita will keep doubling every eight or nine years or so. That’s how long it took to double it to President Obama’s number from George W. Bush’s number in 2003. All he is doing is holding the shovel steady as the US inexorably digs a deeper hole.
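The growth rate and doubling period quoted above can be checked with a few lines of Python. This is only a sketch: the two dollar figures and the years come from the post, while the function names are made up for illustration.

```python
import math

def compound_growth_rate(start, end, years):
    """Average annual growth rate that takes `start` to `end` over `years`."""
    return (end / start) ** (1 / years) - 1

def doubling_time(rate):
    """Years needed to double at a constant annual growth `rate`."""
    return math.log(2) / math.log(1 + rate)

# Debt per capita in 1976 and 2011, as quoted in the post.
rate = compound_growth_rate(2978, 47652, 2011 - 1976)
doublings = math.log2(47652 / 2978)

print(f"compound annual growth: {rate:.1%}")
print(f"number of doublings:    {doublings:.2f}")
print(f"doubling time:          {doubling_time(rate):.1f} years")
```

The doubling period is the more useful indicator because it depends only on the growth rate, not on who happens to be in office when the latest doubling completes.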
(Note: to calculate debt per capita I linearly interpolated US population between censuses, which are taken every ten years. Source of debt data: http://www.treasurydirect.gov).
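For readers who want to reproduce the interpolation, here is a rough sketch of the idea. The population and debt numbers below are round, made-up values for illustration only, not Census or Treasury data, and the function names are hypothetical.

```python
def interpolate_population(year, census):
    """Linearly interpolate population for `year` between the surrounding census years."""
    if year in census:
        return census[year]
    years = sorted(census)
    for lo, hi in zip(years, years[1:]):
        if lo < year < hi:
            fraction = (year - lo) / (hi - lo)
            return census[lo] + fraction * (census[hi] - census[lo])
    raise ValueError(f"{year} is outside the range of census years")

def debt_per_capita(debt, year, census):
    """Gross debt divided by the interpolated population for that year."""
    return debt / interpolate_population(year, census)

# Illustrative placeholders only (not real data).
census = {2000: 280_000_000, 2010: 310_000_000}
population_2003 = interpolate_population(2003, census)
example = debt_per_capita(7_000_000_000_000, 2003, census)

print(f"interpolated 2003 population: {population_2003:,.0f}")
print(f"debt per capita: ${example:,.0f}")
```

This sketch refuses to guess for years outside the census range; a year later than the most recent census would need an extrapolation or a population estimate instead.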
© 2012 Greenbridge Management Inc.
A leading indicator of business cycles that have already happened
Posted on January 30, 2012
A “leading indicator” is supposed to forecast, or at least to help the person using it make a forecast.
The Conference Board recently announced that it was changing its Leading Economic Indicator to address structural changes that have occurred in the US economy in the last few decades. The new indicator was released on January 26, 2012.
In its press release, the Conference Board says “Revised figures show that adding the new Leading Credit Index™, in conjunction with other changes, makes the LEI a more accurate predictor of U.S. business cycles since 1990.”
A better predictor of business cycles since 1990? They have already happened. That’s an interesting twist on economist Paul Samuelson’s famous 1966 quip that “Wall Street indexes predicted nine out of the last five recessions.” Now economists predict them after they happen. As physicist Niels Bohr said “prediction is very difficult, especially about the future.” It would be more convincing (to put it mildly) if they had revised the LEI before the last recession and made a prediction that turned out to be true. That would show that the indicator had some ability to “lead”.
The Leading Economic Indicator (LEI) is an index that combines ten indicators, including average weekly hours in manufacturing, average weekly initial claims for unemployment insurance, new orders, building permits, stock prices and others.
In the new index there are changes such as removing the inflation-adjusted money supply and replacing it with a new Leading Credit Index. According to Kathleen Bostjancic, director of macro analysis at the Conference Board, the old index was being “skewed” by the money supply.
This blog post by Doug Short has a series of charts that compare the old and new indices from 1959 to today. I’ve shown one of them below (the red line is the old LEI and the blue line is the new one). By superimposing the two indicators on the same chart, Short showed that the old and new indicators crossed paths in 1994, 2001 and 2008. The rest of the time they moved roughly in parallel, except in recessions. The difference between the two indicators is most dramatic before, during and after the 2008-2009 recession. The new LEI shows a much more dramatic drop in economic activity, and a much slower recovery.
Only time and experience will show whether the new LEI will be a better tool for forecasting economic activity than the previous one. In the meantime, it is worth remembering an old lesson from some of the greatest scientists—just because you can obtain numbers from measuring does not mean the thing you think you are measuring actually exists.
And just because it is called a Leading Indicator does not mean it is.
Hoaxing, forging, trimming, faking, cooking and falsifying measurements
Posted on January 23, 2012
There have been several recent high-profile cases of scientists hoaxing, forging, trimming, cooking or falsifying their data. These cases occurred in spite of the fact that scientific publications must be peer-reviewed before publication, and that scientific findings must be replicated before they are confirmed.
Would the safeguards in place in most businesses catch similar acts?
In 2007 Harvard University seized computers, notes, videotapes and other materials from the lab of Marc Hauser, a professor of psychology. The university found him guilty of scientific misconduct. In 2011, Tilburg University in the Netherlands, after an extensive investigation, found that Diederik Stapel, a professor of social psychology, had fabricated data for 15 to 20 years.
In Hauser’s case, the fraud came to light when researchers in his lab viewed video tapes Hauser had taken of chimpanzees and found that Hauser’s coding of the results was not consistent with their own observations. Whistleblowers also alerted the authorities to Stapel’s fraud.
The failure to replicate scientific results does not in itself point to fraud. When scientists make observations in nature, rather than the lab, they can be very hard to replicate because the precise conditions under which they made the observations may be unique. Sometimes tremendous scientific resources are spent collecting data, making it hard for others to repeat an experiment. Sometimes the data take so long to collect, as in long-term epidemiological studies, that it would take too long to replicate them. Scientists often use lab animals, but genetic or environmental differences between populations from one lab to the next can make replication difficult.
There are fortunes to be made on any drug that extends life. There have been many studies on the compound resveratrol, which comes from red wine. It is thought that it may prolong life. Some scientists have found that it extends life in simple creatures such as yeast and worms, and others have found that it does not. Does this mean there is fraud involved? Not according to one of the scientists, Heidi Tissenbaum from the University of Massachusetts Medical School, who said “nobody’s falsifying; people are just getting different results.”
All of these difficulties in detecting fake measurements exist in business. Procedures such as calibration check that instruments are working, but don’t catch fabrication or cooking of data. Many business measurements, such as marketing studies or experiments in factories, are like the field observations of scientists in nature: the exact conditions are hard to replicate. People regularly “trim” out measurements that do not look right, even though they may be indicating a problem.
There are no simple tricks for catching such problems with data. But a good starting point is to develop a healthy suspicion for all numbers.
© 2011 Greenbridge Management Inc.
Tags: cooking data > fraud > science
How many people visit your website?
Posted on December 23, 2011
Whether you are writing a blog like this one, or run a major e-commerce website, it’s natural to want to know how many people visit.
There are lots of tools—generally known as web analytics—to measure web site traffic. The problem is, “People don’t visit websites. Their computers do.” But you cannot identify how many people are hiding behind a single computer. Every person in the Ford Corporation has the same IP (Internet Protocol) address. And most home users are given a different IP address every time they get on the web. So it could look like you are getting more “visits” from a single home user than from a dozen Ford employees. And, as in the photo above, you don’t know how many people are looking at the same computer screen.
Measuring website visits depends on the definition of a “visit.” The Joint Industry Committee for Web Standards says a single visit is a series of page requests with no gaps greater than 30 minutes. So if you click on the link for a page 31 minutes after you first landed on the site, it’s counted as a new visit, even if you spend most of your first 30 minutes getting a coffee and reading the page. (Click here for their stuff on measuring website traffic).
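To see how much the count depends on the definition, here is a minimal sketch of the 30-minute rule applied to a list of page-request times. The function name and the sample timestamps are made up for illustration; real analytics tools apply further rules.

```python
from datetime import datetime, timedelta

def count_visits(timestamps, max_gap=timedelta(minutes=30)):
    """Count visits: a new visit starts whenever the gap since the
    previous page request is greater than `max_gap`."""
    visits = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > max_gap:
            visits += 1
        previous = ts
    return visits

# One reader who lands on the site, then clicks a link 31 minutes later.
start = datetime(2011, 12, 23, 9, 0)
requests = [start, start + timedelta(minutes=31)]

print(count_visits(requests))                                 # two visits
print(count_visits(requests, max_gap=timedelta(minutes=45)))  # one visit
```

Changing `max_gap` changes the answer for exactly the same reader and the same clicks, which is the point: the number of "visits" is a product of the definition as much as of the traffic.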
This article does a nice job of explaining the challenges of measuring web traffic.
Are Americans getting wealthier or poorer? It depends on how you measure “wealth.”
Posted on December 23, 2011
The standard measure of wealth is GDP per capita. The chart below shows that Americans have been getting continually wealthier for decades, with a few blips here and there (source of data). The underlying measure, Gross Domestic Product, is based on the dollar value of economic transactions.
Such a measure depends crucially on the definition of wealth. People have a bad habit of confusing financial assets with real ones because they are confused about the nature of wealth.
This striking newsletter from Stansberry’s Investment Advisory claims Americans have been getting poorer since 1965. The problem with using the dollar as a measure of wealth is that “we don’t have a sound currency with which to measure GDP through time.” Instead of using dollars, Stansberry’s used a basket of commodities. They used real assets instead of financial ones.
Their trend line goes in exactly the opposite direction from the one in the chart above. Stansberry’s says Americans are “faking” wealth by going deeply into debt.
Tags: GDP > government statistics > wealth
Dead right.
Posted on December 5, 2011
I snapped this picture at a crosswalk in Manhattan near Central Park over the weekend. What does it say? Stop or walk? You could walk and not, technically, be jaywalking. You stand a good chance of getting smacked by a car if you do, though. You’d be technically right. And maybe even dead right.
Misleading indeed, but at least in this case it is easy to tell it is misleading. It is not usually so evident when your indicator is a measured quantity.
Studies show studies don’t show
Posted on November 29, 2011
It’s used in medical research, engineering research, and just about every other form of research. It determines whether research is “significant” or destined for the garbage bin. And it is one of the most misleading indicators ever developed. It is known as the “p-value.”
The p-value is the number that comes out of most orthodox statistical tests such as t-tests, analysis of variance, regression and many others. It is supposed to tell you the probability that you are mistakenly concluding that you have found an effect when actually there is not one (a “false positive”). So of course researchers seek small p-values such as 5% or 1%.
Although that’s what people think that probability is, it is not. I’ll get to that. But even if it were, the way researchers misapply it means that they draw the wrong conclusions.
A landmark article by John Ioannidis claimed that most published research findings are false. The reason is (basically) that there are too many scientists chasing too many significant results in areas where there are small effects, that researchers tinker with experimental designs and mine data in search of significance, and that there are financial or other incentives to get significant results published. Ioannidis also said that “many research findings are simply accurate measures of prevailing bias.” Ioannidis showed mathematically how this occurs.
Other studies have shown similar results. A recent analysis of studies in neuroscience found that about half of them used the wrong statistical procedure. Suppose a researcher finds that 30% of mutant nerve cells respond to a chemical, and gets a significant p-value. Then the researcher finds that only 15% of normal cells respond, and this result does not generate a significant p-value. With this second result, the first result becomes meaningless. To be able to say that the chemical has a significant effect on mutant cells you would have to show that the 15-percentage-point difference between mutant and normal cells was itself significant.
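Testing the difference directly takes only a few lines. The sketch below uses a pooled two-proportion z-test, which is one common choice and not necessarily the one the neuroscience analysis had in mind. The sample size of 40 cells per group is an assumption made up for illustration; the post gives only the percentages.

```python
import math

def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for the difference between two proportions
    (pooled z-test, normal approximation)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sample sizes: 40 mutant cells (30% respond) and
# 40 normal cells (15% respond).
p_value = two_proportion_p_value(12, 40, 6, 40)
print(f"p-value for the mutant vs. normal difference: {p_value:.3f}")
```

With these assumed sample sizes the difference between the two groups does not reach the 5% level, even though one group's result looked "significant" and the other's did not. With larger samples it might; the point is that the difference has to be tested, not inferred from two separate p-values.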
What is the meaning of the p-value? What the p-value gives is the probability of getting data at least as extreme as the observed data by chance, given that there is no effect. But that is not the probability we should be interested in. The probability of getting the data when there is no effect is not very relevant. The probability we need to know is the probability that there is an effect, given the actual data we just got. Those are very different probabilities, but most researchers, knowingly or unknowingly, present them as though they were the same.
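A small calculation with Bayes' theorem shows how far apart the two probabilities can be. This is a simplified sketch: it treats "a significant result" as the data, and the prior, power and significance level are assumed values chosen only for illustration.

```python
def probability_effect_is_real(prior, power, alpha):
    """P(effect exists | significant result), by Bayes' theorem.

    prior: assumed share of tested hypotheses that are actually true
    power: P(significant result | effect exists)
    alpha: P(significant result | no effect), the significance level
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Assumed inputs: 1 in 10 hypotheses is true, 80% power, 5% significance level.
posterior = probability_effect_is_real(prior=0.10, power=0.80, alpha=0.05)
print(f"P(effect | significant result) = {posterior:.0%}")
```

Under these assumptions a result that is "significant at 5%" has roughly a one-in-three chance of being a false alarm, not one in twenty. Lower the prior or the power and the picture gets worse.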
misLeading Indicator suggests world exporting to aliens
Posted on November 29, 2011
When I do calculations on a performance indicator, I usually do the calculation more than one way. This gives me a good check on my method, and gives me assurance that the indicator is meaningful if both calculations match. In some cases this task is simplified if there is some constraint that must be met, such as all the numbers being forced to add up to zero or 100%.
So it should be with world current account deficits. They should add up to zero.
The Economist Magazine added up the current account deficits published by the International Monetary Fund and got $331 billion, causing it to quip that the aliens must be “buying Louis Vuitton handbags.” (See here).
The root of the problem is mis-measurement of the current account deficit. The Economist reports that these measurement errors have jumped.
Why your customers may not see things the way you do
Posted on October 28, 2011
Are your customers experiencing something completely different from what your indicators tell you they are experiencing? Probably. Suppose a retail company with branches of different sizes measured some aspect of customer service in each of its branches, say wait times. Will the average wait time be a good representation of the average customer’s experience?
Take a company with three branches. The average wait times from each branch are 8, 5 and 2 minutes, for an overall average of 5 minutes. This is the number the company might publicize. It might even offer a service guarantee of five minutes.
But let’s look at it from the perspective of the customers waiting in line. The first branch serves 1,000 customers every day, the second serves 250, and the third serves 100. There are thus 1,000 people who experience an average 8 minute wait, 250 who experience a 5 minute wait and 100 who experience a 2 minute wait. If we weight the average wait times in each branch by the number of customers, the overall average jumps from 5 minutes to 7. This more accurately reflects the experience of customers.
That could be an expensive service guarantee! Be careful of averages that do not represent the experiences of individuals.
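The two averages for the three-branch example can be worked out in a few lines. The wait times and customer counts are the ones given above; the variable names are just for illustration.

```python
# (average wait in minutes, customers served per day) for each branch
branches = [
    (8, 1000),
    (5, 250),
    (2, 100),
]

# Average of the branch averages: every branch counts equally.
simple_average = sum(wait for wait, _ in branches) / len(branches)

# Customer-weighted average: every customer counts equally.
total_customers = sum(customers for _, customers in branches)
weighted_average = sum(wait * customers for wait, customers in branches) / total_customers

print(f"average of branch averages: {simple_average:.1f} minutes")
print(f"average customer's wait:    {weighted_average:.1f} minutes")
```

The first number describes the typical branch; the second describes the typical customer. Which one is "the" average depends on whose experience the indicator is supposed to represent.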
Tags: customer complaints > customer service > key performance indicator
Moneyball shows that you should challenge thinking about KPIs
Posted on October 14, 2011
Business writers often speak of “measures that drive future performance.” The trick is figuring out what measures those are.
Doctors have been measuring their patients since the Italian doctor Santorio Santorio invented a scale for the Galileo thermometer in 1612. But patients can (and do) die soon after they get a clean bill of health, even when all the “drivers of future performance” indicate that they are healthy. Despite huge efforts, medical science has still not figured out all the drivers of future performance.
The same is true in many fields, including business. People too easily fall into the trap of assuming that some indicator is a driver of performance, or a “leading indicator,” out of habit, or wishful thinking.
The movie Moneyball shows how easy it is to fall in love with such presumed drivers of performance, and why it is worth challenging them, because they may be misleading indicators. Billy Beane, General Manager of the Oakland Athletics, “discounted what scouts have done for 150 years.” He rejected previous baseball key performance indicators such as stolen bases and batting averages in favor of others such as on-base percentage and slugging percentage.
Using such measures Beane was able to find players at far cheaper salaries. His 2002 draft put the new techniques to the test. Did they work? It depends on your measure of success. You can judge the results for yourself in the chart below, which uses wins versus losses as a measure of success. At the very least, the A’s were competitive with teams that had triple the payroll.
This suggests they may also have been successful on another important measure of success: their profitability.
What polls, police radars and the Hawthorne experiments have in common
Posted on September 29, 2011
There is an election campaign in my home province of Ontario. As often happens during elections, somebody raises the issue about whether public opinion polls influence the outcomes of elections (such as in this article).
If only measurements involving people could produce objective information. People and the way they react to measurements make this very hard to achieve.
When you see a police car sitting at the side of the highway, with a radar gun pointing at you, you will most likely brake or ease off the gas. The awareness of being measured affects your velocity and position. The observer affects the observed when people measure people. Even the fear of a hidden policeman measuring your speed will cause most drivers to moderate their speed, and increase their stress.
The Hawthorne experiments showed that measurement in organizational or social settings is itself a source of perturbation. In addition to providing feedback, the act of measuring also changes people’s behavior by stating the intentions and priorities of the measurer. It thus implies the measurer’s values. When a CEO rigorously measures variance from budget, people understand that sticking to budget is important, and act accordingly.
The Hawthorne effect is named after a series of experiments conducted over several years, between 1924 and 1932, at the Hawthorne plant, near Chicago, owned by the Western Electric Company. The plant manufactured relays for the Bell Telephone Company.
Researchers noticed improvements in worker productivity when the women working in the plant were given a variety of different working conditions, such as changes in lighting, and their output was measured. The folklore that developed from these experiments, and that persists to this day, is that the performance of people who are singled out in a study of any kind improves, not because of the variables under study, but because they are pleased with the attention they receive.
H. McIlvaine Parsons showed this folklore to be bunk in a study in Science magazine in 1974. What the Hawthorne experiments actually showed was that the productivity did not change because the lights were made brighter or dimmer. The productivity changed when the subjects of the study received measurements of their own productivity.
In social situations, measurement itself causes change in behavior and performance. It cannot be just a means to objectively measure performance.
Do opinion polls affect the outcomes of elections? You betcha.
The Operational Dashboard that had no Key Performance Indicators
Posted on September 25, 2011
The “dashboard” is often used as a metaphor by business writers and consultants to explain to managers that they need performance measurement and key performance indicators for their business, just like they do for their cars. Otherwise they’ll drive their business off the road, they (falsely) argue.
The problem is, the dashboards on cars don’t help you keep your car on the road. Yesterday a couple came on my street driving a 1931 Model A Ford. I snapped this picture of the dashboard.
It has three instruments: an Amp meter to tell the charge on the battery; a gas gauge; and a speedometer. That’s it. That’s all. They are indicators to help you manage the risks of running out of gas, running out of battery charge, or going too fast. You could get by without them.
The dashboard on the earlier Model T Fords had no instruments. It was a board. The Amp meter was the first to be added to the board when starter cranks were replaced with starter motors powered by electric batteries.
What you need to keep your car on the road is a good pair of eyes. Measurement cannot tell you everything you need to know. There is a clear and vital role for non-measurable information. If you just relied on the indicators on a dashboard to drive a car you would go off the road.
The same lesson applies to running a business. Non-measurable information is vital.
The importance of structural information
Posted on September 20, 2011
Business writers and consultants today typically advise managers to calculate indices and percentages to measure the state of their enterprises. For example, Kaplan and Norton, in the “Balanced Scorecard”, suggest measuring “strategic information availability” with the “percentage of processes with real-time quality, cycle time, and cost feedback available” and “percentage of customer-facing employees having on-line access to information about customers.” Meyer and Ross, authors in Peter Senge’s collaboration “The Dance of Change,” advise readers to develop similar measures, to strive for simplicity, and to display them on an “Operational Dashboard.”
Many management writers advise their readers to limit the number of things they measure to some arbitrarily small number. Meyer and Ross, for example, state “If you could only track six or eight things, what would they be? The point is to avoid devising so many numbers and gauges that your dashboard looks like a 747 cockpit. Otherwise you’ll spend all your time looking at the dashboard and forget to ‘look out the windshield’—to actually implement the work.”
I doubt many would disagree with their advice to “look out of the windshield,” but it is not very helpful. As for the rest, you should measure what you need to measure to prompt timely and specific action, while preserving the structural integrity of the data. Every instrument on the 747 is there to help fly the plane. The instruments prompt the pilots to take specific actions. There is virtue in simplicity, but the number of things you should measure is dictated by the structure of the information and the actions that need to be taken, not by simplicity for simplicity’s sake.
Simplicity is of course a desirable attribute of an information display. Many “operational dashboards”, however, use a lot of ink and space to display very little. (Go to Google images and search for “business operational dashboard” for many examples.)
Many of the displays in these dashboards can be almost totally useless. They only give broad measures of progress because they throw out structural information. Without structural information, it is hard to know what corrective action to take. Furthermore, displaying the information in the form of dials makes the displays unnecessarily complicated.
Edward Tufte said that graphical displays should “show the data; present many numbers in a small space; avoid distorting what the data have to say; induce the viewer to think about substance; reveal the data at several levels of detail, from a broad overview to the fine structure.” He says data displays should “reveal the complex.”
Unfortunately, many dashboards don’t reveal it but conceal it, because they crunch numbers into single indices.
For example, a performance indicator of inventory turns for a consumer products manufacturer may suggest to management that there is too much inventory. “Twelve inventory turns per year” is how it is often reported. This gives the false impression that all inventory turns at the same rate. It ignores the structural information. There is not an equal amount of inventory for every product. Nor is there an equal proportion of inventory for every unit of sales. Capturing, displaying and acting on this structural information takes more effort than calculating overall averages and observing the overall direction they indicate. It also leads to greater improvements.
Both a logistics manager and a sales manager should know, for example, that the two percent of products that accounted for the top twenty percent of sales last year are currently taking up twenty-eight percent of the inventory, and are thus turning relatively quickly. They might also find it useful to know that more than half their current product offerings account for less than 20% of sales, taking up 10% of inventory space. This information can prompt several specific actions: changing product mix, getting rid of slow-moving products, sourcing raw material differently or changing manufacturing strategy based on the relative speeds at which various products turn.
You may be able to calculate indices in a way that makes them mathematically valid, but by throwing out the structure of the data, they are not “structurally valid.” Peter Drucker, in his classic book “Management” wrote in 1973 “to enable controls to give the right vision and to become the grounds for effective action, the measurement must also be appropriate. Thus, it must present the events measured in structurally true form. Formal validity is not enough.”
Yet the advice has not been heeded, and to a large extent, our creative and data crunching ability with computers has made it worse.
Can you make sense of on-line media metrics?
Posted on August 10, 2011
On-line advertising has grown immensely over the last few years. One estimate pegged growth from $55 million in 1995 to $54 billion in 2009, an annual growth rate of 63%. It’s not surprising then that there is some interest in trying to measure on-line advertising.
If an advertiser pays for a thousand ad “impressions” with Google, is it getting the same exposure as if it pays for a thousand impressions with Yahoo or MSN? It’s a much trickier problem than measuring print or TV advertising. For example, in TV advertising there are ways to estimate the number of viewers, and ads are usually bought for defined spots, such as 15 seconds or 30 seconds. On the internet, what the server coughs up for a viewer to see in his browser isn’t necessarily what the viewer sees, and the number of ways to display an ad on a web page is limited only by the imagination.
To deal with the problem, several advertising organizations (Interactive Advertising Bureau, the Association of National Advertisers and the 4A’s) joined forces to find a way to “develop digital metrics and cross platform measurement solutions.” Their initiative is called “Make Measurement Make Sense.”
The challenge they are going to have is to define what they are measuring and what the metrics are used for. Their website highlights five principles. There should be a defined standard for a “viewable impression.” Impressions should not be just what the server serves, but what the audience sees. Ad units should be standardized. Metrics should be relevant for brand marketers. And digital media measurement should be comparable with other media.
I have a little experiment I do in my management seminars. I ask the participants to count the number of light bulbs in the room. Never, in nearly twenty years, has any single group of participants agreed on how many bulbs are in the room. There are two reasons. They don’t have enough background information to understand the relevance of the number, and, because of that, they don’t all use the same definition of a light bulb. Some people count round incandescent bulbs, neon tubes and even TVs as light bulbs. Others count everything that emits light, even computer screens, and some count only incandescent bulbs.
If I say “I need to know the number of incandescent bulbs in the room so I can replace them with compact fluorescent bulbs” I would get more consistent answers than if I ask “how many light bulbs are in the room?”
There are probably a lot more ways to define an on-line ad and an impression of that ad than there are ways to illuminate a room with something called a bulb. The advertisers have their work cut out for them. Whether they can make sense of it all remains to be seen.
Investing in measurement usually pays off
Posted on July 31, 2011
A headline in the British journal “The Engineer” caught my eye a few weeks ago. It said that the British government has announced a £240 million (US $393 million) investment in measurement, with the aim of improving measurement techniques and technology to stimulate innovation in products, processes and services.
The heart of the scientific process is measurement, for without a way to measure the property or phenomenon you are studying you cannot learn much about it. Scientists spend a tremendous amount of time and effort developing and testing new ways to measure and refining old ones. Often when they do, the results lead to tremendous breakthroughs and whole new industries.
For example, in the late 19th century, British and German scientists and politicians argued over two competing methods for calibrating instruments for measuring electrical resistance. Precise measurements, and thus calibrations, of electrical resistance were necessary to find faults in the telegraph cables, especially undersea cables. A single fault can render a cable that is thousands of miles long completely useless. The manufacture and laying of cables was a new and growing business at the time. The protagonists included Lord Kelvin and the German engineer Werner von Siemens (1816-1892). In 1847 Siemens and Johann Halske founded Telegraphen-Bauanstalt von Siemens & Halske, which grew into the modern multinational Siemens.
Siemens had invented a device that used mercury to calibrate electrical resistance measurements. In his autobiography he wrote that “the scales of resistance with the mercury unit, prepared by my firm, proved extremely useful in laying the cable from Suez to Aden, and for the first time made reliable determinations of fault possible.” He was able to pinpoint faults within meters, and then fish up the cable so that it could be repaired and transmit telegraphs. Siemens won contracts to lay thousands of miles of telegraph cables. Siemens’ method, with some modifications, became the standard in the 1884 International Telegraph Conference.
The success of Siemens in its early days was in no small part due to Werner von Siemens’s instrument for measuring electrical resistance.
Businesses can also reap benefits of investing in measurement. Measurement is at the heart of many decisions, and is essential for controlling or managing processes that can range from manufacturing to website traffic. It is often worth investing in new and refined ways to measure. I have usually seen the benefits far outweigh the costs of the effort.
Tags: investing in measurement > science
Gas pumps inaccurate, study shows. So are all measurements, experience shows.
Posted on | June 30, 2011 | Comments Off
As Canadians and Americans fill up their tanks for the July 1st and 4th long weekend they can contemplate this piece by CBC news that tells us that inaccurate gas pumps are short-changing Canadians.
It is impossible to measure anything in a way that is consistently accurate. There are only ways to measure that are accurate to within acceptable tolerances.
I tried to find out what the tolerances are for gas pumps in Canada. Usually tolerances are the same internationally to facilitate trade. This study (page 105) quotes the US National Institute of Standards and Technology tolerances of 0.5% for the type of pumps consumers use.
This means that multiple measurements must each be within 0.5% of the value that appears on the pump. Furthermore, the difference between the largest and smallest of these measurements must be less than 40% of the tolerance value, or 0.2%. The first condition makes sure all measurements are within a tolerable value of the displayed value, and the second makes sure they are close to each other.
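The two conditions can be sketched as a short check. This is only an illustration, not the actual NIST or Measurement Canada test procedure: the function name, its parameters and the sample readings are all invented for the example.

```python
def pump_passes(errors_pct, tolerance_pct=0.5, spread_fraction=0.40):
    """Return True if a series of test measurements meets both conditions.

    errors_pct: percent error of each test measurement relative to the
    volume displayed on the pump (negative = pump delivered less).
    All names and defaults here are assumptions for illustration.
    """
    # Condition 1: every measurement is within the tolerance.
    within_tolerance = all(abs(e) <= tolerance_pct for e in errors_pct)
    # Condition 2: largest minus smallest is less than a fraction of the tolerance.
    spread = max(errors_pct) - min(errors_pct)
    close_together = spread < spread_fraction * tolerance_pct
    return within_tolerance and close_together


# Hypothetical readings, invented for illustration.
print(pump_passes([-0.40, -0.35, -0.45]))  # consistently short, tightly grouped
print(pump_passes([-0.40, 0.10, -0.45]))   # within tolerance, but scattered
print(pump_passes([-0.60, -0.55, -0.58]))  # tightly grouped, but out of tolerance
```

Only the first set of readings passes both tests; the second fails on spread and the third on tolerance.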
This means you could have a pump that legally shortchanged consumers by, say, an average of 0.4%. If it costs $75 to fill your tank, that’s $0.30 every time you fill it. If you fill it once a week, that’s about $16 per year. It’s up to you to decide if you can tolerate this legal shortchanging. You could also have another gas pump that legally gives you 0.4% free gas. It’s up to you to decide if you can tolerate this legal giveaway.
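The arithmetic is easy to rerun with your own numbers. The sketch below assumes, as the example above does, a $75 fill-up once a week and an average shortfall of 0.4%; the function name is made up for the illustration.

```python
def shortfall(fill_cost, error_pct, fills_per_year=52):
    """Return (cost per fill, cost per year) of a given percentage error."""
    per_fill = fill_cost * error_pct / 100
    return per_fill, per_fill * fills_per_year


per_fill, per_year = shortfall(75.00, 0.4)
print(f"${per_fill:.2f} per fill, ${per_year:.2f} per year")
```

The same function gives the value of the giveaway from a pump that errs by the same amount in your favor.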
This balancing act between shortchanging, giveaways and legal tolerances gives companies a strong incentive to use measurement devices that are very accurate. It would be easy to avoid being penalized for shortchanging by erring on the side of giving away. But that costs money. Better to make your pump more accurate.
The Automobile Protection Association quotes a Measurement Canada study that says that between 1999 and 2007, 4.9% of all gas pumps exceeded the 0.5% tolerance and provided less fuel than the volume that appears on the pump. What it does not say is by how much. Nor does it say how many gas pumps exceeded the 0.5% tolerance and provided more fuel than the volume that appears on the pump. The APA has filed a class action lawsuit because of this.
I’d be interested to see the study that looks at the accuracy of the measurement that measures the accuracy of the gas pumps.
It’s misleading to talk about “accurate” versus “inaccurate” measurements. Accuracy is a quantifiable characteristic of any measurement system. It is essential to know what it is before using that measurement system. The opinion polls quote the accuracy of their polling method when they say “plus or minus 3%, 19 times out of 20.”
Perhaps a better way to regulate gas pumps would be to require owners to get an accuracy test on their pumps every few years and then post a sticker that reads something like “this pump is accurate to within plus or minus 0.5%, 19 times out of 20.” Another pump might read “this pump is accurate to within plus or minus 0.2%, 19 times out of 20.” Then leave it to the consumer to decide.
The definitions of performance indicators are critical: the case of unemployment.
Posted on | June 9, 2011 | Comments Off
The US Bureau of Labor Statistics calculates unemployment several different ways. The most widely reported is U-3, which is what people generally call the “unemployment rate.” U-6 has a broader definition and includes discouraged workers and those who work part time because they cannot find full-time work.
In the 1930’s depression many workers were given “work-relief” jobs. Were they employed or unemployed? The official numbers counted them as employed. Robert Margo, a professor of economics at Vanderbilt University, compared two versions of the depression unemployment numbers, one which counted them as unemployed (by Lebergott), which seems quite reasonable, and the other as employed (by Darby). Lebergott’s unemployment rate was greater than Darby’s rate every year from 1930 to 1940, sometimes by more than 5%. (See here or here)
A similar discrepancy exists today. According to John Williams at www.shadowstats.com, the U-6 rate does not count long-term discouraged workers. When they are added to the calculation, the unemployment rate is about 5% higher than the U-6 rate.
The striking thing is that the ShadowStats number for today is about as high as Lebergott’s depression-era unemployment rate in the 1930s (see below; chart courtesy of ShadowStats.com).

What is the “correct” measure of unemployment? We can debate endlessly how best to define and measure it. The lesson is that you must understand how an indicator is defined if you are going to make any sense of it. The definition is critical, as are the motivations behind it.
Tags: definitions > government statistics > unemployment
Accident stats don’t tell you much about safety
Posted on | May 29, 2011 | Comments Off
On April 9, 1992 the Canadian Institute of Mining, Metallurgy and Petroleum awarded the Westray Mine the coveted John T. Ryan Award. The Institute grants the award every year to the mine that had the lowest accident frequency per 200,000 hours worked during the previous calendar year in Canada.
On May 9, 1992, the mine exploded killing twenty-six miners. Why did it explode despite having a statistical record as the safest mine? The tragic explosion graphically illustrates the misleading nature of measurements that are claimed to measure risk.
Many people believe that you can “measure” probability using the frequency of past events. But you can no more measure probability than you can assign frequency.
People who believe this believe you can measure the probability that a coin will turn up heads by flipping it many times and measuring the proportion of times you get heads. To believe this falsehood you need to believe that the probability that a coin will turn up heads when flipped is a physical property, like weight or mass, of the coin, the circumstances of the experiment, and the nebulous concept of “randomness”.
The late physicist E. T. Jaynes says such a belief requires utter contempt for the known laws of physics. It is based on the premise that there are random events that are presumably, and rather bafflingly, without cause.
Running a business and managing risk based on a theory that is contemptuous of the laws of physics is not smart.
Suppose you ask me to flip a coin 100 times so you can “measure” the probability of heads. I place the coin, heads down, on my thumb, which is right on a table top. I gently flip it so it flips once and lands heads up. I repeat this 100 times and get 100 heads.
You object: “that is not random enough to measure the probability”. But how high must the coin be tossed, and how fast must it spin, before that mysterious ‘randomness’ appears in the experiment and such a presumed physical probability becomes measurable? No one has an answer to that question.
The probability of getting heads expresses what you know about your ability to get heads—in other words, your degree of control of the coin flipping process. If you know nothing, it is reasonable to assign—not measure—a probability of one half to getting heads. But if you have a high degree of control over coin flipping, something you could only obtain with a lot of background information, you would assign a very high probability to getting heads.
Likewise with safety. The workers in the Westray mine knew that it was very dangerous, and that there was a high risk of explosion. Methane gas was leaking. Someone had tinkered with a methanometer, a safety device that shuts down equipment when methane gets to explosive concentrations. Management emphasized that production was more important than safety.
Those workers knew all that, and had assigned a very high probability that the mine would explode. They would have completely disregarded the safety statistics on which the John T. Ryan Award was based in making that assignment.
Tags: background information > randomization > risk > safety
Sometimes polls mislead, but that’s life.
Posted on | May 18, 2011 | Comments Off
Opinion polling is used in business for market research. Can businesses rely on such polling numbers? Or do they mislead? The answer is yes to both questions. To understand why, it is important to understand what the word “error” means, and the difference between sampling error and inferential error.
In the 1948 presidential election, pollsters predicted Dewey would win, but Truman beat him by 5%. Pollsters have made a lot of improvements to their polling methods since then, but challenges remain.
Canadians may have been somewhat surprised recently when a majority government was elected on May 2 with 39.6% of the vote. This was above all final polling results. The average of the major polling companies was 37.1%, or 2.5% lower. Most polling companies had a ±3% sampling error, 19 times out of 20. (The combined sampling error of the average of the four companies would be much smaller than this.)
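To put a rough number on that parenthetical remark, here is a minimal sketch using the textbook formula for a simple random sample. It assumes things the polling companies did not necessarily do: four independent polls of equal size, pure random sampling, and the worst-case share of 50%. The function names are invented for the illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Sampling error, 19 times out of 20 (z = 1.96), for a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(margin, p=0.5, z=1.96):
    """Smallest simple random sample that achieves the given margin."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))


n = sample_size_for(0.03)            # respondents needed for about ±3%
single = margin_of_error(n)          # one poll
combined = single / math.sqrt(4)     # average of four independent polls

print(f"one poll: n = {n}, ±{single:.1%}")
print(f"average of four polls: ±{combined:.1%}")
```

Under those assumptions the average of four polls has half the sampling error of a single poll, roughly ±1.5%.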
The many different samples that can be drawn from the population with a particular sampling method are the source of what is called sampling error. It does not mean a mistake has been made. It means there is some uncertainty due to the luck (or bad luck) of who (or what) gets included in the sample.
Suppose a survey has a ±3% sampling error, 19 times out of 20. Once the survey has been completed and the data collected and analyzed, we cannot say whether the estimate from this specific sample is within 3% of the value in the population. Is it one of the 19 or the 20th? We cannot tell, because you rarely get a sample that is perfectly representative of the population as a whole. A particular sample might be one percent off, or six percent off. It is usually very hard, if not impossible, to know. If it is off, and you think it isn’t, you have an inferential error. The sampling error is the error of the method for future surveys, not of a particular survey. It states the probability that a future survey will be within 3%. Once you have the data in hand, the probability that the survey you just took is within 3% is a very different probability. Again, an inferential error does not mean a mistake has been made. It’s an inherent limitation of sampling: sometimes inferential errors will mislead you.
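A small simulation shows what “19 times out of 20” describes. It is a sketch under invented assumptions: a population in which exactly 40% hold some opinion, simple random samples of 1,068 people, and 2,000 repeated polls. The function name and all the parameters are hypothetical.

```python
import random


def share_within_margin(true_share=0.40, n=1068, margin=0.03,
                        trials=2000, seed=1):
    """Run many simulated polls and return the fraction within the margin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each respondent independently holds the opinion with probability true_share.
        yes = sum(rng.random() < true_share for _ in range(n))
        if abs(yes / n - true_share) <= margin:
            hits += 1
    return hits / trials


coverage = share_within_margin()
print(f"{coverage:.1%} of simulated polls landed within 3 points")
```

The simulation can tell which polls missed only because it knows the true share. A pollster holding a single result does not, which is why the 19-in-20 figure describes the method and not any one survey.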
This means that before you trust polling results you need to determine whether you trust the pollster’s sampling and surveying method. The method, and its track record, are crucial. Pollsters refine their methods over time, and each pollster has its own method, although of course there are many similarities.
Opinion surveys on voting intentions have one advantage over market research surveys: they can be calibrated with actual voting results. This advantage helps pollsters refine their sampling methods for greater and greater precision.
Market research has an advantage over political opinion polling, too. Elections occur on a single day, and campaigns make tremendous efforts to swing voting intentions right up to the last minute. This makes voting intentions much more volatile than purchasing behavior. Purchasing behavior changes more slowly, making it easier to catch trends.
The lesson for business is this: before you trust sampling and polling results, make sure the person or firm doing it has a well-refined sampling method and a good track record.
Will anybody ever trust inflation measurements?
Posted on | April 30, 2011 | Comments Off
About 200 years ago Joseph Lowe said that “the interest of government, the greatest of all debtors, [is] to prevent the public from fixing its attention on the gradual depreciation of money.” It seems like the Argentinian government is doing its best to prove Lowe right. Recent reports (see here and here) suggest the Argentinian government is fiddling its inflation numbers. The official number is 10%; private economists peg it closer to 25%.
Arguments about how to measure inflation have been going on for centuries. Between 1919 and 2003, the Bureau of Labor Statistics comprehensively updated the way it calculated the Consumer Price Index six times. It made smaller changes to its methods more regularly.
Bill Gross, who manages $1 trillion for PIMCO, said that “CPI numbers [are] not reflecting reality at the checkout counter,” and talks about the “total fiction that is government reporting of inflation.” John Williams, an economist, publishes his own measure of inflation in the US, just like economists in Argentina. His measure pegs it about 7% higher than official numbers.
Economists at the Bureau of Labor Statistics say that “apparent inconsistencies between the index and people’s perceptions” are based on “misconceptions and myths.”
The controversy could be easily resolved if there were some objective way to measure inflation that could be calibrated, like you can calibrate a thermometer. But you can’t. In common usage, inflation is the Consumer Price Index: the thing we want to measure. At the same time, it is the instrument we use to measure it. The two are combined into one. The CPI is being constantly redefined to meet changing circumstances. It is as if both the definition of temperature, and the instruments we use to measure it, were in a constant state of flux, so there will always be doubters.















