Misinterpreted mismeasures

In my first essay Things that grow I argued that the GDP aggregate is an artificial construct with no clear correspondence to the real world. When GDP is calculated, goods and services sold on a market are aggregated with goods and services produced outside of any markets. This aggregation is supposedly justified by the assumption that their values are commensurable. I attempted to show what a specious assumption that is.

The definition of GDP was my main concern in that essay because it’s pointless to speculate on the value or true meaning of the GDP measure if it isn’t an accurate measure of anything at all. But since GDP is nevertheless often interpreted as a happiness or welfare metric by misinformed economists, it might be worthwhile to take a closer look at that misinterpretation. I still don’t see GDP as a well-defined statistical quantity, but in this essay I want to focus on the usage of statistical aggregates, not on their theoretical justifications. And as I wrote in Things that grow, justifications are often dictated by usage, not the other way around. That’s what makes GDP such an incongruous quantity in the first place.

What is it about the GDP aggregate that has given it such an enormous popularity in politics and governance? What needs has it met so much better than other statistical aggregates? We find a simple answer in a recent book published by a commission which was set up to conduct a critical review of GDP and its alternatives. When speaking of these alternatives, the book Mismeasuring our lives states:

“As communication instruments, one frequent criticism is that they lack what has made GDP a success: the powerful attraction of a single headline figure allowing simple comparisons of socioeconomic performance over time or across countries” (SSF 2010 p.102)

The key word is “simple”. I already pointed out in Things that grow that the validity of the GDP aggregate doesn’t seem to be a point of concern for the people who use it most. Our daily experience with statements from politicians, businessmen, bureaucrats and journalists affirms that they value its simplicity and purported generality. No deeper knowledge of statistics or even macroeconomics is needed when the common consensus is that this one number summarizes the important aspects of the national economy.

In principle this makes sense. Politicians run the government and let statisticians worry about the empirical facts behind the numbers. We wouldn’t expect anything else. But we saw in Things that grow that simplicity is a double-edged sword. On the one hand simple numbers are a prerequisite for effective political debate. If every allusion to empirical data were minutely scrutinized from opposing perspectives, the discussion would never end. But on the other hand simplification is always misrepresentation. So we also have to ask ourselves whether it makes sense to make political decisions based on badly distorted information.

In this essay I will discuss some of the supplements and amendments to GDP proposed in the book Mismeasuring our lives. The point I will return to time and again is the one I outlined above. Is simplification justified? In general I will be critical of the statistical measures proposed in the report. In the end I think that a change from a situation A, where decisions are based mostly on one distorted measure, to a situation B, where decisions would be based on five or six equally distorted measures, does not constitute an obvious improvement.

The agenda of Mismeasuring our lives is simple: the authors and their associates have thought on the one hand about amendments to the classical GDP aggregate and on the other about new statistical aggregates. These new aggregates fall into two broad categories, those that measure environmental sustainability and those that measure quality of life. I shall focus on these two and leave the classical GDP issues aside, since I already dealt with GDP at length in Things that grow.

The thoughts presented in Mismeasuring our lives originate in certain alleged and widely cited shortcomings of the GDP aggregate. GDP calculations disregard the fact that increasing production and consumption often wrecks the natural environment, which might eventually have serious consequences for future generations. And consumption does not equal happiness, so GDP tells us nothing about our quality of life.

Anyone who has familiarized himself with the theoretical background of GDP will recognize that these are not at all shortcomings in the GDP measure. GDP was never intended to be a measure of sustainability or quality of life, so any criticism of it in that vein is anachronistic. What’s actually being criticized in this context are the blatant misinterpretations of the GDP aggregate which have become all too common today. Political pundits have been happy to accept GDP as a universal measure of everything good just because it’s a simple number. As the authors write in Mismeasuring our lives:

“The early developers of GDP metrics were clearly far more aware of the assumptions that went into the construction of the index than many of those who have subsequently found the measure of such use.” (SSF 2010 p.xxvii)

In other words, although the theoretical justifications of the GDP measure can be criticized on other grounds, as I did in Things that grow, they certainly cannot be criticized for lack of attention to environmental and social problems. But the daily use of the GDP concept should absolutely be criticized when it is based on misinterpretation and overinterpretation. And this is what the authors of Mismeasuring our lives set out to do. This is a commendable undertaking, but I ask in this essay whether the alternative measures they propose are not equally liable to simplified misinterpretation in daily use.

Sustaining for the future

The first question I will deal with is the measurement of “sustainability”, which obviously should be an important part of economic decision-making even though it’s far removed from the GDP aggregate. How can sustainability be assessed? The authors of Mismeasuring our lives include not only natural resources among the things which need to be sustained for future generations, but also social resources such as knowledge, education and research. They sensibly state that

“[The goal of converting] all the stocks of resources passed on to future generations into a common metric, be it monetary or not, (…) seems overly ambitious.” (SSF 2010 p.98)

This is obviously right. I would without hesitation replace the words “overly ambitious” with the word “misguided” because this is so clearly a situation where diversity is not reducible to any simple measure. Even if we restricted sustainability calculations to natural resources alone, it would still be an arbitrary decision how to weigh them against each other, because we don’t really know how valuable each resource will be 10, 50 or 500 years from now.

And this is the first major problem with sustainability indices. The authors of the report recognize that

“It could be argued that our descendants may become very sensitive to the relative scarcity of some environmental goods to which we pay little attention because they are still relatively abundant.” (SSF 2010 p.123-124)

There is therefore a tension between the applicability of sustainability indices in the short-term future and in the long-term. In the short-term scarcities and preferences can be guessed with reasonable confidence, so we know approximately what we should conserve, but such guesses are likely to become less and less accurate as time goes by.

This is a crucial difference between sustainability measures and the GDP measure. The GDP aggregate is always historical. It’s a summary account of economic activity during a past period in time. But we cannot assess the sustainability of our economy in the same historical manner because sustainability implicitly involves assumptions about the future.

There is also another problem with sustainability indices which should lead us to question their utility. Most sustainability problems are by definition international, not national. National indices are likely to be quite misleading if they do not take account of the fact that rich resource-importing countries contribute directly to the lack of sustainability in poor resource-exporting countries:

“This means that the actual sustainability of developed countries is overestimated, while that of the developing countries is underestimated.” (SSF 2010 p.124)

“This is one more argument in favor of an eclectic approach that mixes points of views. An approach centered on national sustainabilities may be relevant for some dimensions of sustainability, but not for others.” (SSF 2010 p.124)

It’s clear that sustainability must necessarily be understood as global sustainability. In other words, every nation cannot decide on its own what its particular needs will be in the future and thus construct its own notion of sustainability. The whole point of sustainability thinking is to preserve future resources for all humanity without selfish specifications. But even so, sustainability measures cannot be exclusively global because a country-by-country partition is essential if effective political measures are to be taken for sustaining future generations.

The problem is that items on the political agenda differ in their order of priority across countries. Many poor countries badly need the economic benefits of certain production activities which are unsustainable from a global perspective. Such priorities are constantly in flux in all countries as political and economic conditions change.

We therefore realize that a measure of sustainability which would be specific to each country, yet also amenable to global summation, would have to include normative assumptions about how responsibility is to be divided between countries. Indeed, such assumptions are a central part of international political bargaining, but they do not fit in well with the idea of a rigidly defined sustainability measure.

To conclude: there are two reasons why it is presumptuous to assume that any sustainability measure would remain valid for a longer period of time. The first is that the needs and preferences of future generations cannot be predicted. The second is that a meaningful sustainability measure would have to incorporate assumptions about a just division of responsibility – assumptions which are likely to become outdated sooner or later.

This does not by any means imply that analyses of sustainability are wrongheaded and must be abandoned. The implication is in fact exactly the opposite. Sustainability problems are too complex to be expressed in the form of a sustainability index. Questions of fairness and justice with regard to future generations and to economic inequality today are too important to be unchangeably fixed and hidden within an “objective” number.

The important thing to remember is that aggregation is never an objective procedure. Specific presuppositions and assumptions go into any aggregation.
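To make this concrete, here is a minimal sketch in Python of how the choice of weights alone can decide which country looks more "sustainable". Everything in it is invented for illustration: the two countries, the three indicators, the scores and the weights are hypothetical and do not come from the report.

```python
# Hypothetical, made-up indicator scores on a 0-1 scale (not real data).
scores = {
    "Country A": {"forests": 0.9, "fisheries": 0.3, "education": 0.5},
    "Country B": {"forests": 0.4, "fisheries": 0.8, "education": 0.6},
}


def aggregate(indicators, weights):
    """Weighted average of indicator scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight


def ranking(weights):
    """Country names ordered from highest to lowest aggregate score."""
    return sorted(
        scores,
        key=lambda country: aggregate(scores[country], weights),
        reverse=True,
    )


# Two equally defensible (and equally arbitrary) weighting schemes.
weights_forest_heavy = {"forests": 3, "fisheries": 1, "education": 1}
weights_fishery_heavy = {"forests": 1, "fisheries": 3, "education": 1}

print(ranking(weights_forest_heavy))   # Country A comes first
print(ranking(weights_fishery_heavy))  # Country B comes first
```

The underlying data are identical in both runs. Only the presupposition about what matters most has changed, and the order of the two countries is reversed with it.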

Surveying life

The other new proposal for statistical measurements which the authors of Mismeasuring our lives present is to measure quality of life. This proposal has its origins in the all too common misinterpretation of GDP as a happiness metric – a fallacy which should not require further comment at this point. But I think it’s important to note immediately that the authors are putting the cart before the horse. From the fact that GDP is misinterpreted in a certain way, they infer that there is a need for an aggregate statistic to measure something similar.

In other words, they assume that it would be a good thing to have a simple aggregate which measures “quality of life”, however it may be defined. This is another example of usage dictating definitions. This is particularly evident from the following passage.

“While assessing quality of life requires a plurality of indicators, there are strong demands to develop a single scalar measure.” (SSF 2010 p.95)

And if there’s “demand”, then it must apparently be met. Never mind if the demanded measure can’t be sensibly defined. Once again I will express my skepticism of simplification, but we must of course take stock of the proposals before criticizing them. The authors divide quality of life measurements into three broad categories: subjective measures, objective measures and fair allocations (SSF 2010 p.62-63).

Subjective measures relate to individual opinions about the quality of life. Such opinions must of course be collected by surveying the population, asking them what they think and adding up the answers. Objective measures, on the other hand, relate to the capabilities of individuals: to their health, education, security, political influence and so on. And fair allocation is about equality, especially in objective capability.

It is obvious that one of the primary functions of government is to provide public services which ensure that all citizens can partake of capabilities for leading a good life. Questions about fair allocation therefore belong in the daily work of practicing politicians and we should welcome new attempts to measure capabilities and inequality. By any account, the more precise the empirical measurements are, the more meaningful the debate and the more just the political action will be.

But as I’ve emphasized before in this essay, measuring is one thing and aggregation is another. It never comes down to a choice between having a quality of life aggregate or having no data at all – the question is how much the data should be simplified. The authors of Mismeasuring our lives clearly advocate a single measure which incorporates all aspects: subjective, objective and fair allocation.

Objective capabilities and inequality can perhaps be measured in a fairly straightforward manner, though they can certainly not be aggregated easily. Subjective measurements, on the other hand, are beset by a number of difficulties already at the measurement stage. Surveys which seek to measure subjective quality of life ask people about their feelings of happiness, pain, worry, pride and respect (SSF 2010 p.65) and thus produce data such as the U-index, “the proportion of one’s time in which the strongest reported feeling is a negative one” (SSF 2010 p.90).
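As a rough illustration of what such an index involves, the quoted definition of the U-index can be sketched in Python as follows. The data format, the list of feelings counted as negative and the handling of ties are my own assumptions for the sake of the example, not the procedure used in the report.

```python
# Assumption: which reported feelings count as "negative".
NEGATIVE = {"pain", "worry", "sadness"}


def u_index(episodes):
    """Share of total time spent in episodes whose strongest feeling is negative.

    episodes: list of (minutes, {feeling: intensity}) pairs (hypothetical format).
    """
    total = sum(minutes for minutes, _ in episodes)
    if total == 0:
        raise ValueError("no time recorded")
    unpleasant = 0
    for minutes, feelings in episodes:
        strongest_negative = max(
            (v for f, v in feelings.items() if f in NEGATIVE), default=0
        )
        strongest_other = max(
            (v for f, v in feelings.items() if f not in NEGATIVE), default=0
        )
        # Assumption: a tie does not count as an unpleasant episode.
        if strongest_negative > strongest_other:
            unpleasant += minutes
    return unpleasant / total


# One invented day of self-reports, intensities on a 0-6 scale.
day = [
    (60, {"happiness": 4, "worry": 2}),
    (120, {"happiness": 1, "worry": 5}),
    (60, {"pride": 3, "pain": 3}),
]
print(u_index(day))  # 0.5
```

Even this toy version forces choices that the final number conceals: which feelings are classed as negative, and what to do with ties. Counting the tied episode as unpleasant would raise the figure for the same day from 0.5 to 0.75.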

Unfortunately, the scientific basis of such surveys is weak. One overriding problem is that people’s interpretation of words like happiness, pride and respect is highly subjective, which invalidates the whole point of the survey. If one person conceives “respect” as obedience in corporate hierarchy, another as family honor and yet another as the right to walk down the street without hearing racist slurs, does it really tell us something general about society if two of them feel “respected” but one does not?

A second problem is that surveys consist of questions, and questions usually have nuanced presuppositions. Many examples can be given of surveys where two questions basically ask the same thing, yet yield totally different answers because they are phrased differently. For instance:

“In a survey on elderly people from Gijón (Asturias, Northern Spain) (…) a substantial majority of the respondents to the questionnaire (75%), asked to name three main problems in their lives, identified ‘financial problems’ as their first cause of concern. And yet, only 12.4% answered, in response to another question, that their main need was to have money. This lack of correlation between the answers to two questions seemingly close in semantic content, is astonishing.” (Diaz-Martinez and Navarro 1997)

The lack of correlation probably results from a slight linguistic difference, which leads the respondents to associate one question with a negative presupposition and the other with a positive one (we would of course have to assess the original Spanish in the example above to hypothesize about that matter). However the case may be, reliable surveying is impossible if such small differences in the questions produce wide fluctuations in the response. These are fundamental reasons why measures like the “U-index” are suspect. Surveys, both their questions and answers, must be interpreted very carefully if any meaningful information is to be gained from them at all.

The particular problem that I would anticipate in surveys designed to provide data for an index value is that the questions would have to be exactly the same from year to year. And this means that they would have to be very abstract questions, such as “do you feel more respected now than you did a year ago?”. Abstract questions allow a multitude of interpretations, which in turn defeats the whole purpose of the survey.

It might be argued that it is better to have some survey results than to have none at all. At least the dissatisfaction of the people will be heard, and changes may tell politicians something even though the absolute values may be inaccurate. But I’m not inclined to accept this line of argument. I can see why people who make political decisions should seek out information on objective capabilities and fair allocation, but surveys of subjective opinions are just unnecessary. In a democratic society, subjective opinions are voiced (and sensibly aggregated) in elections. If a majority of people feel unhappy, worried or disrespected, they will dismiss those in charge and instate someone else whom they trust to meet their needs better.

A “happiness index” which includes surveys has such a weak scientific basis that misinterpretations and overinterpretations are certain to occur. Under the false guise of objectivity, an aggregate index easily becomes a rhetorical weapon. When a politician says “Under the present government, happiness has increased by 7.3%”, how can a critic respond?

Any aggregate which seeks to measure quality of life in general, collating both objective and subjective factors, will clearly be liable to misuse.

Concluding thoughts

The key question is: how much should information be simplified for political decision-making? If only one, two or three index numbers are used, the information is simple but distorted. If there are no aggregate numbers, only raw data, the information is accurate but far too complex for practical purposes.

Naturally there isn’t any general formula which would solve this dilemma. It must be remembered that politicians constantly interact with experts and aides of various kinds, so the interpretation of quantitative data is a complex social process which can unfold in many different ways. No general conclusions can be given on the optimal division of labor in that respect.

But GDP is a warning example of how statistical aggregates can be misused in political rhetoric, and that should give us something to think about. The authors of Mismeasuring our lives set out to search for alternative measures which could supplement GDP by taking social and environmental issues into consideration. Although they recognize the simplification vs. distortion dilemma, they nevertheless lean strongly towards simple general indices, a “dashboard” of aggregates as they call it.

In my view, the problem with such a dashboard is that statistics, and aggregate statistics in particular, require interpretation. They cannot be read and understood directly by their surface appearance, merely as “objective” numbers. If a set of aggregate statistics is provided for decision-makers, we put great trust in their capacity for valid statistical interpretation. They are more likely to go astray, to let their own agendas lead them to blatant misinterpretations, one stranger than the other. And since the aggregates are more or less artificial to begin with, decisions will eventually be based on misinterpreted mismeasures. We will then be left to conclude only that they don’t know what they’re doing.

We’ve seen in this essay that the value of general sustainability and quality of life aggregates can be questioned. A measure of sustainability may turn out to be misleading both in its material and normative assumptions. This may necessitate constant revisions which would inevitably dilute the value of the measure. A quality of life aggregate, on the other hand, would suffer from much the same afflictions as GDP. If unlike quantities are added together, the result won’t be meaningful.

The general problem with any aggregate number lies in its deceptive simplicity, which invites reification. If the aggregates proposed in Mismeasuring our lives were adopted, abstract and multidimensional concepts like “sustainability” and “quality of life” would be treated as if they were simple material things, easily measurable along one dimension.

Simplification certainly facilitates discussion among people who don’t have the interest to study sustainability or quality of life in more detail. But if the assumptions behind the aggregate become outmoded, or if they are just ignored and the aggregate is misinterpreted, the benefits that accrue from simplicity are soon lost. Then a rigid aggregate just hinders effective action for sustainability and quality of life because it becomes a mismeasure with a false appearance of objectivity.

It makes no great difference if the assumptions behind an aggregate number are publicized with utmost candor and specificity when the measure is first launched. Experience shows that the subtleties of precise interpretation are soon abandoned when the needs of political rhetoric call for an effective and simple message. The history of GDP misinterpretation shows us that an aggregate number can start to live a life of its own in political rhetoric without much regard for what it actually measures. We run the same risk with sustainability and quality of life indices. If they were to become an everyday tool in national and international governance, as their proponents surely hope, they would lose validity and meaning as time goes by.

It might be asked why these aggregates could not just be updated according to need. This might be possible to some extent, but again GDP is a warning example. The point of having an aggregate measure is that it should be comparable across time, so large changes are out of the question. National accountants are careful to change their procedures very gradually in order not to “distort” time series comparisons. But when usage dictates definition in this manner, the aggregate can’t possibly track changes in the real world with any accuracy. Adjustments are conducted mostly on grounds of political expedience, not new information.

There is of course no golden mean to be discovered between simplification and accuracy. Political decisions will always be based on information which is produced by experts, analyzed by bureaucrats and political aides, and debated in light of circumstances which are specific to each decision. The risk of misinterpretation increases as we move in that order from research to practical analysis to implementation and rhetoric.

But I for one am distrustful of supposedly univocal aggregates such as GDP or the general sustainability and quality of life indices proposed in Mismeasuring our lives. Occam’s razor, the maxim of prioritizing simplicity, is not a proper guide for political life. Information must be simplified somehow but that task should not be shouldered only by statistical experts. It is absolutely essential, since usage influences definitions, that the middlemen and decision-makers also do their part in keeping information correctly interpreted from one end of the line to the other.

If that is not done and if the true complexity of political questions is disregarded by the people in power, then the prospects of truly well-informed opinions are reduced. And in the end, the seeds we sow by relying on extremely simplified information will be reaped in the form of a badly governed and ignorant society.


(SSF 2010) Stiglitz, Sen and Fitoussi, 2010: Mismeasuring our lives – Why GDP doesn’t add up, The New Press.

longer version of SSF 2010 (not cited in this essay): http://www.stiglitz-sen-fitoussi.fr/documents/rapport_anglais.pdf

Diaz-Martinez and Navarro, 1997: Meta-analysis of surveys from a qualitative perspective, http://www.netcom.es/pnavarro/Publicaciones/Meta-analysisSurveys.html