The Death of Evidence

Last month marked my 25th anniversary of making a
living in the social sciences. I got my first research job in Rural
Sociology in April 1980 while I was still an undergraduate. I had
the pleasure of being trained by a close-knit faculty, many
of whom were fairly late in their careers. They remembered a time
that most people working in the social sciences now know nothing
about, and they held a number of highly useful views
that have since lost favor and are remembered as little more than
quaint. For better or worse, many of those views have stuck with me.

They used to talk about the time when it would take them a couple of
weeks of intensive labor to generate the results for a relatively
straightforward multiple regression equation. If you think about
this, it makes sense; if all you’ve got are hard copy grids of
data, a statistics book with all the necessary formulas and
distribution tables, a pad and a pencil, you can imagine that it
would take forever to calculate a multiple regression equation that
meets the simplest publication standards. Add the fact that they
had little to no computing power at their disposal (aside from
likely being late adopters, who wanted to devote the precious few
computing resources available in those days to social scientists?)
and had extensive teaching and committee responsibilities, and you
can see why doing quantitative social science in those days was such
a risky business.

They did it anyway, but with an outlook quite a bit different from
anything we see today. Given that results were so hard-earned, they
needed to actively engage in a number of activities that would
increase the probability of their results being fruitful. So they
made sure that the theories they were anxious to test were highly
developed conceptually before using any statistics on them. They
made sure their data met the basic standards for normal distribution
and agonized over how to most effectively operationalize any and all
of their key concepts. In other words, they applied the standards
for statistical testing in the way they were originally intended by
the theorists who developed them. Anything else would have likely
led to disaster.

This all began to change with the introduction of SPSS, first
released in 1968. While SAS may have been more
statistically rigorous, it never shared the goal of being incredibly
easy to use. SPSS was designed by social scientists with an eye
toward making a wealth of statistical tools available to anyone with
the simplest skills and low-level access to a mainframe computer.
Suddenly, results that had previously taken weeks to generate could
now be done in minutes. In addition, there was now a much greater
variety of statistical tricks and tools that could be employed. And
perhaps most important of all, all these statistics could be easily
generated by a program that required absolutely no knowledge of
statistics to be run successfully. Most of my statistical training
in school was motivated by trying to understand all the various
techniques that I was routinely generating on the computer.

The dramatic reduction in the cost of generating statistical results
completely changed the face of how social science was performed.
Gone were the days of assuring adequate conceptual development and
that data met basic distributional requirements: the computer could
easily generate the results no matter how flimsy or questionable the
data. This would ultimately give birth to the age of keeping busy
doing social science without thinking. In many circles, theoretical
development would be seen as little more than corroborated
empiricism. Finally, the advent of cheap statistical results
dramatically increased the number of iterations performed to
complete the statistical testing for any one project. Rather than
carefully developing the concepts and securing rigorous data before
calculating a relatively small number of statistical tests,
practitioners would take half-baked concepts and marginal data and
massage them through a wide variety of transformations and
statistical techniques before any final results were generated and
presented. Keeping track of all the iterations performed and the
rationale for each of them became a separate project in and of
itself.
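The cost of all those iterations can be made concrete with a short calculation. The sketch below is an illustration, not anything the old faculty or I actually ran. It assumes each "gyration" ends in its own significance test at α = 0.05 on data containing no real effect, and that the tests are independent. That is a simplifying assumption, since re-analyses of the same data are correlated in practice. It computes the chance that at least one test comes out "significant" anyway, both from the formula 1 − (1 − α)^k and by a quick simulation that relies on p-values being uniformly distributed when there is no effect. The function names are hypothetical.

```python
import random


def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive among k independent tests
    run at significance level alpha when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** k


def simulated_error_rate(alpha: float, k: int, trials: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo check: under a true null hypothesis each p-value is
    uniform on [0, 1], so each test is 'significant' with probability alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "project": k analyses, reported if any of them clears alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(f"{k:2d} tests: formula {familywise_error_rate(0.05, k):.3f}, "
              f"simulated {simulated_error_rate(0.05, k):.3f}")
```

Under these assumptions, twenty tries at the data give roughly a 64% chance of at least one spurious "finding." That is one reason the unreported iterations matter so much.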

Though the logic of generating results became more convoluted, the
time and space to present those results to the scientific community
either remained constant or shrank. It was largely impossible for
practitioners to fit the rationale for all their statistical
gyrations into the same 20-page journal article or 20-minute
conference presentation that needed to contain the substantive
argument, literature review, results, conclusions and implications
of the project. For a while, practitioners appeared to live by an
ethic that said that while one might not be able to present all the
statistical gyrations and the rationale for them in a conference
presentation or journal article, they would still prepare at least
an informal supplement containing this information so that if anyone
asked for it, it would be completely available in all its splendor
and glory.

Of course, no one ever asked for this information. People were too
busy, and too overwhelmed by information overload, to be
bothered with the petty details of how a conclusion was reached, so
inquiries into the nuts and bolts of the statistical methods were
never made. As a result, it didn’t take long for this ethic to be
abandoned. It got lost for two reasons. First, it atrophied from
sheer lack of use: why make the effort if no one subjects the
work to scrutiny? Second, continued advancements in computing
power led to further increases in the number of statistical
gyrations that many practitioners would take, so much so that the
authors themselves could not keep track of all the choices and
modifications they had made to generate their final results.

This has all been compounded by the bureaucratization of the
production of knowledge. Many practitioners now organize their
research efforts like a business. The principal authors are
frequently the CEOs of the research projects, concerning themselves
only with the high-level concepts and “strategic direction” of the
work. The details of producing the statistical results are
usually left to graduate students, whose efforts often go
unscrutinized. A favorite example comes from the tail
end of my graduate career. I was doing some work for a department
that had a member who had recently completed their dissertation. In
the dissertation, they had reported their statistical results using
structural equation modeling and a computer program called LISREL,
which was the big macho statistical technique of the time. During
staff meetings and other consultations, members of this department
sought this individual’s advice on issues related to this technique,
because they were now seen as an authority on it. The truth was
that this person could barely explain the difference between a mean
and a median. They had paid one of my analyst colleagues to do the
LISREL analysis for the dissertation for them and explain what it
all meant. So they were viewed as an authority on something about
which they knew very little. This method of organization is
probably fine for making widgets or creating a decent restaurant,
but it seems a little troubling for generating knowledge.

The end result of all these trends has been a growing distance
between what constitutes knowledge and the means used to produce
that knowledge. This is true not only of consumers of knowledge,
but also of producers. If you go to a conference today and ask a
presenter about the measures they used to construct an index, or
about the Cronbach’s Alpha for a given scale, they’ll look at you
like you’re a psychopath, and then proceed to tell you that
everything they’ve done is reliable because it’s based on something
someone else did at least once. You know that the logic and/or
methodology of their analysis is likely flawed, but there is no way
to get them to go to the level of detail necessary to address the
concerns. Important levels of scrutiny have become little more than
quaint artifacts in today’s fast-paced world.
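The question about Cronbach’s alpha is not an exotic one. The statistic takes only a few lines to compute from item-level scores: α = k/(k − 1) · (1 − Σ item variances / variance of the summed scale), where k is the number of items. The sketch below is a minimal illustration using Python’s standard library. The function name and the five-respondent, three-item dataset are made up for the example and are not drawn from any study.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a scale.

    responses: one row per respondent, one column per item.
    Uses sample variances (n - 1) consistently for items and totals.
    """
    k = len(responses[0])                      # number of items in the scale
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))              # transpose rows into item columns
    item_var_sum = sum(variance(col) for col in items)
    totals = [sum(row) for row in responses]   # each respondent's scale score
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


if __name__ == "__main__":
    # Hypothetical 1-5 ratings: 5 respondents x 3 items.
    scores = [
        [4, 5, 4],
        [2, 3, 2],
        [5, 5, 4],
        [3, 3, 3],
        [1, 2, 2],
    ]
    print(round(cronbach_alpha(scores), 2))  # about 0.96 for these made-up scores
```

Reporting a number like this takes one sentence, which makes it all the stranger when presenters cannot produce it.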

I don’t suspect we reached this particularly sad state of affairs
through any sort of maliciousness. Certainly when the problem
began, it was merely a matter of the research enterprise taking more
turns and generating more information than could adequately be covered
in a small amount of space or time. As the problem matured a bit, I
suspect that some researchers were willing to hide their flaws in
thinking (or, perhaps more likely, lack of thinking) in the sheer
volume of activity in which they engaged to produce their results.
But now, I’m beginning to wonder if we’ve reached the point where
some people are willing to exploit the space and time limitations
that constrain well-reasoned arguments to forward points of view
without having to consider the possibility of contrary evidence.
All you need to do is select a few points of evidence (and withhold
the troubling ones), set your own context, strategically refute one
or two bits of counterevidence without putting that counterevidence
in context, and look really authoritative without ever attending to
the basic rules of evidence. It may be a bit intellectually
dishonest, but it’s still effective.

I’m wondering if Malcolm Gladwell might epitomize this problem.

It’s definitely not fair for me to make this judgment yet, because
I’ve only read a few pages of The Tipping Point. I just opened the
book to a random page and read about twenty pages while waiting to
get into the shower on Friday morning. Lack of context
notwithstanding, I was astounded by what I read. He seemed to be
basing his argument on what constituted “Mavens” and “Salesmen”
solely on a couple of interesting people he had met. I sensed no
attempt to explain how the traits of these particular people might
be generalized to others in anything but the most speculative ways.
There was no attempt to delineate the breadth of their influence, or
of those like them. He quotes a study that claims that folks who
watched ABC News around the 1984 presidential election were more
likely to vote for Reagan because the anchorperson smiled more often
when talking about Reagan than about Mondale. It’s a provocative
claim, but the evidence cited to support it was so selective that
the argument bordered on fraudulent. He makes a claim that ABC
otherwise covered stories in a way that was more hostile to Reagan
than the other networks, but does not present the source of that
claim or the evidence for it. I’ve heard the opposite on numerous
occasions, but this author gives me no grounds for judging
these competing claims. He presents no evidence about the
demographics of the audiences that tune into these broadcasts that is
independent of the study he wants you to believe. There is no
direct evidence about the content of the news stories they run. It
was nothing but a bunch of provocative (and intuitively satisfying)
claims with no solid evidence to support or refute them. Yes, the
claims he makes are fun to think about, but there appears to be
little of substance behind them.

This is not the only place I’ve seen these tendencies. There’s a
guy at Johns Hopkins who’s been hawking an organizational “culture
of safety” questionnaire that has many of the folks I work for all
excited. Every time I come in contact with him or his work, I feel
like I need a shower. There may be something to his findings, but
it’s awfully hard to tell. He’s very selective about the evidence
he presents and shows no interest in explaining how he arrived at
what he shows you. He makes no attempt to address the existence of contrary
evidence or to test for what might be some fundamental biases in his
work. He’s slick and demure, but I can’t help but think that he is
intellectually dishonest. Perhaps I’m not being fair, but I’m just
not used to feeling quite this level of skepticism when confronted
with academic work.

I don’t know what’s worse: the fact that we have people producing
this type of work, or the ease with which so many of us are ready to
consume it. Are we so easily seduced by provocative hypotheses and
the flights of fancy we’re inclined to take around them that we are
blind to the need for decent evidence to back them up?

The political environment we’re currently living in is not helping
matters any, given the current administration’s flagrant disdain for
facts. But it seems strange to see this level of discourse coming
from intellectuals. While I can see how we may have come to this
point, that knowledge brings me no comfort. How do we go about
reducing this level of intellectual degeneration?

Someone could say that I’m guilty of using Gladwell’s tactics to
make the case for criticizing him. But all I’m doing is getting the
ideas down before I lose them. Before I ever did anything to make
them public, I would make a concerted effort to examine – …let’s see…
what should we call it? How about evidence?

And I’ll start by actually reading the book from the beginning. I
was curious to see if any of my initial thoughts resonated with you
given that you’ve read it. I’ll keep you posted as I delve into it
more….

2 thoughts on “The Death of Evidence”

  1. We had an interesting class discussion just a couple of days ago with my doctoral students about the same issues (What are the reasons for the seeming lack of valid and relevant quantitative research; how did we come to the current state of publishing lots of poorly designed and insignificant studies in reputable journals, etc.). Your essay seems to be a direct answer to these questions, and does a fabulous job providing a clear and logical explanation of the main reasons.

    Interestingly, a week ago I received a letter (actually, a whole package with numerous articles and exhibits) from a Japanese professor (who, after some online investigation, turned out to be a big shot in APA and international psychology research circles). He was commenting on one of my articles and offered suggestions. Among the many other intriguing things in his letter, there was something related to statistical analysis that caught my attention, and I wanted to hear your reaction. He challenges the assumption that statistics based on the normal distribution are appropriate in the social sciences, since, unlike in the hard sciences, observations in the social realm are rarely (if ever) completely random and independent. He bases his argument on the fact that all social events are likely to be part of a complex web of interrelations and morphogenetic causal loops (in addition to being a statistician and psychologist, he was active in the cybernetics movement in the 1960s and 70s). Therefore, he asserts, researchers keep using statistics that assume a normal distribution because the assumption has become too sacred even to try to refute, and because it is convenient to assume normality and brush aside any evidence to the contrary by labeling it as anomalies, outliers, etc. What is your reaction?

    Finally, Gladwell. I would like to see your reaction when you finish the book, but I certainly agree with what you say so far. It is completely amazing that not only he, but the majority of bestselling authors out there, take “evidence” that was never rigorously tested, or that rests on anecdotal accounts or a couple of observations, and create coherent, emotionally appealing accounts, which are then turned into coherent-sounding “theories” that become very influential. (By the way, this is a really interesting phenomenon to investigate: how articles or books that were not based on any research evidence give rise to a strong “theory” that everybody starts using in follow-up research and work. One example is Fishbein and Ajzen’s model in psychology; they developed it as a proposed model of potential relationships and never had time to test it. Later, people started citing it as justification for their own models, conclusions, etc., using references such as “As F&A have found…”. Of course, F&A never found anything and never even claimed to have found anything.) I was thinking about this not only after I finished reading Gladwell, but also during my weekend class. One of our faculty members is a very successful consultant who publishes books that are even less well researched and less well written than Gladwell’s, but that still attract a lot of attention and help him build a strong consulting reputation. And it is not only the books. Listening to everything he said in class, I was completely baffled by his use of other popular (but not evidence-based) models, both in class and in consulting. Even more baffling was the students’ reaction: complete, unquestioning acceptance of this as normal. And we are not talking about inexperienced undergrads. Our students have on average 15-20 years of professional experience; many are former or current CEOs, VPs, high-level managers…

    Your comment at the end of your posting (that someone could respond that you are committing the same mistake as Gladwell, basing your argument on just a couple of examples) made me think about something else that troubles me. On the one hand, I see the problem with using all these unfounded “theories” as the ground for consulting and teaching. On the other, I have a feeling that I, personally, could become completely paralyzed if I stopped using all of them: I would not be able to teach (let alone do any consulting work), because I would be busy trying either to find or to develop my own seriously researched evidence. What I can do so far is help my students become better consumers of research, consulting models, and writing, by alerting them to the dangers of taking things at face value and to the need to do their own, independent thinking. But there are two dangers in this: a) I could contribute to the development of a new cohort of people who do not trust anybody or anything, but are not able to do anything productive themselves; b) I will never be able to develop anything of consequence myself, since where I am at present it is impossible to do any long-term, deep research (no time, no incentives). Any thoughts?

  2. I’ll need a little bit more time to think about some of the questions that you’ve raised in your response, but for the moment I figured I’d give a few quick reactions…

    I can’t say I have much of an issue with your Japanese colleague’s assertion about using statistics that require a normal distribution in the social sciences. He’s right that almost none of the data gathered in social research are normally distributed. The implication, then, is that people should not use statistical techniques whose validity depends on normally distributed data. I’ve got no problem with that, but it eliminates almost all the techniques that people most commonly know and use. I don’t think the problem is that people think the normal distribution assumption is sacred; it’s that they don’t know what else they can use that doesn’t require it.

    My solution to this problem has always been to simplify the way I ask questions of the data so that I don’t rely on techniques that require a normal distribution. This is why I always try to exhaust what I can learn from crosstabulations before considering anything more complex. Now you know why Dick was always so frustrated when he tried to get regression equations out of me. Another common solution has been to “normalize” the distribution of a given variable by applying the same transformation to every value (like taking the natural logarithm of each data point). This can indeed make a skewed distribution look more normal, but usually at the cost of compressing its natural variation. I’ve never particularly cared for this technique. Though many claim there are legitimate statistical justifications for it, it’s always felt substantively shady to me.

    The other option, of course, is to use other statistical techniques that do not require a normal distribution. I’m not up-to-date enough in my statistical knowledge to know what these are, and I suspect many others are in the same boat. There may be an opportunity for real growth here, assuming that these techniques exist and don’t merely shift us from one series of traps into another. I’d be curious to know how your colleague thinks this problem could be solved…

    The fact that others use fallacies to forward claims without adequate evidence is absolutely no reason to become timid about how you forward claims or hypotheses. All that’s required is keeping in mind the fairly simple rules of evidence. The main one, of course, is that every assertion should be framed so that it can be refuted, and it never hurts to do anything and everything you can to try to refute it before you present it. If you can’t refute it, present the framework within which someone else can. Any good study is upfront about its limitations and invites refutation from all angles. This should simply be part of the thought process, and it doesn’t require tons of additional work.

    There are basic techniques for developing claims that easily safeguard us from demagoguery, and they are second nature to most good critical thinkers. One of the biggest fallacies I see in Gladwell and others who suffer from this problem is the failure to consider other variables, independent of the ones they’ve chosen, that might also affect the phenomenon in question. It’s the fallacy of the single cause, and it’s incredible to me that we now have cadres of people who seem able to get away with it. The classic example that damned near everyone gets early in their research training is the story of the Literary Digest poll about the 1936 presidential election. The Literary Digest mailed straw ballots to millions of people drawn largely from its subscriber lists, telephone directories, and automobile registrations to see whom they preferred in the race between Alf Landon and Franklin Roosevelt. The results of the poll were heavily in favor of Landon, so the editors were left scratching their heads when Roosevelt won so handily. The problem, of course, was that in 1936 telephones and cars were owned disproportionately by affluent people, so the sample was far from representative of the voting public. The editors let themselves get sidetracked by a single cause to the point that they did not consider other obvious factors. This is the type of scrutiny that was missing from Gladwell’s assessment of the Reagan/Mondale news-watching data.

    If you’re thinking this seems awfully basic, you’re absolutely right. That’s what’s so appalling about the quality of much of this discourse. Even though the evidence for many of these claims is flimsy, that doesn’t mean the assertions are not useful and provocative. I wouldn’t hesitate to use them; I’d just trust your natural instincts of critical thought to place them in their proper context. No need to be timid: you have all the skills you need to avoid these pitfalls without creating a lot of undue work for yourself.

    Use of evidence may be becoming a lost art, but it is neither difficult nor inaccessible. Many of us (including you) do it as second nature. No need to despair. More later…
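The second comment describes a trade-off: a log transformation can make a skewed variable look more normal, but it also compresses the variable’s natural spread. A toy example makes that trade-off visible. The sketch below uses Python’s standard library and a made-up, right-skewed list of incomes (in thousands of dollars). The skewness function is the simple moment-based coefficient, written out here as an assumption rather than as any statistics package’s official definition.

```python
import math
from statistics import fmean


def skewness(xs: list[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Hypothetical incomes in $1,000s, sorted: most are modest, one is very large.
    incomes = [12, 15, 18, 20, 22, 25, 30, 40, 60, 150]
    logged = [math.log(x) for x in incomes]  # natural log of each data point

    print(f"skewness raw: {skewness(incomes):.2f}, logged: {skewness(logged):.2f}")

    # On the raw scale the gap between the two largest incomes (150 - 60)
    # is bigger than the gap from the smallest to the second largest (60 - 12);
    # on the log scale that ordering flips, because equal ratios become equal distances.
    print(f"raw gaps: top {incomes[-1] - incomes[-2]}, lower {incomes[-2] - incomes[0]}")
    print(f"log gaps: top {logged[-1] - logged[-2]:.2f}, lower {logged[-2] - logged[0]:.2f}")
```

In this toy case the transformation noticeably reduces the skewness, but it also changes which differences look large. That is the kind of substantive shift the comment is uneasy about.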
