“[C]onfusion about the foundations of the subject is responsible, in my opinion, for much of the misuse of the statistics that one meets in fields of application such as medicine, psychology, sociology, economics, and so forth.” (George Barnard 1985, p. 2)
“Relevant clarifications of the nature and roles of statistical evidence in scientific research may well be achieved by bringing to bear in systematic concert the scholarly methods of statisticians, philosophers and historians of science, and substantive scientists…” (Allan Birnbaum 1972, p. 861).
“In the training program for PhD students, the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science must be covered.” (p. 57, Committee Investigating fraudulent research practices of social psychologist Diederik Stapel)
I was the lone philosophical observer at a special meeting convened by the American Statistical Association (ASA) in 2015 to construct a non-technical document to guide users of statistical significance tests–one of the most common methods used to distinguish genuine effects from chance variability across a landscape of social, physical and biological sciences.
It was, by the ASA Director’s own description, “historical”, but it was also highly philosophical, and its ramifications are only now being discussed and debated. Today, introspection on statistical methods is rather common due to the “statistical crisis in science”. What is it? In a nutshell: high-powered computer methods make it easy to arrive at impressive-looking ‘findings’ that too often disappear when others try to replicate them under conditions where hypotheses and data analysis protocols must be fixed in advance.
How should scientific integrity be restored? Experts do not agree, and the disagreement is intertwined with fundamental disagreements regarding the nature, interpretation, and justification of methods and models used to learn from incomplete and uncertain data. Today’s reformers, fraudbusters, and replication researchers increasingly call for more self-critical scrutiny of philosophical foundations. Philosophers should take this seriously. While philosophers of science are interested in helping to clarify, if not also to resolve, matters of evidence and inference, they are rarely consulted in practice to this end. The assumptions behind today’s competing evidence reforms–issues of what I will call evidence-policy–are largely hidden to those outside the loop of the philosophical foundations of statistics and data analysis, or Phil Stat. This is a crucial obstacle to scrutinizing the consequences for science policy, clinical trials, personalized medicine, and a wide landscape of Big Data modeling.
Statistics has a fascinating and colorful history of philosophical debate, marked by unusual heights of passion, personality, and controversy for at least a century. Wars between frequentists and Bayesians have been so contentious that everyone wants to believe we are long past them: we now have unifications and reconciliations, and practitioners only care about what works. The truth is that both brand new and long-standing battles simmer below the surface in questions about scientific trustworthiness. They show up unannounced in the current problems of scientific integrity, questionable research practices, and in the swirl of methodological reforms and guidelines that spin their way down from journals and reports, the ASA Statement being just one. There isn’t even agreement as to what it means to say a method “works”. These are key themes in my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP).
Many of the key problems in today’s evidence-policy disputes inherit the conceptual confusions of the underlying methods for evidence and inference. They are intertwined with philosophical terms that often remain vague, such as inference, reliability, testing, rationality, explanation, induction, confirmation, and falsification. This hampers communication among various stakeholders, making it difficult to even recognize and articulate where they agree. The philosopher’s penchant for laying bare presuppositions of claims and arguments would let us cut through the unclarity that blocked the experts at the ASA meeting from clearly pinpointing where and why they agree or disagree. (As a mere “observer”, I rarely intervened.) We should put philosophy to work on the popular memes: “All models are false”, “Everything is equally subjective and objective”, “P-values exaggerate evidence”, and “most published research findings are false”.
So am I calling on my fellow philosophers (at least some of them) to learn formal statistics? That would be both too much and too little. Too much because it would be impractical; too little because despite technical sophistication, basic concepts of statistical testing and inference are more unsettled than ever. P-values–whether to redefine them, lower their thresholds, or ban them altogether–are the subject of heated discussion and journalistic debate. Megateams of seventy or more authors array themselves on either side of the debate (e.g., Benjamin et al. 2017, Lakens et al. 2018), including some philosophers (I was a co-author in Lakens et al., arguing that redefining significance would not help with the problem of replication). The deepest problems underlying the replication crisis go beyond formal statistics–into measurement, experimental design, and communication of uncertainty. Yet these rarely occupy center stage in all the brouhaha. By focusing just on the formal statistical issues, the debates give short shrift to the need to tie formal methods to substantive inferences, to a general account of collecting and learning from data, and to entirely non-statistical types of inference. The goal becomes: who can claim to offer the highest proportion of “true” effects among those outputted by a formal method?
You might say my project is only relevant for philosophers of science, logic, formal epistemology and the like. While they are the obvious suspects, it goes further. Despite the burgeoning of discussions of ethics in research and in data science, the work is generally done by practitioners apart from philosophy, or by philosophers apart from the nitty-gritty details of the data sciences themselves. Without grasping the basic statistics, informed by understanding contrasting views of the nature and goals of using probability in learning, it’s impossible to see where the formal issues leave off and informal, value-laden issues arise or intersect. Philosophers in research ethics can wind up building arguments that forfeit a stronger stance that a critical assessment of the methods would afford (e.g., arguing for a precautionary stance when there is evidence of genuine risk increase in the data, despite non-significant results). Experimental philosophy is another area that underscores the importance of a critical assessment of the statistical methods on which it is based. Formal methods, logic and probability are staples of philosophy; why not methods of inference based on probabilistic methods? That’s what statistics is.
Not only is PhilStat relevant to addressing some long-standing philosophical problems of evidence, inference and knowledge, it offers a superb avenue for philosophers to genuinely impact scientific practice and policy. Even a sufficient understanding of the inference methods together with a platform for raising questions about fallacies and pitfalls could be extremely effective. What is at stake is a critical standpoint that we may be in danger of losing. Without it, we forfeit the ability to communicate with, and hold accountable, the “experts,” the agencies, the quants, and all those data handlers increasingly exerting power over our lives. It goes beyond philosophical outreach–as important as that is–to becoming citizen scholars and citizen scientists.
I have been pondering how to overcome these obstacles, and am keen to engage fellow philosophers in the project. I am going to take one step toward exploring and meeting this goal, together with a colleague, Aris Spanos, in economics. We are running a two-week immersive seminar on PhilStat for philosophy faculty and post-docs who wish to acquire or strengthen their background in PhilStat as it relates to philosophical problems of evidence and inference, to today’s statistical crisis of replication, and to associated evidence-policy debates. The logistics are modeled on the NEH Summer Seminars for college faculty that I directed in 1999 (on Philosophy of Experiment: Induction, Reliability, and Error). The content reflects Mayo (2018), which is written as a series of Excursions and Tours in a “Philosophical Voyage” to illuminate statistical inference. Consider joining me. In the meantime, I would like to hear from philosophers interested or already involved in this arena. Do you have references to existing efforts in this direction? Please share them.
Barnard, G. (1985). A Coherent View of Statistical Inference, Statistics Technical Report Series. Department of Statistics & Actuarial Science, University of Waterloo, Canada.
Benjamin, D. et al. (2017). Redefine Statistical Significance, Nature Human Behaviour 2, 6–10.
Birnbaum, A. (1972). More on Concepts of Statistical Evidence, Journal of the American Statistical Association 67, 858–61.
Lakens, D. et al. (2018). Justify Your Alpha, Nature Human Behaviour 2, 168–71.
Levelt Committee, Noort Committee, Drenth Committee (2012). Flawed Science: The Fraudulent Research Practices of Social Psychologist Diederik Stapel (www.commissielevelt.nl/).
Mayo, D. (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP). (The first chapter [Excursion 1 Tour I] is here.)
Wasserstein, R. & Lazar, N. (2016). The ASA’s Statement on P-values: Context, Process and Purpose (and supplemental materials), The American Statistician 70(2), 129–33.
Credit for the ‘statistical cruise ship’ artwork goes to Mickey Mayo of Mayo Studios, Inc.
Deborah G. Mayo
Deborah Mayo is Professor Emerita in the Department of Philosophy at Virginia Tech. She’s the author of Error and the Growth of Experimental Knowledge (1996, Chicago), which won the 1998 Lakatos Prize, awarded to the most outstanding contribution to the philosophy of science during the previous six years. She co-edited, with Aris Spanos, Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (2010, CUP), and co-edited, with Rachelle Hollander, Acceptable Evidence: Science and Values in Risk Management (1991, Oxford). Her most recent work is Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Other publications are available here.
How about a philosophy of stats course for statisticians with a keen interest in philosophy of science applied to statistics?
Jake: We do see the Summer Seminar this way. I grant that a full-blown course relating phil sci and stat sci–something I’ve taught several times–requires more, but we do intend this for stat practitioners as well. Please see the description at summerseminarphilstat.com.
What you’re doing is really important! In addition to the deep philosophical issues you mention and which I imagine you will discuss at length in your summer seminar, I wonder if you might be interested in discussing elementary philsci concepts in a form that ordinary scientists might understand. If you’re at all interested in this, I’d be happy to talk further by email or other means.
By way of introduction, I am a retired computer scientist with a career split between mainstream CS and bioinformatics and orthogonally split between academia and industry. You can check me out on LinkedIn at http://www.linkedin.com/in/nathan-goodman-391b451a.
Nathan: Thank you! I certainly hope that these philsci concepts are discussed in the form that ordinary scientists can understand in my book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). I’d be glad to talk further.
Thanks for the reply. Your book is wonderful, but I was thinking of something more bite-sized. Perhaps a series of short blog posts covering basic topics. In my limited experience, philosophers like to “write long” to properly discuss the nuances, while scientists like to “read short” to get the basic gist.
I’d be happy to help with the writing. Of course, you’d have to provide the content. Please feel free to continue the conversation by email if that’s more convenient: natg at shore.net