Abundance of Choice

Complex statistical analysis and mathematical modelling involve a multitude of choices and assumptions. Recent “many analysts, one data set” studies show the danger of relying solely on a single research team.

Here we present several examples from this literature.

We also present an example from climate modelling in which variations in modelling choices account for a greater share of variance than variations in scenario choice.

Silberzahn et al. (2018) — social science, psychology

Twenty-nine teams involving 61 analysts used the same data set to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. …Twenty teams (69%) found a statistically significant positive effect, and 9 teams (31%) did not observe a significant relationship. Overall, the 29 different analyses used 21 unique combinations of covariates. …significant variation in the results of analyses of complex data may be difficult to avoid, even by experts with honest intentions.
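The mechanism behind this dispersion can be illustrated with a toy calculation. The sketch below uses invented counts (not the Silberzahn data) to show how a single defensible covariate decision, whether to stratify, moves the estimated odds ratio; the Mantel-Haenszel estimator here merely stands in for the many adjustment strategies the 29 teams used.

```python
# Illustrative only: synthetic counts showing how the decision to adjust
# for a stratifying covariate can change the estimated odds ratio.
# These numbers are invented for exposition; they are NOT the Silberzahn data.

def crude_odds_ratio(strata):
    """Pool all strata into one 2x2 table, then compute the odds ratio."""
    a = sum(s["a"] for s in strata)  # exposed cases (e.g. dark-skin, red card)
    b = sum(s["b"] for s in strata)  # unexposed cases
    c = sum(s["c"] for s in strata)  # exposed non-cases
    d = sum(s["d"] for s in strata)  # unexposed non-cases
    return (a * d) / (b * c)

def mantel_haenszel_odds_ratio(strata):
    """Stratum-adjusted common odds ratio (Mantel-Haenszel estimator)."""
    num = sum(s["a"] * s["d"] / (s["a"] + s["b"] + s["c"] + s["d"]) for s in strata)
    den = sum(s["b"] * s["c"] / (s["a"] + s["b"] + s["c"] + s["d"]) for s in strata)
    return num / den

# Two hypothetical strata (say, two leagues) with different base rates and
# different exposure mixes -- a textbook confounding pattern.
strata = [
    {"a": 1,  "b": 10, "c": 99,  "d": 990},  # low-card-rate stratum
    {"a": 50, "b": 5,  "c": 450, "d": 45},   # high-card-rate stratum
]

print(round(crude_odds_ratio(strata), 2))            # 6.41 -- "significant-looking"
print(round(mantel_haenszel_odds_ratio(strata), 2))  # 1.0  -- vanishes after adjustment
```

Both analyses are defensible on their face, yet they support opposite conclusions, which is precisely the pattern the 21 unique covariate combinations produced.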

Table: Covariates included by each team.

Figure: Odds ratios across 29 teams.

Figure: OR point estimates clustered by analytic approach.

The observed results from analyzing a complex data set can be highly contingent on justifiable, but subjective, analytic decisions. Uncertainty in interpreting research results is therefore not just a function of statistical power or the use of questionable research practices; it is also a function of the many reasonable decisions that researchers must make in order to conduct the research.

Huntington-Klein et al. (2021) — economics

These researchers ask whether the causal results reported in two published empirical studies replicate when multiple independent teams attempt the same analysis:

  1. Black et al. (2008) Staying in the classroom and out of the maternity ward? The effect of compulsory schooling laws on teenage births. The Economic Journal, 118(530): 1025–1054.
  2. Fairlie et al. (2011) Is employer-based health insurance a barrier to entrepreneurship? Journal of Health Economics, 30(1): 146–162.

They recruited 49 published researchers to participate in replication teams.

After attrition (due to a short completion window), they obtained 7 independent replications of each study.

Figure: Results from the compulsory education (#1) replication study.

Figure: Results from the health insurance (#2) replication study.

Researchers make hundreds of decisions about data collection, preparation, and analysis in their research. …We find large differences in data preparation and analysis decisions, many of which would not likely be reported in a publication. No two replicators reported the same sample size. Statistical significance varied across replications, and for one of the studies the effect’s sign varied as well. The standard deviation of estimates across replications was 3–4 times the mean reported standard error.
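The headline dispersion statistic in that quotation is straightforward to compute. The sketch below uses invented estimate/standard-error pairs standing in for seven replications; it compares the spread of point estimates across replications with the average uncertainty each team reports within its own analysis.

```python
# Hypothetical numbers, for illustration only: the across-replication
# standard deviation of point estimates vs. the mean of the standard
# errors reported within each replication.
from statistics import stdev, mean

# (point estimate, reported standard error) for seven imagined replications
replications = [(0.6, 0.10), (0.2, 0.12), (-0.1, 0.09),
                (0.4, 0.11), (0.9, 0.10), (0.3, 0.08), (-0.2, 0.13)]

estimates = [est for est, _ in replications]
ses = [se for _, se in replications]

# Each team's reported SE understates the disagreement between teams.
ratio = stdev(estimates) / mean(ses)
print(round(ratio, 1))  # 3.7 -- in the 3-4x range the authors report
```

A ratio well above 1 means that specification uncertainty, not sampling uncertainty, dominates: no single team's confidence interval comes close to covering the range of answers the community would produce.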

Breznau et al. (2022) — economics, statistics

These authors pose the question: does immigration reduce support for social policies among the public?

To answer it, they recruited 162 participants across 73 teams.

Each team was provided with a database of answers to a 6-question module on the role of government in providing different social services, which is part of the long-running International Social Survey Programme (ISSP). They were also provided with a wide range of World Bank, OECD, and immigration data.

Figure: Variation in the average marginal effect (AME) across 73 teams testing the same hypothesis with the same data; point estimate and confidence interval of the AME shown for each team.

…major research steps explain at most 2.6% of total variance in effect sizes and 10% of the deviance in subjective conclusions. Expertise, prior beliefs and attitudes explain even less. Each generated model was unique, which points to a vast universe of research design variability normally hidden from view in the presentation, consumption, and perhaps even creation of scientific results.
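The kind of variance decomposition behind the "at most 2.6%" figure can be sketched with an eta-squared calculation: the share of total variance in effect sizes attributable to one observable analytic choice. The data below are invented for illustration and are not the Breznau et al. estimates.

```python
# Sketch with made-up numbers: how much of the variance in effect sizes
# does a single "major research step" (a grouping factor) explain?
from statistics import mean, pvariance

# effect-size estimates grouped by one hypothetical analytic choice
groups = {
    "choice_A": [0.10, -0.05, 0.30, 0.12],
    "choice_B": [0.08, 0.25, -0.10, 0.15],
}

all_vals = [v for g in groups.values() for v in g]
grand = mean(all_vals)

# between-group share of total (population) variance: eta-squared
between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / len(all_vals)
total = pvariance(all_vals)
print(round(between / total, 3))  # 0.008 -- under 1% explained
```

When the estimates vary this much within each group, knowing which choice a team made tells you almost nothing about the effect size it will report, which is the Breznau et al. finding in miniature.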

Sognnaes et al. (2021) — emissions, integrated assessment modelling

The authors develop explicit post-2030 projections of CO2 mitigation efforts. They employ two different formulations to generate emissions-mitigation scenarios: (i) continuing rates of emissions-intensity reduction, i.e. emissions per unit of GDP, and (ii) scaling of carbon prices in proportion to per capita GDP growth.
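Formulation (i) can be sketched in a few lines. The function and all numerical inputs below are illustrative assumptions, not the authors' calibration: emissions are intensity times GDP, and the assumed intensity-decline rate is simply continued beyond the base year.

```python
# A minimal sketch of extension method (i), under invented illustrative
# numbers: emissions = intensity x GDP, with the historical rate of
# emissions-intensity decline held fixed beyond 2030.

def extend_intensity_pathway(e0, gdp0, gdp_growth, intensity_decline, years):
    """Project emissions forward by continuing the intensity-decline rate.

    e0: emissions in the base year (GtCO2)
    gdp0: GDP in the base year (arbitrary units)
    gdp_growth: annual GDP growth rate
    intensity_decline: annual decline rate of emissions per unit of GDP
    years: number of years to project
    """
    intensity = e0 / gdp0
    gdp = gdp0
    path = []
    for _ in range(years):
        gdp *= 1 + gdp_growth
        intensity *= 1 - intensity_decline
        path.append(intensity * gdp)
    return path

# Illustrative inputs: 35 GtCO2 in 2030, 2.5%/yr GDP growth, 3%/yr
# intensity decline. Emissions fall only slowly, because GDP growth
# offsets most of the intensity improvement.
path = extend_intensity_pathway(e0=35.0, gdp0=100.0, gdp_growth=0.025,
                                intensity_decline=0.03, years=20)
print(round(path[-1], 1))  # 2050 value, ~31.2 GtCO2
```

The sketch makes the sensitivity visible: the 2050 endpoint turns entirely on the assumed post-2030 rates, which is why the choice of extension method matters so much in the authors' results.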

Whereas in many studies and applications, scenario pathways are identified through ‘backcasting’ from future climate targets, Sognnaes et al. (2021) employ two formulations of near-term mitigation efforts — current policies (CPs) and nationally determined contributions (NDCs) — to which they apply the above-mentioned (i) and (ii) long-term emissions-mitigation extensions beyond 2030. This results in a 2x2 matrix of combinations.

They then simulate forward emissions pathways using a diverse set of seven Integrated Assessment Models (IAMs).

The results of these simulations are summarised in the following two figures.

Figure: Global energy CO2 emissions to 2050 in CP scenarios. Shaded areas show emissions spanned by CP_Price and CP_Intensity scenarios for each model, and coloured bars show 2050 ranges (2045 value for FortyTwo, which has only intensity scenarios). Markers above bars show baseline values in 2050 (in 2045 for FortyTwo).
Figure: Global energy CO2 emissions to 2050 in NDC scenarios. Shaded areas show emissions spanned by NDC_Price and NDC_Intensity scenarios for each model, and coloured bars show 2050 ranges (2045 value for FortyTwo, which has only intensity scenarios). Markers above bars show baseline values in 2050 (in 2045 for FortyTwo).

The authors’ conclusions parallel those of the ‘many analysts, one dataset’ studies:

the model used has a larger impact on results than the method used to extend mitigation effort forward, which in turn has a larger impact on results than whether CPs or NDCs are assumed in 2030. The answer to where emissions are headed … might therefore depend more on the choice of models used and the post-2030 assumptions than on the 2030 target assumed. This renders estimates of temperature consequences of NDCs and CPs sensitive to study design and highlights the importance of using a diversity of models and extension methods to capture this uncertainty.


These studies applied explicit selection criteria to ensure that only competent researchers were recruited.

Prior to studies of this nature, the “degrees of freedom” inherent in empirical analysis were not fully appreciated by researchers.

Beyond mere abundance of choice, these studies reveal a vast universe of previously hidden research-design and analysis-operationalisation variability.

Kim Kaivanto
Senior Lecturer in Economics

economics and finance, normative and behavioural, academic and applied