In my study of the social sciences, statistics and statistical analyses come up frequently. These methods provide a way to make inferences based on observations. Most often in the social sciences, we want to observe a smaller group (a sample) in order to make inferences or predictions about a larger group (a population).
As you no doubt recall, the scientific method is all about making observations, forming a hypothesis, and then testing the hypothesis through some sort of experiment and/or further observations. Thus, while statistical analyses can be exploratory (when we just look to see what’s out there and what’s real), their most common function in social science is to test hypotheses.
That said, there are several ways we can approach statistical analysis. If you’ve taken a stats course before, you probably have been exposed to null hypothesis significance testing. This is the most common approach these days. When we talk about Bayesian statistical analyses, we usually refer to a fundamentally different approach.
The Usual Approach
The commonly accepted method is based on what is called the frequentist approach to statistical analysis. The specific tool most often employed is null hypothesis significance testing (NHST). Much of an introductory statistics course is spent trying to explain this tool. Essentially, it involves defining not only the hypothesis of interest, but also a null hypothesis. The null hypothesis says that there is no effect, or no relationship, or nothing of interest. NHST then seeks evidence against the null hypothesis. Because this is something of a backwards approach, it takes a significant amount of effort to explain it even to reasonably smart college freshmen.
The result of an NHST analysis (e.g. a chi-square test, a t-test, and so on) is ultimately expressed as a p-value. If the p-value is sufficiently low then we say we can reject the null hypothesis. By convention in the social sciences, the test is considered “statistically significant” if p < 0.05, meaning there would be less than a 5% probability of observing data at least this extreme if the null hypothesis were true.
NHST, the common approach to statistical analyses in most scientific studies, asks, “what is the probability of observing this data, assuming that my hypothesis is false?” In other words, we find evidence for the alternative hypothesis in a roundabout way: if it is unlikely that we would observe what we did under the assumption of the null hypothesis, then we have some confidence that an alternative hypothesis is true.
The Bayesian approach asks a more straightforward question: what is the probability of a particular hypothesis, given the observed data? Where NHST says “let’s assume we’re wrong and see what the numbers say,” Bayesian modeling says “let’s assume what we’ve observed is true and see what that tells us.”
Bayesian Data Analysis
Bayes’ Theorem describes the relationship between the probability of one event and knowledge of another event. It says that the probability of event A given that event B has occurred is the same as the probability of B given A, multiplied by the probability of A and divided by the probability of B. This may sound like an obscure and quirky thing to describe just in terms of A and B, but the relationship allows us to make a statistical analysis much more intuitive when we want to evaluate a hypothesis through observations.
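Written out as a formula, that relationship is:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$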
In order to answer a question phrased, “what is the probability of a particular hypothesis, given the observed data?”, Bayes’ Theorem turns out to be far more practical than its abstract description suggests. It gives us a direct way to calculate the probability of something (our hypothesis) given what we’ve observed (the data). In its implementation, though, we do not test the probability of a single, precise hypothesis. Instead, we develop a probability distribution across all plausible hypotheses, from which we can draw samples to ask questions about any of them. This makes a Bayesian approach arguably much more powerful.
Why Would Anyone Use NHST?
If Bayesian modeling is so much more straightforward and powerful, why would anyone prefer to use NHST? Well, there are a few reasons.
NHST provides a concrete test.
Even though the question posed by NHST seems a bit backwards, it is a clear, concrete question. In many cases (particularly in the social sciences) it is the most concrete way to ask a question.
As an example, consider a question like the ones common in social science. Suppose we want to understand if there is an association between level of education and income. While this is nominally a yes-or-no question (“Does education affect income?”), the quantitative answer is not nearly as simple. Education could have a positive effect on income, or a negative effect; it could have a small effect or a large effect. Or, of course, it could have no effect at all. NHST focuses on the last possibility—the null hypothesis, that education has no effect on income—because that is the only possibility that can be clearly defined in a numeric way. In this example, the null hypothesis says that the association between education and income is zero.
When we focus on the null hypothesis, the question clearly becomes a yes-or-no proposition. If we ask the more open-ended question, “what is the relationship between education and income?” we find that there is an infinite range of possibilities. It could be that, for example, every year of education adds $10,000 to annual income. Or maybe it’s $5,000. Or some other value. Or maybe the value is different for the first few years of education than it is for education beyond high school. Maybe the relationship starts out positive, but after 6 years of college it levels off and reverses. NHST ignores all these complexities and just focuses on no-relationship versus some-relationship.
While NHST by itself cannot tell us how strong the association between these two variables is, nor how large the effect is, it does one thing very well: it says, in a straightforward way, whether the data give us grounds to conclude that an association exists.
NHST is mathematically simple.
Many students who are asked to perform statistical calculations will probably disagree with me on this one. But the tools of NHST—t-tests, ANOVA, OLS regression, etc.—have mathematical formulae that will give an answer in a relatively straightforward manner. It might take quite a few minutes to add up all the numbers or to do whatever other operations are required of the formula, but there is nothing fundamentally hard about performing the calculations. With a computerized tool like SPSS or a spreadsheet, the calculations can be done relatively quickly.
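To illustrate how mechanical these calculations are, here is a minimal sketch of a two-sample t-test in Python with SciPy. The numbers are entirely made up; the point is only that the computation itself is routine:

```python
# A minimal two-sample t-test with made-up numbers (illustrative only;
# any stats package performs the same arithmetic).
from scipy import stats

# Hypothetical annual incomes (in dollars) for two education groups
no_degree = [28_000, 31_500, 25_000, 30_200, 27_800, 29_100]
degree    = [38_000, 41_200, 35_500, 44_000, 39_300, 36_700]

result = stats.ttest_ind(degree, no_degree)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

# If p < 0.05, the convention is to reject the null hypothesis
# that the two groups have the same mean income.
```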
Bayesian analyses, on the other hand, require much more involved mathematical algorithms. One of the reasons that these methods have not been used more often in the past is that they are just too complicated to be practical. Only with the availability of cheap, fast computers have Bayesian models become a practical set of tools to analyze data. Additionally, new computational methods of estimation such as Markov Chain Monte Carlo (MCMC) have turned really really hard Bayesian computations into just moderately hard computations. You’ll still want a computer to help, but at least the computations can be done in the time before your next term paper is due.
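To give a sense of what those algorithms involve, here is a toy sketch of a Metropolis sampler, one of the simplest MCMC methods, estimating a single parameter. Everything here (the data, the prior, the proposal width) is made up for illustration; production tools like Stan use far more sophisticated samplers:

```python
# Toy Metropolis sampler for one parameter (the mean of some made-up data).
# A sketch only -- real tools use far more efficient samplers.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=50)      # made-up observations

def log_posterior(mu):
    log_prior = stats.norm.logpdf(mu, loc=0.0, scale=10.0)             # loose prior on mu
    log_likelihood = stats.norm.logpdf(data, loc=mu, scale=2.0).sum()  # likelihood of the data
    return log_prior + log_likelihood

samples = []
current = 0.0
for _ in range(5_000):
    proposal = current + rng.normal(scale=0.5)       # random-walk proposal
    # Accept the proposal with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(current):
        current = proposal
    samples.append(current)

print("posterior mean of mu:", np.mean(samples[1_000:]))   # drop early samples as burn-in
```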
NHST provides a definitive and reproducible result.
Although scientists will disagree on how to interpret results, and a social experiment run over again may yield different findings, any statistician with the same data can calculate the same NHST statistical model and get the same p-value.
This is not always true with Bayesian models. Bayesian algorithms sometimes rely on random sampling (e.g., MCMC). Because of this, different statisticians, even with the same data and the same question, may get slightly different results. A well-formed model, though, should always yield the same conclusions even if the numbers end up being slightly different.
What Are the Benefits of Going Bayesian?
While NHST has its merits, and it is a useful technique under certain circumstances, a Bayesian approach can do things that NHST cannot.
Bayesian can be more intuitive.
As defined above, NHST answers the question, “what is the probability of the data given the null hypothesis?” By contrast, the Bayesian approach answers the question, “what is the probability of a hypothesis given the data?” Because the data is observed, it is more intuitive to assume that the data is true (after all, it actually is!) instead of assuming the hypothesis is false. This assumption matches the reality of what is known within a scientific test. Thus, the inferences via a Bayesian approach are easier to follow.
Social science, it is said, is currently experiencing a replication crisis. While there are various reasons for this, much of the literature on the crisis points to scientists—even highly respectable statisticians—misunderstanding statistical inferences via NHST. What does it mean to reject the null hypothesis? Even worse, what does it mean to fail to reject the null hypothesis? What can be inferred from p-values marginally above or below the conventional cutoff of 0.05? Is 0.05 even an appropriate cutoff in a particular case? Many scientists do not fully appreciate the nuance in the details, which are perhaps more complicated than they ought to be because of the conventions and limitations of the techniques.
Bayesian methods, on the other hand, do not suffer from these complexities in interpretation. Neither p-values nor null hypotheses apply to Bayesian statistics. Nothing is rejected. Instead, Bayesian inferences focus on identifying what predictions are most probable.
Bayesian provides a deeper understanding.
While NHST answers a simple question, a Bayesian model can answer many questions. Beyond a single p-value, a Bayesian model can provide the researcher with a far more complete understanding of statistical inferences. While researchers using a frequentist approach sometimes estimate multiple models or perform a full array of statistical tests to gain a deeper understanding, a Bayesian model can do this in one go.
Bayesian doesn’t require Gaussian.
For technical reasons outside the scope of this essay, most models used for NHST have built-in assumptions that the data (or, more precisely, the model’s errors) follow a Gaussian (normal) distribution. While there are ways to adjust for data that do not fit the Gaussian form, this adds complexity and is often overlooked by the researcher. (How often do you see measures of skewness and kurtosis? How often are these issues addressed, let alone rectified?) Bayesian models do not require Gaussian data. Thus, a researcher can identify accurate predictions without having to worry about these mathematical concerns.
How Does a Bayesian Analysis Work?
Getting to the practical nuts and bolts, let’s use the example of analyzing the relationship between education and income in the context of a Bayesian statistical model. To keep things simple, let’s look for a linear model that describes the relationship. We can create a model in the form Y = aX + b where X is years of education and Y is dollars of annual income. The coefficients a and b are values we want our model to predict.
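To make the example concrete, here is what data generated by such a model might look like, with entirely made-up coefficients (an effect of $3,000 per year of education, a baseline of $20,000, and some random noise). The same made-up data is reused in the sketches that follow:

```python
# Made-up data of the form Y = a*X + b, just to make the example concrete.
import numpy as np

rng = np.random.default_rng(1)
a_true, b_true = 3_000, 20_000                 # hypothetical "true" coefficients
education = rng.integers(8, 21, size=200)      # years of education for 200 people
income = a_true * education + b_true + rng.normal(0, 8_000, size=200)  # noisy incomes
```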
Bayesian statistics uses a different language than what is taught in basic stats courses, so let’s start with some definitions.
The prior:
Loosely speaking, the prior is what we know (or assume) before we start researching. There is no concept of a prior in standard NHST. A researcher’s prior knowledge may inform the development of a hypothesis, but generally speaking this knowledge has no relation to the actual statistical model. In a Bayesian model, the statistician must always start with something. The prior could be a summary of previous research, but more often Bayesian statisticians will use either a flat prior or a loosely predictive prior so as to leave as much room in the model for the data to speak for itself. However, building a prior based on expected results can also be a powerful way to apply new findings to an existing understanding. The prior is constructed in the form of a distribution of all plausible values of the model’s variables.
The data:
Statistics is all about interpreting data. Thus, any statistician is familiar with this concept. Data are the observations made through whatever scientific research has been done. In the case of our example, data will be in the form of real-world numbers sampled from the population, quantitatively identifying individuals’ education level (in years) and income level (in dollars).
The posterior:
When a Bayesian model is estimated, the result is called the posterior. The result of a Bayesian model is not a single value; it is a probability distribution across all plausible values of the model’s variables. Technically speaking, the posterior indicates the probability of each of the parameter values laid out in the prior, given the data.
Making the model go.
If you haven’t caught on yet, a Bayesian model applies Bayes’ Theorem. Recall that Bayes’ Theorem provides a formula for calculating the probability of some outcome given some other outcome. To wit, we will calculate the probability of each set of parameter values laid out in the prior, given the data.
In theory, the prior might be defined as a continuous function, describing the plausibility of every value of a and b. As such, we would expect the posterior to be continuous as well, showing the probability of every combination of a and b, given the observed data. In reality, however, math sometimes gets in the way. Depending on the complexity of the model, it is generally more practical to approximate the prior by, say, constructing 1,000 possible values of a and b, each with an associated plausibility. Thus, the posterior will be a similarly structured data set of a and b values, with a probability assigned to each.
Once we run the model, we can sample from the posterior. Or, in the case of the estimated distribution just described, we can simply use the posterior in its entirety to plot the distribution. Take the mean of the distribution, and we have an estimate for the values of a and b.
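Here is a rough sketch of that grid-approximation idea for the education-and-income model, using the made-up data from earlier and a flat prior over the grid (the next section discusses better priors). A real analysis would use MCMC, but the logic is the same: score every candidate pair of a and b by prior times likelihood, normalize, and then sample:

```python
# Grid approximation of the posterior for income = a*education + b.
# Made-up data, a flat prior over the grid, and a fixed noise scale -- a sketch only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
education = rng.integers(8, 21, size=200)                             # same made-up data as before
income = 3_000 * education + 20_000 + rng.normal(0, 8_000, size=200)
sigma = 8_000                                  # assume the noise scale is known, to keep the grid 2-D

# Candidate values for a (slope) and b (intercept)
a_grid = np.linspace(-10_000, 10_000, 200)
b_grid = np.linspace(0, 80_000, 200)
A, B = np.meshgrid(a_grid, b_grid)

# Log-likelihood of the observed data at every (a, b) on the grid
log_lik = np.zeros_like(A)
for x, y in zip(education, income):
    log_lik += stats.norm.logpdf(y, loc=A * x + B, scale=sigma)

# With a flat prior, the posterior is proportional to the likelihood; normalize over the grid
posterior = np.exp(log_lik - log_lik.max())
posterior /= posterior.sum()

# Sample (a, b) pairs from the posterior and take the means as point estimates
idx = rng.choice(posterior.size, size=10_000, p=posterior.ravel())
a_samples, b_samples = A.ravel()[idx], B.ravel()[idx]
print("estimated a:", round(a_samples.mean()), " estimated b:", round(b_samples.mean()))
```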
Assigning a prior.
As you can see, only half of the inputs into our Bayesian model are data. The other half come from the prior. (Note that when I say “half” I don’t mean it in any quantitative sense. The actual weight of each input depends on many factors.) The importance of the prior diminishes as the quantity of data increases, so for a sufficiently large sample the prior is less consequential. But, having a good prior means your model can be more effective with a smaller sample.
What is a good prior? It depends on the specifics, but a loosely predictive prior usually gives the model the best start without overly limiting what the data can contribute.
A flat prior would include every possible value for the model’s variables with equal plausibility. This would usually be sufficient to make your model work, but it might waste a lot of time considering possible values that are highly unlikely. Your model would not converge as quickly on valid results.
At the other extreme, a highly predictive prior would limit the model to the values we expect. For example, we might look at previous research that showed a strong positive effect of education on income. We could assume that coefficient a has a positive value between $1000 and $4000. This may be accurate, but the trouble with this kind of prior is that if the data support something outside of our assumed range, we would never know.
So, let’s look at a sweet spot in the middle. Create a prior that is minimally predictive: it focuses on reasonably likely values without being any more restrictive than necessary. In this case, we may limit the values to ranges that make sense. For example, we can assume that income is not negative, so the coefficient b might only take values greater than zero. We also know that the average income is, for the sake of argument, $38,000. So our plausible values for b in the prior could also have a mean of $38,000. Values for coefficient a might be positive or negative, but any value greater than $38,000 would be nonsensical, because surely a single year of education does not account for that much variation in income. Thus, we might decide on a distribution for a with a mean of 0 and a standard deviation of 10,000. Constructing a prior in this way imposes very few assumptions or limitations on the model, but it keeps the model focused on values that are not impossible or highly unlikely.
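Here is one way to write down priors like these, using the numbers from the paragraph above. The spread on the intercept ($20,000) is my own assumption, chosen only to keep the prior loosely predictive:

```python
# A sketch of the loosely predictive priors described above (numbers are illustrative).
import numpy as np
from scipy import stats

# Slope a: effect of one year of education on income.
# Centered at 0 with sd $10,000 -- allows positive or negative effects,
# but treats values far beyond +/- $30,000 as implausible.
prior_a = stats.norm(loc=0, scale=10_000)

# Intercept b: baseline income, centered at the (assumed) average of $38,000
# and truncated at zero, since income is not negative. Note that truncnorm's
# own a/b arguments are bounds in standard-deviation units, not our coefficients.
sd_b = 20_000
prior_b = stats.truncnorm(a=(0 - 38_000) / sd_b, b=np.inf, loc=38_000, scale=sd_b)

# Draw a few values from each prior to see what the model considers plausible
rng = np.random.default_rng(0)
print("sample slopes:    ", prior_a.rvs(size=5, random_state=rng).round())
print("sample intercepts:", prior_b.rvs(size=5, random_state=rng).round())
```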
Interpreting Results
In the example above, our statistical model is based on a linear equation. In the standard way statistics are done, we would use OLS regression to estimate values for a and b. Each coefficient would have a corresponding p-value. If p < 0.05 we would say our result is statistically significant, we would be pleased with our findings, and our paper would be more likely to get published. This series of events might be problematic for a number of reasons, but for now let’s look at how the Bayesian model gives us different information to work with.
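For comparison, here is a sketch of that conventional analysis on the same made-up data, using SciPy’s linregress (SPSS or a spreadsheet would produce the same estimates):

```python
# Conventional OLS regression on the made-up education/income data,
# reporting point estimates and a p-value for the slope.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
education = rng.integers(8, 21, size=200)
income = 3_000 * education + 20_000 + rng.normal(0, 8_000, size=200)

result = stats.linregress(education, income)
print(f"a (slope)     = {result.slope:.0f}")
print(f"b (intercept) = {result.intercept:.0f}")
print(f"p-value for a = {result.pvalue:.4g}")   # "significant" if below 0.05, by convention
```

The whole analysis collapses into a couple of point estimates and a p-value, which is exactly the limitation the Bayesian posterior avoids.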
Once the posterior has been identified, we do not immediately have values for our coefficients and significance. Remember, the posterior is a distribution. The posterior provides likelihoods for all plausible values of our coefficients. If we want point estimates, calculating the mean of the distribution is simple enough. We could also calculate the standard deviation and from that derive something equivalent to p. But doing so, and stopping there, would miss the benefit of using a Bayesian approach.
In Bayesian data analysis, the point of the work is not just to identify whether or not a particular model is significant. Rather, Bayesian analyses help us to identify the best model, the model that is most likely to be predictive, given the data we observed. In our simple example, we could plot the posterior distribution for our coefficient a to see how it is actually distributed. Recall this is the slope in our linear equation, or the effect of education on income. Does it look like a bell curve? Are there multiple values (rather than one central mean) that are more likely? Are there ranges of values that are highly unlikely? We can answer questions like, “what is the likelihood that the effect is greater than $1000?” Or, “is it more likely that the effect is less than $1000 or greater than $5000?”
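Questions like these fall straight out of the posterior samples. Here is a sketch; the placeholder draws below stand in for the samples of a produced by the grid (or MCMC) sketch above, so the snippet runs on its own:

```python
# Answering questions directly from posterior samples of the slope a.
import numpy as np

rng = np.random.default_rng(0)
a_samples = rng.normal(loc=3_000, scale=900, size=10_000)   # placeholder posterior draws

print("P(effect > $1,000):", np.mean(a_samples > 1_000))
print("P(effect < $1,000):", np.mean(a_samples < 1_000))
print("P(effect > $5,000):", np.mean(a_samples > 5_000))
print("90% credible interval:", np.percentile(a_samples, [5, 95]).round())
```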
Additionally, Bayesian analysis provides a straightforward way to compare models. For example, in addition to the linear model above, we could also consider a quadratic model, or even a model equivalent to the null hypothesis. We could then calculate a Bayes factor to identify whether one model has more support than another.
But is it Significant?
Performing Bayesian data analysis and using these tools effectively requires a different approach to statistics. We ask different questions. We do not seek to reject anything, and we do not suggest that there is some objective line in the sand which marks the boundary of statistical significance. Bayesian analyses are affirmative and broad. They are more complicated than NHST, to be certain, but that is understandable because statistics themselves are complicated. The Bayesian approach does not require us to simplify our models or to make assumptions about our data. The complexity encourages us to think beyond the binary of NHST, and it requires us to use statistics as a point of argument rather than as its conclusion.
The tools of Bayesian data analysis do not provide you with a neat p-value which, if below a threshold value, allows you to say “my results are significant, therefore my theory has been right from the beginning.” (But please don’t say that regardless.) Instead, the tools allow for a more nuanced and complete interpretation of data. A particular set of data may tell us a great deal, or it may provide very little. It may confirm our assumptions, or it may suggest that our previous results were not at all what we thought. But, in the Bayesian tradition, there is no binary between significance and non-significance. There is always something to say.
References
Instead of a formal paper with endnotes and citations, I am presenting this as an informal essay more like study notes. So, if you’d like to do more studying on this topic, here are some great places to start.
This essay came out of a directed study at Wayne State University. Among other things, I read a couple excellent texts on statistics. The first covers everything a beginner would want to know (and much more) about Bayesian data analysis; the second discusses problematic trends within the scientific status quo (many of which are related to statistical analyses):
McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. CRC Press.
Ritchie, Stuart. 2020. Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth. Metropolitan Books.
For some other good articles about the “replicability crisis” in the social sciences, here are some that I have found useful. These do not delve into options available through Bayesian data analysis, but they do address in depth some of the problems arising from traditional analytic approaches:
Brauer, Jonathan R., Jacob C. Day, and Brittany M. Hammond. 2019. “Do Employers ‘Walk the Talk’ After All? An Illustration of Methods for Assessing Signals in Underpowered Designs.” Sociological Methods & Research. doi: 10.1177/0049124119826158.
Freese, Jeremy, and David Peterson. 2017. “Replication in Social Science.” Annual Review of Sociology 43:147–65.
Gelman, Andrew, and John Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9(6):641–51. doi: 10.1177/1745691614551642.
Lynn, Michael. 2018. “Are Published Techniques for Increasing Service-Gratuities/Tips Effective? P-Curving and R-Indexing the Evidence.” International Journal of Hospitality Management 69:65–74.
Finally, here are some other articles that describe Bayesian data analysis:
Ortega, Alonso, and Gorka Navarrete. 2017. “Bayesian Hypothesis Testing: An Alternative to Null Hypothesis Significance Testing (NHST) in Psychology and Social Sciences.” IntechOpen. doi: 10.5772/intechopen.70230.