
Understanding how Anova relates to regression

Analysis of variance (Anova) models are a special case of multilevel regression models, but Anova, the procedure, has something extra: structure on the regression coefficients.
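
To make that concrete, here is a minimal sketch of what "structure on the regression coefficients" can look like: the group effects are fit as a batch drawn from a common distribution, and the estimated between-group variance is the batch-level summary. The data are simulated and the tool (Python's statsmodels MixedLM) is just one way to fit such a model, not anything prescribed by the paper.

```python
# A sketch of "a batch of coefficients with structure on them": group effects
# modeled as draws from a common distribution; the estimated between-group
# variance summarizes the batch. Simulated data, illustrative only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, n_per_group = 8, 20
group = np.repeat(np.arange(n_groups), n_per_group)
group_effects = rng.normal(0, 0.8, n_groups)          # the "batch" of coefficients
y = 1.0 + group_effects[group] + rng.normal(0, 1, len(group))

df = pd.DataFrame({"y": y, "group": group.astype(str)})

# Multilevel model: overall intercept plus a batch of group-level intercepts
fit = smf.mixedlm("y ~ 1", data=df, groups=df["group"]).fit()

print(fit.cov_re)   # estimated between-group variance: how much this batch matters
print(fit.scale)    # residual (within-group) variance
```

The fitted object is just a regression; the Anova-style move is to summarize it by batch, so that a single estimated standard deviation tells you how much the grouping factor matters.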

As I put it in the rejoinder for my 2005 discussion paper:

ANOVA is more important than ever because we are fitting models with many parameters, and these parameters can often usefully be structured into batches. The essence of “ANOVA” (as we see it) is to compare the importance of the batches and to provide a framework for efficient estimation of the individual parameters and related summaries such as comparisons and contrasts. . . .

A statistical model is usually taken to be summarized by a likelihood, or a likelihood and a prior distribution, but we go an extra step by noting that the parameters of a model are typically batched, and we take this batching as an essential part of the model. . . .

A key technical contribution of our paper is to disentangle modeling and inferential summaries. A single multilevel model can yield inference for finite-population and superpopulation inferences. . . .

I summarize:

First, if you are already fitting a complicated model, your inferences can be better understood using the structure of that model. Second, if you have a complicated data structure and are trying to set up a model, it can help to use multilevel modeling—not just a simple units-within-groups structure but a more general approach with crossed factors where appropriate. . . .

I’m sharing this with you now because Josh Miller pointed me to this webpage by Jonas Kristoffer Lindeløv entitled “Common statistical tests are linear models (or: how to teach stats).”

Lindeløv’s explanations are good, and I do think it’s useful for students and practitioners to understand that all these statistical procedures are based on the same class of underlying model. He also notes that the Wilcoxon rank test can be formulated approximately as a linear model on ranks, a point that we put in BDA and which I’ve occasionally blogged (see here and here). It’s good to see these ideas being rediscovered: they’re useful enough that they shouldn’t be trapped within a single book and a few old blog entries.
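
For readers who want to see that rank-test point in action, here is a small sketch (simulated data; scipy and statsmodels are used only for illustration, this is not code from BDA) comparing the Mann-Whitney/Wilcoxon rank-sum p-value with the p-value from an ordinary regression of the ranks on a group indicator:

```python
# A sketch of the rank-test point: the Wilcoxon/Mann-Whitney rank-sum test is
# approximately the same comparison as a regression of the ranked outcome on a
# group indicator. Simulated data, illustrative only.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 40), rng.normal(0.7, 1.0, 40)])
group = np.repeat([0, 1], 40)

# Classical rank-sum test
_, p_rank_test = stats.mannwhitneyu(y[group == 0], y[group == 1],
                                    alternative="two-sided")

# The same comparison as a linear model on the ranks of y
df = pd.DataFrame({"rank_y": stats.rankdata(y), "group": group})
p_linear = smf.ols("rank_y ~ group", data=df).fit().pvalues["group"]

print(p_rank_test, p_linear)   # close to each other, and closer as n grows
```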

The point of my post today is to emphasize that it’s not just what model you fit, it’s also how you summarize it. To put it another way, I think the unification of statistical comparisons is taught to everyone in econometrics 101, and indeed this is a key theme of my book with Jennifer, in that we use regression as an organizing principle for applied statistics. (Just to be clear, I’m not claiming that we discovered this. Quite the opposite. I’m saying that we constructed our book in large part based on the understanding we’d gathered from basic ideas in statistics and econometrics that we felt had not been fully integrated into how this material was taught.)

So, it’s well known that all these models are special cases of regression, which is why a good econometrics class won’t bother teaching Anova, chi-squared tests, and so on: it just does regression. My Anova paper demonstrates how the concept of Anova has value, not just from the model (which is just straightforward multilevel linear regression) but from the structured way the fits are summarized.
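
As a quick illustration of the "it's all regression" point (a sketch with made-up data, using Python's statsmodels; nothing here is specific to my paper), a classical one-way Anova F test and a regression of the outcome on group indicators give the same F statistic:

```python
# A sketch of "it's all regression": a one-way Anova F test and a regression
# on group indicators give the same F statistic. Made-up data, illustrative only.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "y": np.repeat([0.0, 0.5, 1.0], 30) + rng.normal(size=90),
    "group": np.repeat(["a", "b", "c"], 30),
})

# Classical one-way Anova
f_classic, p_classic = stats.f_oneway(
    *[g["y"].to_numpy() for _, g in df.groupby("group")]
)

# The same test, as a regression on group indicators
fit = smf.ols("y ~ C(group)", data=df).fit()
print(sm.stats.anova_lm(fit))   # same F and p-value as the classical test
print(f_classic, p_classic)
```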

For more, go to my Anova article or, for something quicker, these old blog posts:
Anova for economists
A psychology researcher asks: Is Anova dead?
Anova is great—if you interpret it as a way of structuring a model, not if you focus on F tests.

I think these are important points: the connection between the statistical models, and also the extra understanding that arises from batching and summarizing by batch.



from Statistical Modeling, Causal Inference, and Social Science https://ift.tt/2uCmk8t
