What Successful Managers, Researchers, and Analysts Know Before Trusting Regression Analysis
Everyone will be relieved to hear that this article will not attempt to cover the proper way to specify regression models. That topic is both too broad and too deep for this forum. Instead, I'll focus on the real-world pitfalls and opportunities that come up most often when reading, analyzing, and interpreting regression studies.
This article is not designed to have the reader come out the other side a regression wizard. In any case, the last thing most quantitative projects need is another cook in the kitchen. Rather, its purpose is to give managers insight into issues that most often make it through the modeling process unchecked so they can be addressed by the analysts themselves.
Note: Depending on the flavor of regression being used, estimated coefficients can take on a wide variety of meanings. I'll do my best to stick to general terms like "effect" for the right side of the equation.
A Model Is A (Set Of) Distributional Claim(s)
Danger: When the independent variables' joint distribution doesn't match the response distribution, estimates become extremely sensitive outside (or even within) the sampled domain, and significance tests become unreliable
Solution: Always check and publish the Q-Q plot of the residuals
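As a minimal sketch of what "checking" can mean in code, the snippet below makes the same comparison a Q-Q plot draws: sorted, standardized residuals against theoretical normal quantiles. The residuals here are simulated stand-ins for a real model's; everything else is stdlib Python.

```python
# Sketch: a numeric Q-Q check of residual normality (no plotting needed).
# The simulated residuals below are a stand-in for a fitted model's.
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
residuals = [random.gauss(0, 1) for _ in range(200)]

def qq_points(resid):
    """Pair each sorted, standardized residual with its theoretical normal quantile."""
    n = len(resid)
    mu, sd = mean(resid), stdev(resid)
    sample = sorted((r - mu) / sd for r in resid)
    # Plotting positions (i + 0.5)/n avoid the infinite quantiles at 0 and 1.
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return theory, sample

theory, sample = qq_points(residuals)

# Correlation near 1.0 means the Q-Q plot hugs the 45-degree line.
n = len(theory)
mt, ms = mean(theory), mean(sample)
cov = sum((t - mt) * (s - ms) for t, s in zip(theory, sample)) / (n - 1)
corr = cov / (stdev(theory) * stdev(sample))
print(round(corr, 3))
```

A correlation well below 1 (or a visible bend in the plotted points) is the signal to distrust significance tests built on the normality assumption.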
Consider an analyst modeling wages as a function of gender, race, and occupation type. Having estimated the model, he speculates that there may be an additional "kicker" effect of being a black woman that is not captured by the separate impacts of being a woman or being black. To test this idea, he adds an interaction term to estimate the effect of being both black and a woman on wages.
Hold the phone.
With occupation type already in the model, the interaction term is interpreted as the average effect of being a black woman across occupational fields.
This opens the model up to two possible problem scenarios: the "kicker" effect may exist only in certain occupations, or the sample of black women may simply be too small.

In the first case, the p-value would show the interaction as significant because the sample lacks observations outside the occupations in which being a black woman has an impact. Thus, our model would improperly extend the effect to unsampled fields.

The second case is simply a traditional small-sample problem: we shouldn't feel comfortable extending the results across all people and fields. In either case, cross-validation would fail to call our results into question, because the samples lack counterexamples that likely exist in the larger population.
Danger: Adding complex terms can rapidly reduce the sample size used to estimate effects
Solution: Analysts should explicitly list sample sizes used to estimate each term in the model
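One lightweight way to implement that solution is to count, for each dummy and interaction term, how many observations take a non-zero value, and which categories those observations cover. A minimal sketch, using hypothetical data and column names:

```python
# Sketch: tally how many observations actually identify each term in a
# wage model. The rows and column names here are invented for illustration.
rows = [
    {"female": 1, "black": 1, "occupation": "tech"},
    {"female": 1, "black": 0, "occupation": "trades"},
    {"female": 0, "black": 1, "occupation": "tech"},
    {"female": 0, "black": 0, "occupation": "trades"},
    {"female": 1, "black": 1, "occupation": "tech"},
]

def term_support(data):
    """Number of rows with a non-zero value for each dummy and interaction term."""
    return {
        "female": sum(r["female"] for r in data),
        "black": sum(r["black"] for r in data),
        "female_x_black": sum(r["female"] * r["black"] for r in data),
    }

# Which occupations contain any black women at all?
covered = {r["occupation"] for r in rows if r["female"] * r["black"]}

print(term_support(rows))  # the interaction rests on far fewer rows
print(covered)             # ...and is only identified in these fields
```

Publishing a table like this alongside the coefficient estimates makes it obvious when an interaction effect is being extrapolated to fields the data never sampled.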
Errors Tell A Story
Don't Minimize Errors (They're Important)
Quants know to look at error distributions and magnitudes. They know to check for autocorrelation and heteroskedasticity. However, rarely will an analyst actually map the errors back to the data to see how they evolve over all the data's dimensions throughout the model specification process.
The story of how errors evolve as the model is reformulated and tuned is often summarized by aggregate metrics (Information Criteria; Predicted vs. Actuals; etc.), but mapping the errors to the data allows analysts to be specific about the way in which the model develops ("After adding the term CollegeDegree, the error for predicted wages decreased across the sample except for people who attended trade school"). Narratives like this motivate the progression from one specification to the next.
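A minimal sketch of this practice, with all data and group labels invented for illustration: fit two specifications, map each one's residuals back onto a subgroup label, and compare per-group error instead of a single aggregate metric.

```python
# Sketch: track how each subgroup's errors change between two model
# specifications. Data, numbers, and labels are illustrative only.
from statistics import mean

data = [  # (wage, years_education, group)
    (30, 12, "trade_school"), (55, 16, "college"), (62, 18, "college"),
    (35, 12, "trade_school"), (50, 16, "college"), (40, 14, "trade_school"),
]
groups = [g for (_, _, g) in data]

def fit_simple_ols(xy):
    """Closed-form intercept and slope for y = a + b*x."""
    xs, ys = zip(*xy)
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in xy) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def group_mae(residuals, groups):
    """Mean absolute error, mapped back to each subgroup of the data."""
    out = {}
    for r, g in zip(residuals, groups):
        out.setdefault(g, []).append(abs(r))
    return {g: round(mean(v), 2) for g, v in out.items()}

# Spec 1: intercept-only (predict the overall mean wage).
mean_wage = mean(w for (w, _, _) in data)
resid1 = [w - mean_wage for (w, _, _) in data]

# Spec 2: wage ~ years_education.
a, b = fit_simple_ols([(e, w) for (w, e, _) in data])
resid2 = [w - (a + b * e) for (w, e, _) in data]

# Per-group errors tell the story that one aggregate number hides.
print("spec 1:", group_mae(resid1, groups))
print("spec 2:", group_mae(resid2, groups))
```

The output of a loop like this, recorded at each step of the specification process, is exactly the raw material for narratives such as the CollegeDegree example above.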
Summarizing the system in words anyone can understand as it is being modeled pays dividends when it's time to use the results in anger: it delivers deep insight with low mental overhead.
I hear you: "Okay...How do I actually do this?"
Use A Storytelling Platform
Personally, I recommend R Markdown, which eliminates the need to deliver results and reporting separately (read: No Slides).
This approach swaps the traditional model-and-a-slide-deck deliverable for institutional knowledge and reproducible research your organization can apply effectively going forward. It scales your group's work, increases your impact, and decreases the long-run cost of analytical development.
Regression Is As Good As Your Counterfactual
"Everything should be made as simple as possible, but not simpler" - Probably Not Einstein
"Dad, I heard this story about a gorilla-"
"We don't speak of 2016, son."
In the real world, simplifying and condensing quantitative work is crucial if our results are to be trusted and relied upon. However, counterfactuals should not be left on the cutting room floor: the hypotheses that, if true, would rebut our models help us understand how sensitive our claims are.
Nate Silver's FiveThirtyEight famously favored Hillary Clinton to win the 2016 election for most of the campaign. However, unlike an upset in sports, people were extremely frustrated with the model, and most immediately concluded that it must be broken: that it was not specified properly. This knee-jerk reaction springs from a failure to understand the counterfactual. To illustrate the point, I'll refer to Silver's model as if its output were the sum of 50 logistic regression models, one for each state.
The Electoral College sets up a situation in which candidates should generate just enough votes in each state to win it (Maine and Nebraska excepted, since they split their electoral votes) and then concentrate efforts elsewhere. As it happened, Hillary generated votes above and beyond what she needed in states she had already won, while Trump won many states by narrow margins.
Turning to the simplified Silver model, we see that it was very nearly correct, failing to predict the critical result only slightly, but repeatedly. Complex situations (Electoral College vs. popular vote) lend valuable information to our counterfactuals. This model has some multiple of 50 counterfactuals: each state has several variables that may demand a different formulation to properly predict its outcome. One such alternative could posit that the outcome in each state depends on the concentration of efforts outside it; since the total effort available is fixed, resources spent outside a state are lost to it.
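To make the sensitivity concrete, here is a toy simulation, emphatically not Silver's actual model: fifty states with identical, hypothetical win probabilities, compared against a counterfactual in which every state shares one national polling error. The per-state marginal claims are the same in both runs; the spread of the headline outcome is not.

```python
# Toy sketch (not Silver's actual model): compare 50 independent per-state
# forecasts against a counterfactual with one shared national swing.
# All probabilities and the swing size are invented for illustration.
import random
from statistics import pstdev

random.seed(1)
state_p = [0.55] * 50   # hypothetical per-state win probabilities
SWING = 0.10            # size of the shared national error

def simulate(correlated, trials=5000):
    """Distribution of states won across simulated elections."""
    wins_dist = []
    for _ in range(trials):
        shift = random.uniform(-SWING, SWING) if correlated else 0.0
        wins = sum(random.random() < p + shift for p in state_p)
        wins_dist.append(wins)
    return wins_dist

independent = simulate(correlated=False)
shared_error = simulate(correlated=True)

# The shared swing fattens the tails: same marginal claims per state,
# very different sensitivity of the overall result.
print(pstdev(independent) < pstdev(shared_error))  # prints True
```

Wildly different headline distributions from identical per-state inputs is exactly the kind of counterfactual a careful review surfaces, and a slide deck of point forecasts hides.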
At the end of the day, after reviewing the many model specifications made possible by the sensitivities the Electoral College introduces, one comes to the valuable conclusion that <bad joke> claims should be kept conservative </bad joke>. In Silver's own post-mortem of the election forecasts, he notes that the outcome models available at the time, trained on identical data yet coming to wildly different conclusions, should have been indicative of precisely this sensitivity.
In business, we must take care to understand precisely how sensitive to misspecification our models are, especially when we are comfortable with the projections. The solution is being meticulous about testing competing formulations and fostering this discipline in others.
Managers working with analytics professionals may often and suddenly find themselves feeling technically outclassed. Using this guide, I hope you will feel empowered to ask the questions that analysts and researchers, caught up in their work, often fail to ask themselves.
Copyright © 2018