Panel Data Analysis IV¶
In this section we estimate a statistical model that leverages some of the main advantages of using panel data: Random Effects. We show some examples of how to estimate and interpret this model, and reflect on the conditions under which the model is appropriate.
Quick reminder¶
Let’s briefly recap some essential concepts regarding panel data:
Two sources of variation (Gould, n.d.):
Cross-section information on differences between units
Time series information on differences over time within units
Pooled OLS and Between Effects models only allow us to examine differences between units. Fixed Effects only allow us to examine differences within units.
What if we wanted a model that leveraged aspects of the Between Effects and Fixed Effects estimators? Such a model would give us:
More variation with which to explain the outcome.
More flexibilty (e.g., include observed time-invariant explanatory variables).
Other methodological and modelling benefits (e.g., decomposing explanatory variables into within and between effects).
Defining our statistical model¶
Recall the general form of Fixed Effects and Random Effects models:
In equation 1.8. we have introduced a unit-specific term to represent some of the residual variation in the outcome that is unexplained by the explanatory variables.
This term (\(\mu_{i}\)) captures the effect of residual heterogeneity on the outcome i.e., unobserved or immeasurable characteristics of the units that are associated with the outcome (and possibly the explanatory variables), and vary across units.
A word of caution¶
Note the lack of a time subscript t in the new term \(\mu_{i}\). The implication is that the unobserved unit-specific effect is constant over time (i.e., within units).
Therefore Fixed Effects and Random Effects models only control for omitted variables that do not change within units (e.g., race, sex at birth, natural ability).
Random Effects Model¶
Conceptualising the Random Effects model¶
The Random Effects model focuses on how changes in explanatory variables are associated with changes in the outcome within and between units.
It assumes the observed explanatory variables and unobserved unit-specific effects are not correlated. In essence: the unobserved unit-specific effect explains some of the variation in the outcome, but is not associated with any of the observed explanatory variables.
Gayle and Lambert (2018, p. 68):
…the random effects panel model is a matrix weighted average of the within-effects (fixed effects) and the between effects.
In essence the Random Effects model borrows some information from the Between Effects model and some from the Fixed Effects model.
Therefore the coefficients in a Random Effects model allow you to speak in terms of the effect of an explanatory variable on an outcome, whether we are comparing different individuals or different observations for the same individual - we’ll see what this means when we estimate our first Random Effects model.
The Random Effects model is specified as follows:
Where:
\(\upsilon_{i}\) represents the unit-specific effect on the outcome.
As we assume the observed explanatory variables and unobserved unit-specific effects are not correlated, there is no need to estimate \(\upsilon_{i}\) as if it were an explanatory variable and hence why it is part of the error component of the model.
Remember, it would only need to be estimated in the model if it would alter the coefficients for the observed explanatory variables: we assume it wouldn’t it. Therefore instead of estimating the value of \(\upsilon_{i}\) using the data (as we would for the observed explanatory variables), we assume the unit-specific effects are drawn from a known probability distribution (Gayle and Lambert, 2018).
As a result, we are only interested in the variance of \(\upsilon_{i}\) and the extent to which it accounts for variation in the outcome.
Final thoughts on conceptualisation¶
In a Random Effects model we are unconcerned with estimating the coefficient of the unit-specific effect. We simply want to know to what degree variation in these unit-specific effects is associated with variation in the outcome.
Estimation¶
use "./data/charity-panel-analysis-2020-09-10.dta", clear
(Contains annual accounts of charities in E&W for financial years 2006-2017)
xtreg linc orgage localc west genchar nsources govern_share, re
Random-effects GLS regression Number of obs = 23,826
Group variable: regno Number of groups = 2,166
R-sq: Obs per group:
within = 0.0135 min = 11
between = 0.0888 avg = 11.0
overall = 0.0832 max = 11
Wald chi2(6) = 507.12
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
linc | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
orgage | .005022 .0003686 13.62 0.000 .0042995 .0057446
localc | -.3323159 .0412995 -8.05 0.000 -.4132615 -.2513704
west | .0797179 .0801801 0.99 0.320 -.0774323 .2368681
genchar | -.2729578 .0414011 -6.59 0.000 -.3541025 -.1918131
nsources | .030451 .00276 11.03 0.000 .0250415 .0358605
govern_share | .001024 .000121 8.47 0.000 .000787 .0012611
_cons | 15.15988 .048456 312.86 0.000 15.06491 15.25485
-------------+----------------------------------------------------------------
sigma_u | .90700183
sigma_e | .2821005
rho | .91179586 (fraction of variance due to u_i)
------------------------------------------------------------------------------
QUESTION TIME
How much of the variation in the outcome is accounted for by the model? Is this a lot?
Why are
localc
,west
andgenchar
included in the estimation of the model (when they weren’t in the Fixed Effects model)?What does the \(\text{rho}\) statistic tell us?
Is there evidence of correlation between the unit-specific effects and observed explanatory variables?
Interpretation¶
The effect of the observed explanatory variables is not net of the unit-specific effects. That is, we haven’t controlled away any correlation between X and \(\mu_{i}\) (because we assume they are not correlated).
\(\text{_cons}\) is the intercept.
\(\text{orgage}\) is the predicted change in the outcome for a one-unit increase in organisational age, whether age changes within or between units.
\(\text{rho}\) is the proportion of unexplained variance in the outcome explained by unobserved differences between charities (the unit-specific effects), rather than changes within them.
If \(\text{rho}\) > .5 then most of the residual variation in the outcome is due to differences between units, if \(\text{rho}\) < .5 then most of the residual variation is accounted for by differences within units (i.e., the effects of the explanatory variables).
\(\text{corr(u_i, X) = 0}\) is the formal statement of the (untested) assumption that there is no correlation between the unit-specific effect and the observed explanatory variables in the model.
\(\text{sigma_u}\) (or \(\sigma_u\)) is the standard deviation of the fixed effects (i.e., the residuals within units.
\(\text{sigma_e}\) (or \(\sigma_e\)) is the standard deviation of residuals ei.
\(\text{R-sq: overall}\) is the proportion of variance explained by the observed explanatory variables (i.e., excluding the unit-specific effect).
Post-estimation¶
Though it’s very rarely of substantive interest, we can recover the unit-specific effects (and other parameter estimates) after estimating a Random Effects model:
capture predict random, u
capture predict y_hat, xb
capture predict ei, e
capture predict residuals, ue
capture egen pickone = tag(regno)
l regno fin_year random if pickone in 1/100, clean
regno fin_year random
1. 200048 2006-07 -1.144261
12. 200051 2006-07 1.582299
23. 200069 2006-07 -.21995
34. 200222 2006-07 -.199622
45. 200424 2006-07 -.1241415
56. 200431 2006-07 -.2140733
67. 200500 2006-07 .2228235
78. 201081 2006-07 .2918403
89. 201321 2006-07 -1.13723
100. 201911 2006-07 -.6447842
l regno fin_year y_hat residuals random ei in 1/11, clean
regno fin_year y_hat residuals random ei
1. 200048 2006-07 15.34991 -1.348024 -1.144261 -.2037623
2. 200048 2007-08 15.35493 -1.177057 -1.144261 -.0327961
3. 200048 2008-09 15.39064 -1.205535 -1.144261 -.0612741
4. 200048 2009-10 15.36498 -1.132381 -1.144261 .0118806
5. 200048 2010-11 15.3716 -1.200694 -1.144261 -.0564324
6. 200048 2011-12 15.37502 -1.22701 -1.144261 -.0827486
7. 200048 2012-13 15.41329 -1.037294 -1.144261 .1069672
8. 200048 2013-14 15.38507 -1.085107 -1.144261 .0591543
9. 200048 2014-15 15.42087 -1.160563 -1.144261 -.0163015
10. 200048 2015-16 15.39511 -1.09398 -1.144261 .0502813
11. 200048 2016-17 15.40013 -1.029922 -1.144261 .1143396
di -1.144261 + -.2037623
-1.3480233
tabstat random ei, s(mean sd) format(%5.4f)
stats | random ei
---------+--------------------
mean | 0.0000 0.0000
sd | 0.9096 0.2691
------------------------------
Benefits of Random Effects¶
Analyse both change over time and differences between units.
Control for residual heterogeneity.
Estimate observed time-invariant explanatory variables in the model.
Coefficient estimates are efficient, especially when compared to those from a Fixed Effects model. As a result, standard errors tend to be smaller. Put simply, the estimates of the coefficients are based on more information than those in the Fixed Effects model, which bases its estimates on only one source of variation (within).
Limitations of Random Effects¶
Key assumption is often unrealistic.
Coefficient estimates are inconsistent if the key assumption is violated.
That is, if the coefficients for the observed explanatory variables are biased, then increasing the sample size does not necessarily mean we are getting closer to the true value of the parameter. Difficult to infer whether the value of a coefficient is mainly determined by within or between variation (though there are solutions to this problem).
Cannot control for unobserved residual heterogeneity that varies over time e.g., educational ability? Natural resilience?
The last point is worth expanding on: if units differ in an unobserved way that varies over time, this will not be controlled for in the Random Effects model.
Summarising the Random Effects model¶
Analyses both change within a unit’s outcomes, and differences between units’ outcomes.
Can control for the effect of unobserved time-invariant explanatory variables (residual heterogeneity).
Can include observed explanatory variables that do not vary within units (e.g., race, sex at birth).
Does not provide robust estimates of observed explanatory variables when said variables are correlated with unobserved unit-specific effects.
Summary¶
Both the Pooled OLS and Between Effects models provide useful information on the association between an outcome Y and a set of explanatory variables X.
Fixed Effects provide potentially different information on the association between an outcome Y and a set of explanatory variables *X.
Random Effects combines the within and between perspectives - methodological and substantive benefits.