Extensions
In this section we provide a whistle-stop tour of some additional techniques and approaches for panel data and longitudinal data more broadly.
Nonlinear outcomes
Fixed Effects and Random Effects models can also be applied to nonlinear outcomes (e.g., binary and count dependent variables).
Here is a published example from McDonnell (2017): https://doi.org/10.1177/0899764017692039
use "./data/improvingcharityaccountability_20170411.dta", clear
gen localc = (geographicalspread==2)
gen linc = ln(totalfunds) if totalfunds > 0 & totalfunds!=.
(Scottish Charity Financial Exceptions Data: 2007-2013)
(1,323 missing values generated)
tab yearsubmitted excgroup_3
      Year |  Possible failure to
    annual |    apply funds for
    return |  charitable purposes
 submitted |         0          1 |     Total
-----------+----------------------+----------
      2007 |       754        196 |       950
      2008 |     2,752        881 |     3,633
      2009 |     2,964        818 |     3,782
      2010 |     2,946        736 |     3,682
      2011 |     2,659        702 |     3,361
      2012 |     2,450        645 |     3,095
      2013 |     1,555        457 |     2,012
      2014 |       585        222 |       807
-----------+----------------------+----------
     Total |    16,665      4,657 |    21,322
xtlogit excgroup_3 concentration charityage localc linc, or re
Fitting comparison model:
Iteration 0: log likelihood = -10687.029
Iteration 1: log likelihood = -10488.084
Iteration 2: log likelihood = -10486.442
Iteration 3: log likelihood = -10486.442
Fitting full model:
tau = 0.0 log likelihood = -10486.442
tau = 0.1 log likelihood = -10257.949
tau = 0.2 log likelihood = -10048.82
tau = 0.3 log likelihood = -9859.3094
tau = 0.4 log likelihood = -9689.005
tau = 0.5 log likelihood = -9538.3489
tau = 0.6 log likelihood = -9410.3167
tau = 0.7 log likelihood = -9314.0716
tau = 0.8 log likelihood = -9277.005
Iteration 0: log likelihood = -9313.6924
Iteration 1: log likelihood = -9178.5858
Iteration 2: log likelihood = -9173.5625
Iteration 3: log likelihood = -9173.5365
Iteration 4: log likelihood = -9173.5365 (backed up)
Iteration 5: log likelihood = -9173.5362
Random-effects logistic regression Number of obs = 19,982
Group variable: org_id Number of groups = 4,714
Random effects u_i ~ Gaussian Obs per group:
min = 1
avg = 4.2
max = 7
Integration method: mvaghermite Integration pts. = 12
Wald chi2(4) = 232.05
Log likelihood = -9173.5362 Prob > chi2 = 0.0000
-------------------------------------------------------------------------------
excgroup_3 | OR Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
concentration | .8526391 .128419 -1.06 0.290 .6346916 1.145428
charityage | .9911255 .0015465 -5.71 0.000 .9880991 .9941612
localc | 2.33282 .2335039 8.46 0.000 1.917256 2.838458
linc | 1.333225 .0276658 13.86 0.000 1.280089 1.388567
_cons | .0050338 .0013577 -19.62 0.000 .002967 .0085406
--------------+----------------------------------------------------------------
/lnsig2u | 1.518384 .0552301 1.410135 1.626633
--------------+----------------------------------------------------------------
sigma_u | 2.136549 .0590009 2.023984 2.255376
rho | .5811599 .0134437 .5546033 .6072544
-------------------------------------------------------------------------------
LR test of rho=0: chibar2(01) = 2625.81 Prob >= chibar2 = 0.000
use "./data/charity-panel-analysis-2020-09-10.dta", clear
xtpoisson nsources linc orgage localc west genchar govern_share, re
(Contains annual accounts of charities in E&W for financial years 2006-2017)
Fitting Poisson model:
Iteration 0: log likelihood = -42473.77
Iteration 1: log likelihood = -42473.77
Fitting full model:
Iteration 0: log likelihood = -43378.386
Iteration 1: log likelihood = -41912.848 (not concave)
Iteration 2: log likelihood = -41494.954
Iteration 3: log likelihood = -41471.918
Iteration 4: log likelihood = -41471.687
Iteration 5: log likelihood = -41471.687
Random-effects Poisson regression Number of obs = 23,826
Group variable: regno Number of groups = 2,166
Random effects u_i ~ Gamma Obs per group:
min = 11
avg = 11.0
max = 11
Wald chi2(6) = 231.78
Log likelihood = -41471.687 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
nsources | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
linc | .0439247 .0057484 7.64 0.000 .0326581 .0551913
orgage | .0002729 .000149 1.83 0.067 -.0000192 .0005649
localc | .0063181 .0127513 0.50 0.620 -.0186741 .0313103
west | -.0484527 .0247255 -1.96 0.050 -.0969138 8.46e-06
genchar | .0671619 .0134216 5.00 0.000 .0408561 .0934677
govern_share | .0016258 .0001569 10.36 0.000 .0013183 .0019333
_cons | .5776679 .0894589 6.46 0.000 .4023317 .7530042
-------------+----------------------------------------------------------------
/lnalpha | -2.928772 .0456227 -3.01819 -2.839353
-------------+----------------------------------------------------------------
alpha | .0534627 .0024391 .0488896 .0584635
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 2004.16 Prob >= chibar2 = 0.000
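Both models above use the Random Effects estimator. As a rough sketch of the Fixed Effects counterparts, the commands below refit them with the fe option. This assumes the same data files and variable names as the examples above, and that the panel identifiers (org_id and regno) are already declared in the saved datasets, as the output above suggests. Time-invariant covariates such as localc are left out because Fixed Effects cannot estimate them, and the conditional logit drops charities whose outcome never changes, so the estimation samples will differ from those above.

```stata
* Sketch only: Fixed Effects versions of the two models above.
* Assumes the data files and variable names used earlier in this section.

* Conditional (fixed-effects) logit; the or option reports odds ratios
use "./data/improvingcharityaccountability_20170411.dta", clear
gen linc = ln(totalfunds) if totalfunds > 0 & totalfunds != .
xtlogit excgroup_3 concentration charityage linc, fe or

* Fixed-effects Poisson; the irr option reports incidence-rate ratios
use "./data/charity-panel-analysis-2020-09-10.dta", clear
xtpoisson nsources linc orgage govern_share, fe irr
```

No output is shown because these models were not run in the original example.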
Hybrid panel data models
A hybrid panel model allows you to decompose the effects of time-varying explanatory variables into their within and between components using the Random Effects estimator.
Let’s return to our charity data example and see if we can decompose the effect of nsources into its within and between effects.
use "./data/charity-panel-analysis-2020-09-10.dta", clear
(Contains annual accounts of charities in E&W for financial years 2006-2017)
bys regno: egen nsources_mn = mean(nsources)
gen nsources_delta = nsources - nsources_mn
xtreg linc orgage localc west genchar nsources_mn nsources_delta govern_share, re
Random-effects GLS regression Number of obs = 23,826
Group variable: regno Number of groups = 2,166
R-sq: Obs per group:
within = 0.0136 min = 11
between = 0.1017 avg = 11.0
overall = 0.0952 max = 11
Wald chi2(7) = 536.49
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
-------------------------------------------------------------------------------
linc | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
orgage | .0048981 .0003692 13.27 0.000 .0041745 .0056216
localc | -.3320839 .0412748 -8.05 0.000 -.412981 -.2511868
west | .1011212 .0802314 1.26 0.208 -.0561294 .2583718
genchar | -.3070555 .0418617 -7.34 0.000 -.3891029 -.2250082
nsources_mn | .1298578 .0187345 6.93 0.000 .0931388 .1665767
nsources_de~a | .028249 .0027888 10.13 0.000 .022783 .0337149
govern_share | .0010022 .000121 8.29 0.000 .0007652 .0012393
_cons | 14.80668 .0817325 181.16 0.000 14.64648 14.96687
--------------+----------------------------------------------------------------
sigma_u | .90698522
sigma_e | .2821005
rho | .91179291 (fraction of variance due to u_i)
-------------------------------------------------------------------------------
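The coefficients for nsources_mn and nsources_delta correspond to the between and within effects of nsources, as estimated by the Between Effects and Fixed Effects models respectively. The match is exact only when every time-varying explanatory variable is decomposed in the same way; here only nsources is decomposed, so the estimates are close rather than identical.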
Furthermore we can test whether the between and within effects are equal:
test nsources_mn = nsources_delta
( 1) nsources_mn - nsources_delta = 0
chi2( 1) = 28.78
Prob > chi2 = 0.0000
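Written out, the hybrid model estimated above can be sketched as follows. The notation is introduced here for illustration and is not taken from the original example: y_it is log income, x_it is nsources, x̄_i is its charity-level mean, z_it collects the remaining covariates, and u_i is the random charity effect.

```latex
\begin{aligned}
y_{it} &= \beta_0 + \beta_W\,(x_{it} - \bar{x}_i) + \beta_B\,\bar{x}_i
          + \mathbf{z}_{it}'\boldsymbol{\gamma} + u_i + \varepsilon_{it}, \\
H_0 &: \beta_W = \beta_B .
\end{aligned}
```

Here β_W is the coefficient on nsources_delta and β_B is the coefficient on nsources_mn. The test command above evaluates H_0, and the small p-value suggests that the within and between effects of nsources differ.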
A similar approach is to use the user-written mundlak command, which decomposes every explanatory variable that varies sufficiently within groups:
mundlak linc orgage localc west genchar nsources govern_share, hybrid
The variable orgage does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in orgage is within groups.
The variable localc does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in localc is within groups.
The variable west does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in west is within groups.
The variable genchar does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in genchar is within groups.
+------------------------------------------------+
| Variable | RE | Hybrid |
|----------------------+------------+------------|
| orgage | 0.005 | 0.005 |
| localc | -0.332 | -0.329 |
| west | 0.080 | 0.097 |
| genchar | -0.273 | -0.295 |
| nsources | 0.030 | |
| govern_share | 0.001 | |
| diff__nsources | | 0.028 |
| diff__govern_share | | 0.001 |
| mean__nsources | | 0.134 |
| mean__govern_share | | 0.000 |
| _cons | 15.160 | 14.797 |
|----------------------+------------+------------|
| N | 23826 | 23826 |
| N_g | 2166.000 | 2166.000 |
| g_min | 11.000 | 11.000 |
| g_avg | 11.000 | 11.000 |
| g_max | 11.000 | 11.000 |
| rho | 0.912 | 0.912 |
| rmse | 0.282 | 0.282 |
| chi2 | 507.117 | 537.048 |
| p | 0.000 | 0.000 |
| df_m | 6.000 | 8.000 |
| sigma | 0.950 | 0.950 |
| sigma_u | 0.907 | 0.907 |
| sigma_e | 0.282 | 0.282 |
| r2_w | 0.014 | 0.014 |
| r2_o | 0.083 | 0.095 |
| r2_b | 0.089 | 0.102 |
+------------------------------------------------+
Mundlak approach
The Random Effects model assumes that the observed explanatory variables are uncorrelated with the unobserved unit-specific effects, an assumption that is often unrealistic (Gayle and Lambert, 2018).
We can relax this assumption using the Mundlak approach, which works by including unit-level means for the time-varying explanatory variables in the Random Effects model.
bys regno: egen orgage_mn = mean(orgage)
bys regno: egen govern_share_mn = mean(govern_share)
xtreg linc orgage localc west genchar nsources govern_share ///
govern_share_mn nsources_mn orgage_mn, re
est store mund
Random-effects GLS regression Number of obs = 23,826
Group variable: regno Number of groups = 2,166
R-sq: Obs per group:
within = 0.0140 min = 11
between = 0.1042 avg = 11.0
overall = 0.0976 max = 11
Wald chi2(9) = 557.99
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
-------------------------------------------------------------------------------
linc | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
orgage | .0069072 .0005802 11.90 0.000 .00577 .0080444
localc | -.3282906 .0414364 -7.92 0.000 -.4095044 -.2470768
west | .1167392 .0805284 1.45 0.147 -.0410935 .2745719
genchar | -.3210918 .0449127 -7.15 0.000 -.4091191 -.2330644
nsources | .0289886 .0027931 10.38 0.000 .0235142 .034463
govern_share | .0010325 .0001225 8.43 0.000 .0007923 .0012726
govern_shar~n | -.0007575 .0007578 -1.00 0.317 -.0022428 .0007277
nsources_mn | .109471 .0195752 5.59 0.000 .0711044 .1478376
orgage_mn | -.0034024 .0007524 -4.52 0.000 -.0048772 -.0019277
_cons | 14.85058 .0835091 177.83 0.000 14.68691 15.01426
--------------+----------------------------------------------------------------
sigma_u | .90700183
sigma_e | .2821005
rho | .91179586 (fraction of variance due to u_i)
-------------------------------------------------------------------------------
quietly xtreg linc orgage localc west genchar nsources govern_share, fe
est store fixed
est table fixed mund
----------------------------------------
Variable | fixed mund
-------------+--------------------------
orgage | .0069072 .0069072
localc | (omitted) -.3282906
west | (omitted) .11673923
genchar | (omitted) -.32109178
nsources | .02898861 .02898861
govern_share | .00103247 .00103247
govern_sha~n | -.00075753
nsources_mn | .10947099
orgage_mn | -.00340245
_cons | 14.715042 14.850581
----------------------------------------
quietly xtreg linc orgage localc west genchar nsources govern_share ///
govern_share_mn nsources_mn orgage_mn, re
test govern_share_mn = nsources_mn = orgage_mn
( 1) govern_share_mn - nsources_mn = 0
( 2) govern_share_mn - orgage_mn = 0
chi2( 2) = 43.79
Prob > chi2 = 0.0000
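The Mundlak approach is an alternative to the Hausman test for choosing between Fixed Effects and Random Effects: if the unit-level means are jointly significant, the Random Effects assumption is in doubt. Note that the test above checks whether the three mean coefficients are equal to one another; the usual form of the Mundlak test checks whether they are jointly equal to zero.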
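As a hedged sketch of how the comparison with the Hausman test might be carried out, the commands below run a joint Wald test that all three unit-level means have zero coefficients, followed by the classic Hausman test. It assumes the mean variables (govern_share_mn, nsources_mn, orgage_mn) have been created as above; the stored estimate names fe_model and re_model are arbitrary labels chosen for this illustration.

```stata
* Sketch: joint Wald test that the unit-level means add nothing (Mundlak test).
* Assumes govern_share_mn, nsources_mn and orgage_mn were created as above.
quietly xtreg linc orgage localc west genchar nsources govern_share ///
    govern_share_mn nsources_mn orgage_mn, re
test govern_share_mn nsources_mn orgage_mn

* For comparison, the classic Hausman test of Fixed versus Random Effects
quietly xtreg linc orgage localc west genchar nsources govern_share, fe
estimates store fe_model
quietly xtreg linc orgage localc west genchar nsources govern_share, re
estimates store re_model
hausman fe_model re_model
```

In both tests, rejecting the null hypothesis points towards the Fixed Effects estimates.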
Dynamic panel models
These models are suitable when you have repeated contacts data and your (lagged) outcome variable also serves as one of your explanatory variables.
Including a lagged outcome variable poses a problem because the lagged variable will generally be correlated with the unobserved unit-specific effects (Gayle and Lambert, 2018).
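To see where the correlation comes from, it can help to write the model out for two consecutive periods. The notation is introduced here for illustration: ρ is the coefficient on the lagged outcome, x_it the other explanatory variables, and u_i the unobserved unit-specific effect.

```latex
\begin{aligned}
y_{it}    &= \rho\, y_{i,t-1} + \mathbf{x}_{it}'\boldsymbol{\beta} + u_i + \varepsilon_{it}, \\
y_{i,t-1} &= \rho\, y_{i,t-2} + \mathbf{x}_{i,t-1}'\boldsymbol{\beta} + u_i + \varepsilon_{i,t-1}.
\end{aligned}
```

Because u_i appears in the equation for every period, it is a component of y_{i,t-1}, so the lagged outcome is correlated with the unit effect whenever u_i has non-zero variance.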
use "./data/charity-panel-analysis-2020-09-10.dta", clear
xtset regno fin_year
(Contains annual accounts of charities in E&W for financial years 2006-2017)
panel variable: regno (strongly balanced)
time variable: fin_year, 1 to 11
delta: 1 unit
capture gen linc_lag = L.linc
l regno fin_year linc linc_lag in 1/22, clean
regno fin_year linc linc_lag
1. 200048 2006-07 14.00189 .
2. 200048 2007-08 14.17788 14.00189
3. 200048 2008-09 14.1851 14.17788
4. 200048 2009-10 14.2326 14.1851
5. 200048 2010-11 14.1709 14.2326
6. 200048 2011-12 14.14801 14.1709
7. 200048 2012-13 14.376 14.14801
8. 200048 2013-14 14.29996 14.376
9. 200048 2014-15 14.26031 14.29996
10. 200048 2015-16 14.30113 14.26031
11. 200048 2016-17 14.37021 14.30113
12. 200051 2006-07 17.664 .
13. 200051 2007-08 17.60568 17.664
14. 200051 2008-09 17.44065 17.60568
15. 200051 2009-10 16.46766 17.44065
16. 200051 2010-11 16.32526 16.46766
17. 200051 2011-12 16.4079 16.32526
18. 200051 2012-13 16.35779 16.4079
19. 200051 2013-14 16.04346 16.35779
20. 200051 2014-15 15.71779 16.04346
21. 200051 2015-16 15.42241 15.71779
22. 200051 2016-17 15.51123 15.42241
xtreg linc orgage localc west genchar nsources govern_share linc_lag, re
Random-effects GLS regression Number of obs = 21,660
Group variable: regno Number of groups = 2,166
R-sq: Obs per group:
within = 0.2673 min = 10
between = 0.9963 avg = 10.0
overall = 0.9320 max = 10
Wald chi2(7) = 296585.19
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
linc | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
orgage | -.0000262 .0000438 -0.60 0.549 -.000112 .0000596
localc | -.0065562 .0038027 -1.72 0.085 -.0140093 .0008968
west | .0009973 .007296 0.14 0.891 -.0133026 .0152973
genchar | -.005033 .0040519 -1.24 0.214 -.0129746 .0029087
nsources | .0102149 .0014779 6.91 0.000 .0073183 .0131116
govern_share | -.0001568 .0000584 -2.69 0.007 -.0002712 -.0000424
linc_lag | .9719555 .001881 516.72 0.000 .9682687 .9756422
_cons | .4086979 .028964 14.11 0.000 .3519294 .4654663
-------------+----------------------------------------------------------------
sigma_u | 0
sigma_e | .23554516
rho | 0 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note how large the coefficient is for the lagged variable (and how much smaller the others have become). This is a common issue when a lagged outcome is included as an explanatory variable: the lagged variable soaks up the variation otherwise accounted for by the unobserved unit-specific effects (sigma_u and rho are both estimated as zero above).
xtreg linc orgage localc west genchar nsources govern_share linc_lag, fe
note: localc omitted because of collinearity
note: west omitted because of collinearity
note: genchar omitted because of collinearity
Fixed-effects (within) regression Number of obs = 21,660
Group variable: regno Number of groups = 2,166
R-sq: Obs per group:
within = 0.2705 min = 10
between = 0.9695 avg = 10.0
overall = 0.9098 max = 10
F(4,19490) = 1807.17
corr(u_i, Xb) = 0.9009 Prob > F = 0.0000
------------------------------------------------------------------------------
linc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
orgage | .0018829 .0005609 3.36 0.001 .0007836 .0029822
localc | 0 (omitted)
west | 0 (omitted)
genchar | 0 (omitted)
nsources | .0211613 .0024664 8.58 0.000 .0163271 .0259956
govern_share | .0005565 .0001088 5.12 0.000 .0003433 .0007698
linc_lag | .5167688 .0062088 83.23 0.000 .5045989 .5289386
_cons | 7.147677 .0953486 74.96 0.000 6.960786 7.334568
-------------+----------------------------------------------------------------
sigma_u | .46351877
sigma_e | .23554516
rho | .79476462 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2165, 19490) = 3.29 Prob > F = 0.0000
A set of dynamic panel models, commonly known as Arellano-Bond models, has been developed to address the inclusion of a lagged outcome as an explanatory variable.
They also have the advantage of controlling for “initial conditions”.
That is, data collection sometimes interrupts an ongoing social process, and thus the outcome observed at the first time point is partially accounted for by factors not measured at the first time point (Gayle and Lambert, 2018).
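A minimal sketch of an Arellano-Bond model for the income example, using Stata's xtabond command, might look like the following. The choice of covariates is illustrative rather than a recommended specification, and no output is shown because this model was not run in the original example.

```stata
* Sketch only: Arellano-Bond estimator for the dynamic income model.
* Assumes the charity panel used above; covariate choice is illustrative.
use "./data/charity-panel-analysis-2020-09-10.dta", clear
xtset regno fin_year

* lags(1) adds L.linc as a regressor; first-differencing removes the unit
* effects, so time-invariant covariates (localc, west, genchar) are left out
xtabond linc nsources govern_share, lags(1) vce(robust)

* Test for serial correlation in the first-differenced errors
estat abond
```

In the estat abond output, first-order serial correlation is expected by construction; evidence of second-order serial correlation would cast doubt on the validity of the instruments.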
Latent growth curve models
Latent growth curve models are an approach to the statistical modelling of repeated contacts data.
They focus on the trajectory, trend or growth in an outcome over time within units,
and on how these trajectories are linked to observed and unobserved differences between units.
Latent growth curve models can be estimated using a Multilevel modelling framework (random intercepts and random slopes).
They can also be estimated using a Structural Equation Modelling (SEM) framework, in which there is assumed to be an underlying continuous trajectory of change that is not directly observed.
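As a rough sketch of the multilevel route, a linear growth curve for log income could be fitted with Stata's mixed command. This assumes the charity panel used earlier in this section, with fin_year coded 1 to 11 as reported by xtset; the stored estimate names ri and rs are arbitrary labels chosen for this illustration.

```stata
* Sketch only: linear growth curve for log income in the multilevel framework.
* Assumes the charity panel used above, with fin_year coded 1 to 11.
use "./data/charity-panel-analysis-2020-09-10.dta", clear

* Random intercept only: charities differ in level but share one trend
mixed linc c.fin_year || regno:
estimates store ri

* Random intercept and random slope: each charity has its own trajectory
mixed linc c.fin_year || regno: fin_year, covariance(unstructured)
estimates store rs

* Likelihood-ratio test of whether the random slope is needed
lrtest ri rs
```

The fixed coefficient on fin_year describes the average trajectory, while the random-slope variance describes how much individual charities depart from it.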
Honesty time
[Not an area I know a great deal about - see the reading list for suggested resources]