Extensions

In this section we provide a whistle-stop tour of some additional techniques and approaches for panel data and longitudinal data more broadly.

Nonlinear outcomes

Fixed Effects and Random Effects models can also be applied to nonlinear outcomes (e.g., binary and count dependent variables).

Here is a published example from McDonnell (2017): https://doi.org/10.1177/0899764017692039

use "./data/improvingcharityaccountability_20170411.dta", clear

gen localc = (geographicalspread==2)
gen linc = ln(totalfunds) if totalfunds > 0 & totalfunds!=.

(Scottish Charity Financial Exceptions Data: 2007-2013)

(1,323 missing values generated)

tab yearsubmitted excgroup_3

      Year |  Possible failure to
    annual |    apply funds for
    return |  charitable purposes
 submitted |         0          1 |     Total
-----------+----------------------+----------
      2007 |       754        196 |       950 
      2008 |     2,752        881 |     3,633 
      2009 |     2,964        818 |     3,782 
      2010 |     2,946        736 |     3,682 
      2011 |     2,659        702 |     3,361 
      2012 |     2,450        645 |     3,095 
      2013 |     1,555        457 |     2,012 
      2014 |       585        222 |       807 
-----------+----------------------+----------
     Total |    16,665      4,657 |    21,322 

xtlogit excgroup_3 concentration charityage localc linc, or re

Fitting comparison model:

Iteration 0:   log likelihood = -10687.029  
Iteration 1:   log likelihood = -10488.084  
Iteration 2:   log likelihood = -10486.442  
Iteration 3:   log likelihood = -10486.442  

Fitting full model:

tau =  0.0     log likelihood = -10486.442
tau =  0.1     log likelihood = -10257.949
tau =  0.2     log likelihood =  -10048.82
tau =  0.3     log likelihood = -9859.3094
tau =  0.4     log likelihood =  -9689.005
tau =  0.5     log likelihood = -9538.3489
tau =  0.6     log likelihood = -9410.3167
tau =  0.7     log likelihood = -9314.0716
tau =  0.8     log likelihood =  -9277.005

Iteration 0:   log likelihood = -9313.6924  
Iteration 1:   log likelihood = -9178.5858  
Iteration 2:   log likelihood = -9173.5625  
Iteration 3:   log likelihood = -9173.5365  
Iteration 4:   log likelihood = -9173.5365  (backed up)
Iteration 5:   log likelihood = -9173.5362  

Random-effects logistic regression              Number of obs     =     19,982
Group variable: org_id                          Number of groups  =      4,714

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        4.2
                                                              max =          7

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(4)      =     232.05
Log likelihood  = -9173.5362                    Prob > chi2       =     0.0000

-------------------------------------------------------------------------------
   excgroup_3 |         OR   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
concentration |   .8526391    .128419    -1.06   0.290     .6346916    1.145428
   charityage |   .9911255   .0015465    -5.71   0.000     .9880991    .9941612
       localc |    2.33282   .2335039     8.46   0.000     1.917256    2.838458
         linc |   1.333225   .0276658    13.86   0.000     1.280089    1.388567
        _cons |   .0050338   .0013577   -19.62   0.000      .002967    .0085406
--------------+----------------------------------------------------------------
     /lnsig2u |   1.518384   .0552301                      1.410135    1.626633
--------------+----------------------------------------------------------------
      sigma_u |   2.136549   .0590009                      2.023984    2.255376
          rho |   .5811599   .0134437                      .5546033    .6072544
-------------------------------------------------------------------------------
LR test of rho=0: chibar2(01) = 2625.81                Prob >= chibar2 = 0.000
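The model above uses the Random Effects estimator. A Fixed Effects (conditional) logit can be requested by swapping the re option for fe. This is a minimal sketch, assuming the data are still declared as a panel on org_id; it has not been run against the published data, so no output is shown.

```stata
* Conditional (fixed-effects) logit on the same outcome and covariates.
* Only organisations whose outcome changes over time contribute to the
* likelihood, and covariates that are constant within an organisation
* are omitted by Stata.
xtlogit excgroup_3 concentration charityage localc linc, or fe
```

Expect a smaller estimation sample than in the Random Effects model, because organisations with no variation in excgroup_3 are dropped.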

use "./data/charity-panel-analysis-2020-09-10.dta", clear

xtpoisson nsources linc orgage localc west genchar govern_share, re

(Contains annual accounts of charities in E&W for financial years 2006-2017)


Fitting Poisson model:

Iteration 0:   log likelihood =  -42473.77  
Iteration 1:   log likelihood =  -42473.77  

Fitting full model:

Iteration 0:   log likelihood = -43378.386  
Iteration 1:   log likelihood = -41912.848  (not concave)
Iteration 2:   log likelihood = -41494.954  
Iteration 3:   log likelihood = -41471.918  
Iteration 4:   log likelihood = -41471.687  
Iteration 5:   log likelihood = -41471.687  

Random-effects Poisson regression               Number of obs     =     23,826
Group variable: regno                           Number of groups  =      2,166

Random effects u_i ~ Gamma                      Obs per group:
                                                              min =         11
                                                              avg =       11.0
                                                              max =         11

                                                Wald chi2(6)      =     231.78
Log likelihood  = -41471.687                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
    nsources |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        linc |   .0439247   .0057484     7.64   0.000     .0326581    .0551913
      orgage |   .0002729    .000149     1.83   0.067    -.0000192    .0005649
      localc |   .0063181   .0127513     0.50   0.620    -.0186741    .0313103
        west |  -.0484527   .0247255    -1.96   0.050    -.0969138    8.46e-06
     genchar |   .0671619   .0134216     5.00   0.000     .0408561    .0934677
govern_share |   .0016258   .0001569    10.36   0.000     .0013183    .0019333
       _cons |   .5776679   .0894589     6.46   0.000     .4023317    .7530042
-------------+----------------------------------------------------------------
    /lnalpha |  -2.928772   .0456227                      -3.01819   -2.839353
-------------+----------------------------------------------------------------
       alpha |   .0534627   .0024391                      .0488896    .0584635
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 2004.16              Prob >= chibar2 = 0.000
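The count model can likewise be estimated with Fixed Effects. The sketch below assumes the same variable list as the Random Effects model above and adds the irr option so that coefficients are reported as incidence-rate ratios; no output is shown because it has not been run on these data.

```stata
* Fixed-effects Poisson for the count outcome nsources.
* Time-invariant covariates (e.g., localc, west, genchar) cannot be
* estimated under fixed effects and will be omitted by Stata.
xtpoisson nsources linc orgage localc west genchar govern_share, fe irr
```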

Hybrid panel data models

A hybrid panel model allows you to decompose the observed explanatory variables into their within and between effects using the Random Effects estimator.

Let’s return to our charity data example and see if we can decompose the effect of nsources into its within and between effects.

use "./data/charity-panel-analysis-2020-09-10.dta", clear

(Contains annual accounts of charities in E&W for financial years 2006-2017)

bys regno: egen nsources_mn = mean(nsources)
gen nsources_delta = nsources - nsources_mn

xtreg linc orgage localc west genchar nsources_mn nsources_delta govern_share, re

Random-effects GLS regression                   Number of obs     =     23,826
Group variable: regno                           Number of groups  =      2,166

R-sq:                                           Obs per group:
     within  = 0.0136                                         min =         11
     between = 0.1017                                         avg =       11.0
     overall = 0.0952                                         max =         11

                                                Wald chi2(7)      =     536.49
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

-------------------------------------------------------------------------------
         linc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       orgage |   .0048981   .0003692    13.27   0.000     .0041745    .0056216
       localc |  -.3320839   .0412748    -8.05   0.000     -.412981   -.2511868
         west |   .1011212   .0802314     1.26   0.208    -.0561294    .2583718
      genchar |  -.3070555   .0418617    -7.34   0.000    -.3891029   -.2250082
  nsources_mn |   .1298578   .0187345     6.93   0.000     .0931388    .1665767
nsources_de~a |    .028249   .0027888    10.13   0.000      .022783    .0337149
 govern_share |   .0010022    .000121     8.29   0.000     .0007652    .0012393
        _cons |   14.80668   .0817325   181.16   0.000     14.64648    14.96687
--------------+----------------------------------------------------------------
      sigma_u |  .90698522
      sigma_e |   .2821005
          rho |  .91179291   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

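The coefficients for nsources_mn and nsources_delta capture the between and within effects of nsources, and should be close to the estimates from the Between Effects and Fixed Effects models respectively. They will not generally match exactly unless every time-varying explanatory variable is decomposed in this way: the Fixed Effects estimate for nsources reported later in this section is .0290, compared with .0282 for nsources_delta here.

Furthermore, we can test whether the between and within effects are equal: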

test nsources_mn = nsources_delta

 ( 1)  nsources_mn - nsources_delta = 0

           chi2(  1) =   28.78
         Prob > chi2 =    0.0000
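How closely the hybrid coefficients track the separate Fixed Effects and Between Effects estimates can be checked directly. The following is a minimal sketch, assuming nsources_mn and nsources_delta have been created as above; the stored-estimate names fe_chk, be_chk and hybrid_chk are hypothetical labels chosen for illustration.

```stata
* Fixed Effects (within) estimate of nsources
quietly xtreg linc orgage localc west genchar nsources govern_share, fe
estimates store fe_chk

* Between Effects estimate of nsources
quietly xtreg linc orgage localc west genchar nsources govern_share, be
estimates store be_chk

* Hybrid model: nsources split into its unit mean and deviation
quietly xtreg linc orgage localc west genchar nsources_mn nsources_delta govern_share, re
estimates store hybrid_chk

* Coefficients side by side
estimates table fe_chk be_chk hybrid_chk, b(%9.4f)
```

In the resulting table, compare nsources under fe_chk with nsources_delta under hybrid_chk, and nsources under be_chk with nsources_mn.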

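A similar approach is to use the community-contributed mundlak command. With the hybrid option it decomposes every explanatory variable that varies within groups (here govern_share as well as nsources), so the estimates differ slightly from those above: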

mundlak linc orgage localc west genchar nsources govern_share, hybrid

The variable orgage does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in orgage is within groups.

The variable localc does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in localc is within groups.

The variable west does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in west is within groups.

The variable genchar does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in genchar is within groups.

+------------------------------------------------+
|             Variable |     RE     |   Hybrid   |
|----------------------+------------+------------|
|               orgage |      0.005 |      0.005 |
|               localc |     -0.332 |     -0.329 |
|                 west |      0.080 |      0.097 |
|              genchar |     -0.273 |     -0.295 |
|             nsources |      0.030 |            |
|         govern_share |      0.001 |            |
|       diff__nsources |            |      0.028 |
|   diff__govern_share |            |      0.001 |
|       mean__nsources |            |      0.134 |
|   mean__govern_share |            |      0.000 |
|                _cons |     15.160 |     14.797 |
|----------------------+------------+------------|
|                    N |      23826 |      23826 |
|                  N_g |   2166.000 |   2166.000 |
|                g_min |     11.000 |     11.000 |
|                g_avg |     11.000 |     11.000 |
|                g_max |     11.000 |     11.000 |
|                  rho |      0.912 |      0.912 |
|                 rmse |      0.282 |      0.282 |
|                 chi2 |    507.117 |    537.048 |
|                    p |      0.000 |      0.000 |
|                 df_m |      6.000 |      8.000 |
|                sigma |      0.950 |      0.950 |
|              sigma_u |      0.907 |      0.907 |
|              sigma_e |      0.282 |      0.282 |
|                 r2_w |      0.014 |      0.014 |
|                 r2_o |      0.083 |      0.095 |
|                 r2_b |      0.089 |      0.102 |
+------------------------------------------------+

Mundlak approach

The Random Effects model assumes that the observed explanatory variables are uncorrelated with the unobserved unit-specific effects, which is often an unrealistic assumption (Gayle and Lambert, 2018).

We can relax this assumption using the Mundlak approach, which works by including unit-level means for the time-varying explanatory variables in the Random Effects model.

bys regno: egen orgage_mn = mean(orgage)
bys regno: egen govern_share_mn = mean(govern_share)

xtreg linc orgage localc west genchar nsources govern_share ///
    govern_share_mn nsources_mn orgage_mn, re
est store mund

Random-effects GLS regression                   Number of obs     =     23,826
Group variable: regno                           Number of groups  =      2,166

R-sq:                                           Obs per group:
     within  = 0.0140                                         min =         11
     between = 0.1042                                         avg =       11.0
     overall = 0.0976                                         max =         11

                                                Wald chi2(9)      =     557.99
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

-------------------------------------------------------------------------------
         linc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       orgage |   .0069072   .0005802    11.90   0.000       .00577    .0080444
       localc |  -.3282906   .0414364    -7.92   0.000    -.4095044   -.2470768
         west |   .1167392   .0805284     1.45   0.147    -.0410935    .2745719
      genchar |  -.3210918   .0449127    -7.15   0.000    -.4091191   -.2330644
     nsources |   .0289886   .0027931    10.38   0.000     .0235142     .034463
 govern_share |   .0010325   .0001225     8.43   0.000     .0007923    .0012726
govern_shar~n |  -.0007575   .0007578    -1.00   0.317    -.0022428    .0007277
  nsources_mn |    .109471   .0195752     5.59   0.000     .0711044    .1478376
    orgage_mn |  -.0034024   .0007524    -4.52   0.000    -.0048772   -.0019277
        _cons |   14.85058   .0835091   177.83   0.000     14.68691    15.01426
--------------+----------------------------------------------------------------
      sigma_u |  .90700183
      sigma_e |   .2821005
          rho |  .91179586   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

quietly xtreg linc orgage localc west genchar nsources govern_share, fe
est store fixed

est table fixed mund

----------------------------------------
    Variable |   fixed         mund     
-------------+--------------------------
      orgage |   .0069072     .0069072  
      localc |  (omitted)    -.3282906  
        west |  (omitted)    .11673923  
     genchar |  (omitted)   -.32109178  
    nsources |  .02898861    .02898861  
govern_share |  .00103247    .00103247  
govern_sha~n |              -.00075753  
 nsources_mn |               .10947099  
   orgage_mn |              -.00340245  
       _cons |  14.715042    14.850581  
----------------------------------------

quietly xtreg linc orgage localc west genchar nsources govern_share ///
    govern_share_mn nsources_mn orgage_mn, re

test govern_share_mn = nsources_mn = orgage_mn

 ( 1)  govern_share_mn - nsources_mn = 0
 ( 2)  govern_share_mn - orgage_mn = 0

           chi2(  2) =   43.79
         Prob > chi2 =    0.0000

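The Mundlak approach is an alternative to the Hausman test. Once the unit-level means are included, the coefficients on the time-varying variables match the Fixed Effects estimates, and a joint test that the coefficients on the means are all zero indicates whether the Random Effects assumption is tenable. Note that the test command above evaluates a different hypothesis, namely that the three mean coefficients are equal to one another.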
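A joint test that the mean coefficients are all zero, alongside a conventional Hausman test, could be sketched as follows. This assumes the _mn variables created above are still in memory; the stored-estimate names fe_h and re_h are hypothetical, and no output is shown because the commands have not been run on these data.

```stata
* Re-fit the Mundlak model (Random Effects plus unit-level means)
quietly xtreg linc orgage localc west genchar nsources govern_share ///
    govern_share_mn nsources_mn orgage_mn, re

* Joint Wald test that all three mean coefficients are zero
test govern_share_mn nsources_mn orgage_mn

* Conventional Hausman test for comparison
quietly xtreg linc orgage localc west genchar nsources govern_share, fe
estimates store fe_h
quietly xtreg linc orgage localc west genchar nsources govern_share, re
estimates store re_h
hausman fe_h re_h
```

Rejecting the joint null in the first test, or the null in the Hausman test, points towards the Fixed Effects specification.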

Dynamic panel models

These models are suitable when you have repeated contacts data and your lagged outcome variable also serves as one of your explanatory variables.

The inclusion of lagged outcome variables poses an issue, as the lagged variables may be correlated with the unobserved effects (Gayle and Lambert, 2018).

use "./data/charity-panel-analysis-2020-09-10.dta", clear
xtset regno fin_year

(Contains annual accounts of charities in E&W for financial years 2006-2017)

       panel variable:  regno (strongly balanced)
        time variable:  fin_year, 1 to 11
                delta:  1 unit

capture gen linc_lag = L.linc
l regno fin_year linc linc_lag in 1/22, clean

        regno   fin_year       linc   linc_lag  
  200048    2006-07   14.00189          .  
  200048    2007-08   14.17788   14.00189  
  200048    2008-09    14.1851   14.17788  
  200048    2009-10    14.2326    14.1851  
  200048    2010-11    14.1709    14.2326  
  200048    2011-12   14.14801    14.1709  
  200048    2012-13     14.376   14.14801  
  200048    2013-14   14.29996     14.376  
  200048    2014-15   14.26031   14.29996  
  200048    2015-16   14.30113   14.26031  
  200048    2016-17   14.37021   14.30113  
  200051    2006-07     17.664          .  
  200051    2007-08   17.60568     17.664  
  200051    2008-09   17.44065   17.60568  
  200051    2009-10   16.46766   17.44065  
  200051    2010-11   16.32526   16.46766  
  200051    2011-12    16.4079   16.32526  
  200051    2012-13   16.35779    16.4079  
  200051    2013-14   16.04346   16.35779  
  200051    2014-15   15.71779   16.04346  
  200051    2015-16   15.42241   15.71779  
  200051    2016-17   15.51123   15.42241  

xtreg linc orgage localc west genchar nsources govern_share linc_lag, re

Random-effects GLS regression                   Number of obs     =     21,660
Group variable: regno                           Number of groups  =      2,166

R-sq:                                           Obs per group:
     within  = 0.2673                                         min =         10
     between = 0.9963                                         avg =       10.0
     overall = 0.9320                                         max =         10

                                                Wald chi2(7)      =  296585.19
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        linc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      orgage |  -.0000262   .0000438    -0.60   0.549     -.000112    .0000596
      localc |  -.0065562   .0038027    -1.72   0.085    -.0140093    .0008968
        west |   .0009973    .007296     0.14   0.891    -.0133026    .0152973
     genchar |   -.005033   .0040519    -1.24   0.214    -.0129746    .0029087
    nsources |   .0102149   .0014779     6.91   0.000     .0073183    .0131116
govern_share |  -.0001568   .0000584    -2.69   0.007    -.0002712   -.0000424
    linc_lag |   .9719555    .001881   516.72   0.000     .9682687    .9756422
       _cons |   .4086979    .028964    14.11   0.000     .3519294    .4654663
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .23554516
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Note how large the coefficient is for the lagged variable (and how much smaller the others have become). This is a common issue when a lagged outcome is included as an explanatory variable: the lagged variable soaks up the variation that would otherwise be attributed to the unobserved unit-specific effects, which is why sigma_u and rho are both estimated as zero here.

xtreg linc orgage localc west genchar nsources govern_share linc_lag, fe

note: localc omitted because of collinearity
note: west omitted because of collinearity
note: genchar omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     21,660
Group variable: regno                           Number of groups  =      2,166

R-sq:                                           Obs per group:
     within  = 0.2705                                         min =         10
     between = 0.9695                                         avg =       10.0
     overall = 0.9098                                         max =         10

                                                F(4,19490)        =    1807.17
corr(u_i, Xb)  = 0.9009                         Prob > F          =     0.0000

------------------------------------------------------------------------------
        linc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      orgage |   .0018829   .0005609     3.36   0.001     .0007836    .0029822
      localc |          0  (omitted)
        west |          0  (omitted)
     genchar |          0  (omitted)
    nsources |   .0211613   .0024664     8.58   0.000     .0163271    .0259956
govern_share |   .0005565   .0001088     5.12   0.000     .0003433    .0007698
    linc_lag |   .5167688   .0062088    83.23   0.000     .5045989    .5289386
       _cons |   7.147677   .0953486    74.96   0.000     6.960786    7.334568
-------------+----------------------------------------------------------------
     sigma_u |  .46351877
     sigma_e |  .23554516
         rho |  .79476462   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2165, 19490) = 3.29                 Prob > F = 0.0000

A set of dynamic panel models, commonly known as Arellano-Bond models, has been developed to address the inclusion of a lagged outcome as an explanatory variable.

They also have the advantage of controlling for “initial conditions”.

That is, data collection sometimes interrupts an ongoing social process, and thus the outcome observed at the first time point is partially accounted for by factors not measured at the first time point (Gayle and Lambert, 2018).
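In Stata, an Arellano-Bond model can be fitted with the xtabond command. The sketch below is illustrative only: it assumes the data are still xtset on regno and fin_year, and the choice of covariates is an assumption (time-invariant variables are removed by first-differencing, so they are left out). No output is shown because it has not been run on these data.

```stata
* Arellano-Bond difference GMM with one lag of the outcome.
* The lagged outcome is added by xtabond itself, so linc_lag is not listed.
xtabond linc nsources govern_share, lags(1) vce(robust)

* Test for serial correlation in the first-differenced errors.
* First-order correlation is expected; second-order correlation
* would cast doubt on the validity of the instruments.
estat abond
```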

Latent growth curve models

Latent growth curve models are an approach to the statistical modelling of repeated contacts data. They focus on the trajectory, trend or growth in an outcome over time within units, and on how these trajectories are linked to observed and unobserved differences between units.

Latent growth curve models can be estimated using a multilevel modelling framework, with random intercepts and random slopes.

They can also be estimated using a Structural Equation Modelling (SEM) framework, in which there is assumed to be an underlying continuous trajectory of change that is not directly observed.
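As a rough illustration of the multilevel formulation, a linear growth model with a random intercept and a random slope on time could be specified with the mixed command. This is a minimal sketch under stated assumptions: it uses the charity panel loaded earlier, treats fin_year (coded 1 to 11) as the time variable, and assumes a linear trend in linc; it is not taken from the source and no output is shown.

```stata
* Linear growth curve for log income:
*   fixed part  = average intercept and average yearly slope
*   random part = charity-specific intercept and slope on fin_year
* covariance(unstructured) lets the intercepts and slopes be correlated.
mixed linc fin_year || regno: fin_year, covariance(unstructured)
```

The variance of the random slope indicates how much charities differ in their income trajectories over time.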

Honesty time

[Not an area I know a great deal about - see the reading list for suggested resources]