Panel Data Analysis III¶

In this section we estimate a statistical model that leverages some of the main advantages of using panel data: Fixed Effects. We show some examples of how to estimate and interpret this model, and reflect on the conditions under which the model is appropriate.

Quick reminder¶

Let’s briefly recap some essential concepts regarding panel data:

Two sources of variation (Gould, n.d.):

Cross-section information on differences between units
Time series information on differences over time within units

So far our panel data models — Pooled OLS and Between Effects — only allow us to examine differences between units.

Two main issues with estimating statistical models:

Interdependence of errors
Improper model specification

The first can lead to inefficient estimates: under-estimated standard errors and false positive tests of statistical significance.

The second can lead to biased coefficients and incorrect inferences regarding the magnitude and direction of the effects of the explanatory variables.

Therefore we need a statistical model that allows us to examine change over time and/or control for omitted variable bias.

Defining our statistical model¶

Before estimating Fixed Effects and Random Effects models separately, it is worth identifying the key commonality between their respective statistical models.

Let’s take a simplified version of our charity income statistical model, this time with only one explanatory variable (age). It typically looks as follows:

\[ \text{y}_{it} = \beta_0 + \beta_1x_{1it} + \epsilon_{it} \tag{1.7} \]

However it is possible to decompose the residual variation (error term) into two separate terms:

\[ \text{y}_{it} = \beta_0 + \beta_1x_{1it} + \mu_{i} + \text{e}_{it} \tag{1.8} \]

In equation 1.8 we have introduced a unit-specific term to represent some of the residual variation in the outcome that is unexplained by the explanatory variables.

Decomposition implications¶

This term (\(\mu_{i}\)) captures the effect of residual heterogeneity on the outcome i.e., unobserved or immeasurable characteristics of the units that are associated with the outcome (and possibly the explanatory variables), and vary across units.

In our charity data example, these charity-specific effects could be organisational culture, informal connections to government etc. In theory these characteristics could be measured but it’s often wildly impractical.

The unit-specific effect also controls for the effect of other omitted variables on the outcome (and possibly the explanatory variables).

In our charity data example, we do not include explanatory variables capturing the amount a charity spends on fundraising, how well known it is etc.

A word of caution¶

Note the lack of a time subscript t in the new term \(\mu_{i}\). The implication is that the unobserved unit-specific effect is constant over time (i.e., within units).

Therefore Fixed Effects and Random Effects models only control for omitted variables that do not change within units (e.g., race, sex at birth, natural ability).

Fixed Effects Model¶

Conceptualising the Fixed Effects model¶

The Fixed Effects model focuses on how changes in explanatory variables are associated with changes in the outcome within units.
It allows the observed explanatory variables and the unobserved unit-specific effect to be correlated (i.e., it is designed for situations where omitted variable bias is an issue).

Mehmetoglu and Jakobsen (2016, p. 241):

“In other words, we use fixed effects whenever we are only interested in the impact of variables that vary over time. This estimator helps us explore the relationship between the dependent and the explanatory variables within a unit (person, company, country, etc.) Each unit has its own individual characteristics that may or may not influence the predictor variables.”

The Fixed Effects model is specified as follows:

\[ \text{y}_{it} = \beta_0 + \lambda_{i} + \beta_1x_{1it} +...+ \beta_kx_{kit} + \text{e}_{it} \tag{1.9} \]

Where:

\(\lambda_{i}\) represents the unit-specific effect on the outcome.

The value of \(\lambda_{i}\) captures the effect of all of the unobserved time-invariant explanatory variables that are missing from the model. As a result, while the value of \(\lambda_{i}\) is calculated, it is not of much interest in and of itself. Its main role is to allow for a more robust (i.e., unbiased) estimation of the effects of the explanatory variables in the model.

In essence the Fixed Effects model produces a unit-specific intercept, which is the sum of the overall constant and the unit-specific effect:

\[ \text{y}_{it} = \alpha_{i} + \beta_1x_{1it} +...+ \beta_kx_{kit} + \text{e}_{it} \tag{1.10} \]

Where:

\(\alpha_{i} = \beta_0 + \lambda_{i}\)

The unit-specific effect shifts the overall intercept up or down the y axis by the value of \(\lambda_{i}\).
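One way to see why the unit-specific intercept does not need to be observed is to average equation 1.10 over time within each unit and subtract those averages from the original equation. This is a sketch using the notation above; the bar notation for unit-level means (e.g., \(\bar{x}_{1i}\)) is introduced here for illustration.

\[ \bar{\text{y}}_{i} = \alpha_{i} + \beta_1\bar{x}_{1i} +...+ \beta_k\bar{x}_{ki} + \bar{\text{e}}_{i} \]

\[ \text{y}_{it} - \bar{\text{y}}_{i} = \beta_1(x_{1it} - \bar{x}_{1i}) +...+ \beta_k(x_{kit} - \bar{x}_{ki}) + (\text{e}_{it} - \bar{\text{e}}_{i}) \]

Because \(\alpha_{i}\) is constant within a unit, it equals its own unit-level mean and cancels in the subtraction. The same happens to any observed explanatory variable that does not vary over time, which is why such variables cannot be estimated in this model.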

Final thoughts on conceptualisation¶

Consider the Fixed Effects model a standard cross-sectional regression model with the addition of a dummy variable for every unit in the panel except one (i.e., n - 1 dummy variables are added as explanatory variables).
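The dummy-variable view can be checked on simulated data. The following is a minimal sketch and not part of the charity analysis: the variable names (id, t, x, u, y), the seed and the parameter values are all hypothetical, chosen only so that x is correlated with the unobserved unit effect.

```stata
* Simulated panel: 50 hypothetical units observed for 10 periods each
clear
set seed 12345
set obs 50
gen id = _n
gen u = rnormal()                        // unobserved unit-specific effect
expand 10                                // 10 rows per unit
bysort id: gen t = _n
gen x = 0.5*u + rnormal()                // x is correlated with u
gen y = 1 + 2*x + u + rnormal(0, 0.5)    // assumed true slope on x is 2
xtset id t

* Fixed Effects (within) estimator
xtreg y x, fe

* Same model with a dummy variable for every unit except one
regress y x i.id
```

The coefficient on x should be identical in the two sets of results. The second command also reports a coefficient for each unit dummy, which quickly becomes unwieldy in a panel with thousands of units.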

Estimation¶

use "./data/charity-panel-analysis-2020-09-10.dta", clear

(Contains annual accounts of charities in E&W for financial years 2006-2017)

xtreg linc orgage localc west genchar nsources govern_share, fe

note: localc omitted because of collinearity
note: west omitted because of collinearity
note: genchar omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     23,826
Group variable: regno                           Number of groups  =      2,166

R-sq:                                           Obs per group:
     within  = 0.0140                                         min =         11
     between = 0.0425                                         avg =       11.0
     overall = 0.0403                                         max =         11

                                                F(3,21657)        =     102.28
corr(u_i, Xb)  = -0.1002                        Prob > F          =     0.0000

------------------------------------------------------------------------------
        linc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      orgage |   .0069072   .0005802    11.90   0.000       .00577    .0080444
      localc |          0  (omitted)
        west |          0  (omitted)
     genchar |          0  (omitted)
    nsources |   .0289886   .0027931    10.38   0.000     .0235139    .0344633
govern_share |   .0010325   .0001225     8.43   0.000     .0007923    .0012727
       _cons |   14.71504    .026082   564.18   0.000     14.66392    14.76616
-------------+----------------------------------------------------------------
     sigma_u |  .94534636
     sigma_e |   .2821005
         rho |  .91823289   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2165, 21657) = 120.80               Prob > F = 0.0000

QUESTION TIME

How much of the variation in the outcome is accounted for by the model? Is this a lot?
Why were three of the observed explanatory variables excluded in the estimation of the model?
What does the \(\text{rho}\) statistic tell us?
Is there evidence of correlation between the unit-specific effects and observed explanatory variables?

Interpretation¶

The effect of the observed explanatory variables is net of the effect of the unit-specific term. That is, we’ve controlled for the correlation between X and \(\mu_{i}\).

\(\text{_cons}\) is the intercept and represents the average value of the fixed effects + the overall constant.

\(\text{orgage}\) is the predicted change in the outcome for a one-unit increase in organisational age.

\(\text{rho}\) is the proportion of unexplained variance in the outcome that is attributable to unobserved differences between charities (the unit-specific effects), rather than to changes within them.

If \(\text{rho}\) > .5 then most of the residual variation in the outcome is due to differences between units; if \(\text{rho}\) < .5 then most of the residual variation is due to differences within units (i.e., the idiosyncratic error).

\(\text{corr(u_i, Xb)}\) is the correlation between the unit-specific effect and the linear prediction from the observed explanatory variables in the model.

\(\text{sigma_u}\) (or \(\sigma_u\)) is the standard deviation of the unit-specific (fixed) effects, i.e., of the residual differences between units.
\(\text{sigma_e}\) (or \(\sigma_e\)) is the standard deviation of the idiosyncratic residuals (ei), i.e., of the residual variation within units.
\(\text{R-sq: within}\) is the proportion of within-unit variance in the outcome explained by the observed explanatory variables (i.e., excluding the unit-specific effect).
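The relationship between these statistics can be verified by hand: \(\text{rho}\) is the variance of the unit-specific effects divided by the sum of that variance and the variance of the idiosyncratic residuals. The sketch below simply plugs in the sigma_u and sigma_e values from the regression output above.

```stata
* rho = sigma_u^2 / (sigma_u^2 + sigma_e^2), using the values reported by xtreg
display .94534636^2 / (.94534636^2 + .2821005^2)
```

The result should match the reported \(\text{rho}\) of roughly .918.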

Post-estimation¶

Though it’s very rarely of substantive interest, we can recover the unit-specific effects (and other parameter estimates) after estimating a Fixed Effects model:

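* unit-specific effects (u_i)
capture predict fixed, u
* linear prediction from the explanatory variables and constant (Xb)
capture predict y_hat, xb
* idiosyncratic residuals (e_it)
capture predict ei, e
* combined residual (u_i + e_it)
capture predict residuals, ue
* flag one observation per charity
capture egen pickone = tag(regno)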

l regno fin_year fixed if pickone in 1/100

     +-------------------------------+
     |  regno   fin_year       fixed |
     |-------------------------------|
  1. | 200048    2006-07   -.9911593 |
 12. | 200051    2006-07    1.341765 |
 23. | 200069    2006-07   -.4608771 |
 34. | 200222    2006-07   -.1173254 |
 45. | 200424    2006-07   -.4077182 |
     |-------------------------------|
 56. | 200431    2006-07   -.5248324 |
 67. | 200500    2006-07   -.0236679 |
 78. | 201081    2006-07    .0432656 |
 89. | 201321    2006-07   -1.400211 |
100. | 201911    2006-07   -.7076412 |
     +-------------------------------+

l regno fin_year linc y_hat residuals fixed ei in 1/11

     +-----------------------------------------------------------------+
  1. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2006-07 | 14.00189 | 15.17772 | -1.175828 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                            -.1846687                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  2. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2007-08 | 14.17788 | 15.18462 | -1.006747 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                            -.0155877                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  3. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2008-09 |  14.1851 | 15.22075 |  -1.03565 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                            -.0444904                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  4. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2009-10 |  14.2326 | 15.19844 | -.9658405 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                             .0253188                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  5. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2010-11 |  14.1709 | 15.20695 | -1.036052 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                            -.0448926                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  6. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2011-12 | 14.14801 | 15.21225 |  -1.06424 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                            -.0730808                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  7. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2012-13 |   14.376 | 15.25097 | -.8749701 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                             .1161892                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  8. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2013-14 | 14.29996 | 15.22607 | -.9261075 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                             .0650518                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
  9. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2014-15 | 14.26031 |  15.2623 | -1.001989 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                            -.0108296                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
 10. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2015-16 | 14.30113 | 15.23988 | -.9387508 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                             .0524085                            |
     +-----------------------------------------------------------------+

     +-----------------------------------------------------------------+
 11. |  regno | fin_year |     linc |    y_hat | residuals |     fixed |
     | 200048 |  2016-17 | 14.37021 | 15.24679 | -.8765777 | -.9911593 |
     |-----------------------------------------------------------------|
     |                                   ei                            |
     |                             .1145816                            |
     +-----------------------------------------------------------------+

di -.9911593 + -.1846687

-1.175828
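The same logic extends to the outcome itself. Under equation 1.9, the observed value of linc should equal the linear prediction plus the unit-specific effect plus the idiosyncratic residual. As a quick check, the line below adds up the y_hat, fixed and ei values copied from the first row of the listing (charity 200048, 2006-07).

```stata
* linc = y_hat + fixed + ei for the first observation in the listing
display 15.17772 + -.9911593 + -.1846687
```

The result should agree with the listed linc of 14.00189, up to rounding in the displayed values.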

tabstat fixed ei, s(mean sd) format(%5.4f)

   stats |     fixed        ei
---------+--------------------
    mean |   -0.0000   -0.0000
      sd |    0.9451    0.2690
------------------------------

Benefits of Fixed Effects¶

Analyse change over time.
Control for residual heterogeneity.
Coefficient estimates are consistent if the key assumption holds. That is, because we have controlled for the effect of unobserved time-invariant explanatory variables, the estimates converge on their true values as the sample size increases.

(Mehmetoglu and Jakobsen, 2016)

Limitations of Fixed Effects¶

Ignores differences between units.
Coefficient estimates are inefficient, especially when compared to those from a Random Effects model. As a result, standard errors tend to be larger. Put simply, the estimates of the coefficients are based on only one source of variation (within) and thus are more uncertain.
Cannot include observed time-invariant explanatory variables. This is due to a very simple reason: if a value does not vary, how can it be associated with variation in the value of another variable?
Cannot control for unobserved heterogeneity that varies over time (possible examples: educational ability, natural resilience).
It is not well suited for variables that rarely change within units.

Think carefully about variables that change little over time - how might these influence the outcome? For example, few individuals in your panel might switch from non-graduate to graduate (let’s say you have a sample of older individuals). In a fixed effects model, your estimation of the effect of switching between non-graduate and graduate will be based on a small number of occurrences and care is due in interpreting the coefficient.
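Whether a variable has enough within-unit variation to be used in a Fixed Effects model can be checked before estimation. The following is a minimal sketch on simulated data, not the charity panel: the variable names (id, t, x, z, y), the seed and the parameter values are hypothetical. Here z is a dummy that never changes within a unit and x varies freely.

```stata
* Simulated panel: 40 hypothetical units observed for 8 periods each
clear
set seed 2020
set obs 40
gen id = _n
gen z = runiform() < 0.5            // time-invariant dummy, fixed per unit
expand 8                            // 8 rows per unit
bysort id: gen t = _n
gen x = rnormal()                   // varies within units
gen y = 1 + 0.5*x + 2*z + rnormal()
xtset id t

* Decompose each variable's standard deviation into between and within parts
xtsum z x

* z has no within-unit variation, so the Fixed Effects model cannot estimate it
xtreg y x z, fe
```

In the xtsum results the within standard deviation of z should be zero, and xtreg should omit z with the same collinearity note seen for localc, west and genchar in the charity model. A variable with a very small but non-zero within standard deviation would be estimated, but from few changes, which is the situation the caution above describes.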

Summarising the Fixed Effects model¶

Focuses on change over time within a unit of analysis.

Can control for the effect of unobserved time-invariant explanatory variables (residual heterogeneity).

Provides robust estimates of the effects of observed explanatory variables when said variables are correlated with unobserved effects.

However, it cannot include observed explanatory variables that do not vary within units.

Summary¶

Both the Pooled OLS and Between Effects models provide useful information on the association between an outcome Y and a set of explanatory variables X.

The Fixed Effects model provides potentially different information on the association between an outcome Y and a set of explanatory variables X.

Is there a way to combine the within and between perspectives?

Longitudinal Data Analysis for Social Scientists