—from Epidemiological Bulletin, Vol. 24 No. 4, December 2003


A Glossary for Multilevel Analysis

Ana V. Diez Roux
Divisions of Medicine and Epidemiology, Columbia University
New York, New York, United States

PART II

EMPIRICAL BAYES ESTIMATES
Estimates of parameters for a given group or higher level unit (for example, estimates of group specific intercepts or slopes, such as b0j and b1j in equation (1), under multilevel models) obtained by combining information from the group itself with information from other similar groups investigated.(10, 19, 20) This is particularly useful when estimating parameters for a group with few within group observations. These estimates are “optimally” weighted averages that combine information derived from the group itself with the mean for all similar groups. The weighted average shifts the group specific estimate (derived using data only for that particular group) towards the mean for similar groups. The less precise the group specific estimate and the less the variability observed across groups, the greater the shift towards the overall group mean. Thus, the estimate for a given group is based not only on its own data but also takes into account estimates for other groups and the characteristics groups share.(20) Empirical Bayes estimates of parameters for a given group can be derived from multilevel models using estimates of the group level errors (for example, U0j and U1j, see multilevel models) for that particular group. Empirical Bayes estimates are also sometimes referred to as “shrinkage estimates” because they “shrink” the group specific estimate towards the overall mean (although in fact when the overall mean is greater than the group specific estimate, the “shrunken” or empirical Bayes estimate may actually be greater than the group specific estimate). In public health, empirical Bayes estimation can be used, for example, to derive improved estimates of rates of death or diseases for small areas with few observations,(21) or to estimate rates of different health outcomes for individual providers (hospitals, physicians, etc.).(22) In other applications (which do not involve the structure of individuals within groups although they are analogous to it), empirical Bayes estimates of regression coefficients have been used to obtain improved estimates of associations in studies investigating the role of multiple exposures.(23)
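
As a rough illustration of the weighting described in this entry, the following Python sketch shrinks a group's sample mean towards an overall mean. It assumes the simplest case (a random intercept model with no covariates) and treats the between group variance (tau00), the within group variance (sigma2) and the overall mean as known; in practice these would be estimated from the data. The function name and all numbers are hypothetical and are not taken from the glossary.

```python
def empirical_bayes_mean(group_mean, n, overall_mean, tau00, sigma2):
    """Shrink one group's sample mean towards the overall mean.

    Assumes a random intercept model with known variance components:
    tau00  = variance of the true group means (between groups)
    sigma2 = variance of individuals within a group
    """
    # Weight given to the group's own mean: close to 1 when the group is
    # large or groups differ a lot, close to 0 when the group is small
    # or groups are very similar.
    weight = tau00 / (tau00 + sigma2 / n)
    return weight * group_mean + (1.0 - weight) * overall_mean


# Hypothetical example: two areas with the same observed mean (30) but
# different numbers of observations; the overall mean across areas is 20.
small_area = empirical_bayes_mean(30.0, n=4, overall_mean=20.0, tau00=4.0, sigma2=64.0)
large_area = empirical_bayes_mean(30.0, n=400, overall_mean=20.0, tau00=4.0, sigma2=64.0)
```

With these made-up inputs the small area (4 observations) is pulled most of the way towards the overall mean, while the large area (400 observations) stays close to its own observed mean.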

ENVIRONMENTAL VARIABLES
In the context of ecological studies and multilevel analysis, the term “environmental variables” has sometimes been used to refer to group level measures of physical or chemical exposures. Environmental variables, so defined, have been proposed as a “type” of group level variable, distinct from derived variables and integral variables.(11) These variables are not derived by aggregating the characteristics of individuals but they do have group level and individual level analogues (for example, days of sunlight in the community and individual level sunlight exposure information). In contrast with derived and integral variables, which may be used as indicators of group level constructs, group level environmental variables are used exclusively as proxies for individual level exposures (which may be more difficult to measure for logistic or methodological reasons), rather than as indicators of a group level property, which is conceptually different from the analogous measure at the individual level.

FIXED EFFECTS / FIXED COEFFICIENTS
Regression coefficients (intercepts or covariate effects) that are not allowed to vary randomly across higher level units (see multilevel models). For example, in the case of persons nested within neighborhoods, two options are available for modelling the effects of neighborhood. One option is to include a dummy variable for each neighborhood. In this case the neighborhood coefficients are modelled as fixed (sometimes called “fixed effects”). Another option is to assume that the neighborhoods in the sample are a random sample of a larger population of neighborhoods and that the coefficients for the “neighborhood effect” vary randomly around an overall mean (for example, as reflected by U0j in equation 2 under the entry for multilevel models). In this case, the neighborhood effects are modelled as random (sometimes called “random effects”, see random effects models). In the same example, the coefficients for individual level covariates can also be modelled as fixed or random. For example, if the relation between individual level income and blood pressure is not allowed to vary randomly across neighborhoods, the coefficient for individual level income is fixed (“fixed coefficient”). On the other hand, if the coefficient for individual level income is allowed to vary randomly across neighborhoods around an overall mean effect (as reflected by U1j in equation 3 under the entry for multilevel models), the coefficient for income is modelled as random (sometimes called a “random coefficient”, see random coefficient models). Although the terms “fixed effects” and “fixed coefficients” are sometimes distinguished as noted above, they are often used interchangeably. Fixed effects models or fixed coefficient models are models in which all effects or coefficients are fixed. See also random effects/random coefficients.

GROUP LEVEL VARIABLES
Term used to refer to variables that characterize groups. The terms group level variables, macro variables and ecological variables are often used interchangeably. (2, 6, 11, 14, 24) Group level variables may be used as proxies for unavailable or unreliable individual level data (for example, when neighborhood mean income is used as a proxy for the individual level income of individuals living in the neighborhood) or as indicators of group level constructs (for example, when mean neighborhood income is used as an indicator of neighborhood characteristics that may be related to individual level outcomes independently of individual level income). It is the second usage (as indicators of group level constructs) that is of particular interest in multilevel analysis. Group level variables have been classified into two basic types,(11, 13, 24) derived variables and integral variables. Two additional types of group level variables, structural variables (13) and environmental variables (11) are sometimes distinguished. The term contextual variables has been used as a synonym for group level variables generally (6, 13) although it is sometimes reserved for derived group level variables.(11, 14)

HIERARCHICAL (LINEAR) MODELS
See multilevel models

INDIVIDUAL LEVEL VARIABLES
Term used to refer to variables that characterize individuals and refer to individual level constructs (for example, age or personal income).

INDIVIDUALISTIC FALLACY
Term used as a synonym for the atomistic fallacy. May sometimes also be used as a synonym for the psychologistic fallacy.

INTEGRAL VARIABLES
A type of group level variable. Integral variables differ from derived variables (another type of group level variable) in that they are not summaries of the characteristics of individuals in the group. Integral variables have no individual level analogues and necessarily refer to group level constructs. Examples of integral variables include the existence of certain types of laws, political or economic system, social disorganization, or population density.(11, 13) Integral variables have also been referred to as primary or global variables.

INTRACLASS CORRELATION
A measure of the degree of resemblance between lower level units belonging to the same higher level unit or cluster.(25) In the case of individuals nested within groups (for example, neighborhoods), the intraclass correlation measures the extent to which values of the dependent variable are similar for individuals belonging to the same group. It can be thought of as the average correlation between values of two randomly drawn lower level units (for example, individuals) in the same, randomly drawn higher level unit (for example, neighborhood). It can also be defined as the proportion of the variance in the outcome that is between the groups or higher level units. In the case of a simple random intercept model, the intraclass correlation coefficient is estimated by the ratio of the population variance between groups (τ00) to the total variance (τ00 + σ²) (see multilevel models).(25) The estimation of the intraclass correlation coefficient in models including random covariate effects, or in the case of non-normally distributed dependent variables, is more complex and not always straightforward.
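
The variance ratio for the simple random intercept case can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are simulated, the groups are balanced (equal size), and the variance components are estimated with a simple one-way ANOVA (method of moments) estimator rather than the iterative likelihood-based methods that multilevel software typically uses. All function names and parameter values are hypothetical.

```python
import random


def simulate_random_intercept(n_groups, n_per_group, tau00, sigma2, seed=0):
    """Simulate Yij = U0j + eij with U0j ~ N(0, tau00), eij ~ N(0, sigma2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        u0j = rng.gauss(0.0, tau00 ** 0.5)
        data.append([u0j + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n_per_group)])
    return data


def anova_icc(data):
    """One-way ANOVA (method of moments) ICC estimate for balanced groups."""
    n_groups = len(data)
    n = len(data[0])
    group_means = [sum(g) / n for g in data]
    grand_mean = sum(group_means) / n_groups
    # Mean squares between and within groups.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    msw = sum((y - m) ** 2 for g, m in zip(data, group_means) for y in g) / (n_groups * (n - 1))
    tau00_hat = max((msb - msw) / n, 0.0)  # truncate negative estimates at 0
    sigma2_hat = msw
    return tau00_hat / (tau00_hat + sigma2_hat)


true_icc = 1.0 / (1.0 + 4.0)  # tau00 = 1, sigma2 = 4
estimated_icc = anova_icc(simulate_random_intercept(500, 20, tau00=1.0, sigma2=4.0))
```

With many groups the estimate should fall close to the true ratio; with few groups or small groups it can be quite imprecise.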

MARGINAL MODELS
See population-average models.

MIXED MODELS
Term used to refer to models that contain a mixture of fixed effects (or fixed coefficients) and random effects (or random coefficients). In mixed models some of the regression coefficients (intercepts or covariate effects) are allowed to vary randomly across higher level units but others are not (see multilevel models). Thus mixed models can be thought of as a particular case of the more general multilevel models (although the term is also occasionally used as a synonym of multilevel models generally). Sometimes the term mixed models is also used to encompass models that account for correlation between lower level units (for example, individuals) within higher level units (for example, neighborhoods) in other ways—that is, by modelling the correlations or covariances themselves rather than by allowing for random effects or random coefficients.(26) These models (which are not multilevel models) have also been called covariance pattern models,(26) marginal models, or population average models.

MULTILEVEL ANALYSIS
An analytical approach that is appropriate for data with nested sources of variability, that is, data involving units at a lower level or micro units (for example, individuals) nested within units at a higher level or macro units (for example, groups such as schools or neighborhoods).(5, 10, 19, 24, 25, 27–30) Multilevel analysis allows the simultaneous examination of the effects of group level and individual level variables on individual level outcomes while accounting for the non-independence of observations within groups. Multilevel analysis also allows the examination of both between group and within group variability as well as how group level and individual level variables are related to variability at both levels. Thus, multilevel models can be used to draw inferences regarding the causes of inter-individual variation (or the relation of group and individual level variables to individual level outcomes) but inferences can also be made regarding inter-group variation, whether it exists in the data, and to what extent it is accounted for by group and individual level characteristics. In multilevel analysis, groups or contexts are not treated as unrelated but are conceived as coming from a larger population of groups about which inferences are to be made. Multilevel analysis thus allows researchers to deal with the micro-level of individuals and the macro-level of groups or contexts simultaneously.(5)

Multilevel analysis has a broad range of applications in many situations involving nested sources of random variability such as persons nested within neighborhoods,(5, 30) patients nested within providers,(31) meta analysis (observations nested within sites),(19, 32) longitudinal data analysis (repeat measurements over time nested within persons),(28, 33, 34) multivariate responses (multiple outcomes nested within individuals),(5) the analysis of repeat cross sectional surveys (multiple observations nested within time periods),(35) the examination of geographical variations in rates (rates for smaller areas nested within regions or larger areas),(36) and the examination of interviewer effects (respondents nested within interviewers).(37) Multilevel analysis can also be used in situations involving multiple nested contexts (19, 28) (for example, multiple measures over time on individuals nested within neighborhoods) as well as overlapping or cross classified contexts (for example, children nested within neighborhoods and schools).(38) The statistical models used in multilevel analysis are referred to as multilevel models (25, 28, 29) or hierarchical linear models.(19, 39)

MULTILEVEL MODELS
The statistical models used in multilevel analysis.(19, 25, 28, 29) The terms “hierarchical models” and “multilevel models” are often used synonymously. These models (or variants of them) have previously appeared in different literatures under a variety of names including “random effects models” or “random coefficient models”,(40–42) “covariance components models” or “variance components models”,(43, 44) and “mixed models”.(26) A simplified example for the case of a normally distributed dependent variable, a single individual level (lower level unit) predictor and a single group level (higher level unit) predictor is provided below. Analogous models can be formulated for non-normally distributed dependent variables.(10, 28, 39, 45)

In the case of multilevel analysis involving two levels (for example, individuals nested within groups), the multilevel model can be conceptualized as a two stage system of equations.

In the first stage (level 1), a separate individual level regression is defined for each group or higher level unit.
(1) $Y_{ij} = b_{0j} + b_{1j} I_{ij} + \varepsilon_{ij}$
$\varepsilon_{ij} \sim N(0, \sigma^2)$
where
$Y_{ij}$ = outcome variable for the ith individual in the jth group
$I_{ij}$ = individual level variable for the ith individual in the jth group
$b_{0j}$ = group specific intercept
$b_{1j}$ = group specific effect of the individual level variable

Individual level errors (εij) are assumed to be independent and identically distributed with a mean of 0 and a variance of σ². The same regressors are generally used in all groups, but the regression coefficients (b0j and b1j) are allowed to vary from one group to another.

In a second stage (level 2), each of the group or context specific regression coefficients defined in equation (1) (b0j and b1j in this example) is modelled as a function of group level (or higher level) variables.
(2) $b_{0j} = \gamma_{00} + \gamma_{01} G_j + U_{0j}$
$U_{0j} \sim N(0, \tau_{00})$

(3) $b_{1j} = \gamma_{10} + \gamma_{11} G_j + U_{1j}$
$U_{1j} \sim N(0, \tau_{11})$
$\mathrm{cov}(U_{0j}, U_{1j}) = \tau_{10}$
where
$G_j$ = group level variable
$\gamma_{00}$ = common intercept across groups
$\gamma_{01}$ = effect of the group level predictor on the group specific intercepts
$\gamma_{10}$ = common slope associated with the individual level variable across groups
$\gamma_{11}$ = effect of the group level predictor on the group specific slopes

The errors in the level 2 equations (U0j and U1j), sometimes called “macro errors”, are assumed to be normally distributed with mean 0 and variances τ00 and τ11 respectively. τ10 represents the covariance between intercepts and slopes. Thus, multilevel analysis summarizes the distribution of the group specific coefficients in terms of two parts: a “fixed” part that is common across groups (γ00 and γ01 for the intercept, and γ10 and γ11 for the slope) and a “random” part (U0j for the intercept and U1j for the slope) that is allowed to vary from group to group (see also fixed coefficients and random coefficients).
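
To make the two stage structure concrete, the following Python sketch simulates data from equations (1) to (3). It is an illustration only: the parameter values are invented, both predictors are drawn from a standard normal distribution for convenience, and the function and variable names are hypothetical. The correlated macro errors are generated with a 2×2 Cholesky factor, which requires a positive intercept variance and a valid (positive semi-definite) covariance matrix.

```python
import math
import random


def simulate_two_level(n_groups, n_per_group, gamma, tau, sigma2, seed=0):
    """Simulate data from equations (1)-(3).

    gamma = (g00, g01, g10, g11): fixed coefficients
    tau   = (t00, t11, t10): variances of U0j and U1j, and their covariance
    sigma2: variance of the individual level errors
    Returns (rows, groups): rows hold (j, Gj, Iij, Yij); groups hold (Gj, b0j, b1j).
    """
    g00, g01, g10, g11 = gamma
    t00, t11, t10 = tau
    rng = random.Random(seed)
    rows, groups = [], []
    for j in range(n_groups):
        g_j = rng.gauss(0.0, 1.0)  # group level variable (hypothetical distribution)
        # Correlated macro errors (U0j, U1j) via a 2x2 Cholesky factor.
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u0 = math.sqrt(t00) * z0
        u1 = (t10 / math.sqrt(t00)) * z0 + math.sqrt(t11 - t10 ** 2 / t00) * z1
        b0 = g00 + g01 * g_j + u0  # equation (2)
        b1 = g10 + g11 * g_j + u1  # equation (3)
        groups.append((g_j, b0, b1))
        for _ in range(n_per_group):
            i_ij = rng.gauss(0.0, 1.0)  # individual level variable
            e_ij = rng.gauss(0.0, math.sqrt(sigma2))
            rows.append((j, g_j, i_ij, b0 + b1 * i_ij + e_ij))  # equation (1)
    return rows, groups


# Hypothetical parameter values: 50 groups of 10 individuals each.
rows, groups = simulate_two_level(
    50, 10, gamma=(10.0, 2.0, 1.5, 0.5), tau=(4.0, 1.0, 0.5), sigma2=9.0
)
```

Each group receives its own intercept and slope at level 2, and individual outcomes are then generated within the group at level 1, which mirrors the order of the equations.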

By including an error term in the group level equations (equations (2) and (3)), these models allow for sampling variability in the group specific coefficients (b0j and b1j) and also for the fact that the group level equations are not deterministic (that is, the possibility that not all relevant macro-level variables have been included in the model). The underlying assumption is that group specific intercepts and slopes are random samples from a normally distributed population of group specific intercepts and slopes, or alternatively, that the macro errors are exchangeable—that is, that the residual variation in group specific coefficients across groups is unsystematic.(10)

An alternative way to present the model fitted in multilevel analysis is to substitute equations (2) and (3) in (1) to obtain:
$Y_{ij} = \gamma_{00} + \gamma_{01} G_j + \gamma_{10} I_{ij} + \gamma_{11} G_j I_{ij} + U_{0j} + U_{1j} I_{ij} + \varepsilon_{ij}$

The model includes the effects of group level variables (γ01), individual level variables (γ10) and their interaction (γ11) on the individual level outcome Yij. These coefficients (γ01, γ10 and γ11), which are common to all individuals regardless of the group to which they belong, are often called the fixed coefficients (or fixed effects). The model also includes a random intercept component (U0j) and a random slope component (U1j). The values of these components vary randomly across groups, and hence U0j and U1j are referred to as the random coefficients (or random effects). The parameters of the above equations (fixed effects, random effects, variances of the random effects, and residual variance) are simultaneously estimated using iterative methods. The level 1 and level 2 variances and covariance (σ², τ00, τ11 and τ10) are called the (co)variance components.
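
The substitution that produces the combined equation can be written out step by step. The derivation below uses the usual Greek notation for this model (γ for fixed coefficients, τ for level 2 variances and covariance, σ² for the level 1 variance) and writes the group level variable as G_j, as in equations (2) and (3). The final line is an added consequence, not part of the glossary text, and assumes that the macro errors are independent of the individual level errors.

```latex
\begin{align*}
Y_{ij} &= b_{0j} + b_{1j} I_{ij} + \varepsilon_{ij} \\
       &= (\gamma_{00} + \gamma_{01} G_j + U_{0j})
        + (\gamma_{10} + \gamma_{11} G_j + U_{1j})\, I_{ij} + \varepsilon_{ij} \\
       &= \underbrace{\gamma_{00} + \gamma_{01} G_j + \gamma_{10} I_{ij}
          + \gamma_{11} G_j I_{ij}}_{\text{fixed part}}
        + \underbrace{U_{0j} + U_{1j} I_{ij} + \varepsilon_{ij}}_{\text{random part}} \\[6pt]
\operatorname{Var}(Y_{ij} \mid G_j, I_{ij})
       &= \tau_{00} + 2\,\tau_{10}\, I_{ij} + \tau_{11}\, I_{ij}^{2} + \sigma^{2}
\end{align*}
```

Because the variance of the random part depends on the value of the individual level variable whenever the slope is random, the share of variance that lies between groups is no longer a single number, which is one reason the intraclass correlation is harder to define in models with random covariate effects.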

Many variants of the more general model illustrated above are possible. For example, only group specific intercepts (b0j) may be modelled as random (these models have also been called random effects models). When covariate effects (b1j in the example above) are modelled as random these models have also been called random coefficient models. When some of the coefficients are fixed and others are random, these models have also been called “mixed effects models” or simply mixed models. When all coefficients are modelled as fixed (no random errors are included in level 2 equations), these models are reduced to traditional contextual effects models. Multilevel models can also account for multiple nested contexts (or levels) (19, 28) allowing fixed and random coefficients to be associated with variables measured at different levels of the data hierarchy being analyzed. Multilevel models can also be modified to allow for non-hierarchical, overlapping or cross classified contexts (for example, children simultaneously nested within neighborhoods and schools).(38)

References:
NOTE: References 1-18 were included in Part I of the Glossary, in Vol. 24, No. 3 (2003) of the Epidemiological Bulletin.
(19) Bryk AS, Raudenbush SW. Hierarchical linear models: applications and data analysis methods. Newbury Park: Sage, 1992.
(20) Rice N, Jones A. Multilevel models and health economics. Health Econ 1997;6:561–75.
(21) Clayton D, Kaldor J. Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics 1987;43:671–81.
(22) Thomas N, Longford N, Rolph J. Empirical Bayes methods for estimating hospital-specific mortality rates. Stat Med 1994;13:889–903.
(23) Witte JS, Greenland S, Haile RW, et al. Hierarchical regression analysis applied to a study of multiple dietary exposures and breast cancer. Epidemiology 1994;5:612–21.
(24) Von Korff M, Koepsell T, Curry S, et al. Multi-level research in epidemiologic research on health behaviors and outcomes. Am J Epidemiol 1992;135:1077–82.
(25) Snijders TAB, Bosker RJ. Multilevel analysis: an introduction to basic and advanced multilevel modeling. London: Sage, 1999.
(26) Brown H, Prescott R. Applied mixed models in medicine. New York: Wiley, 2000.
(27) Mason W, Wong G, Entwisle B. Contextual analysis through the multilevel linear model. In: Leinhardt S, ed. Sociological methodology. San Francisco: Jossey-Bass, 1983–1984: 72–103.
(28) Goldstein H. Multilevel statistical models. New York: Halsted Press, 1995.
(29) Kreft I, de Leeuw J. Introducing multilevel modeling. London: Sage, 1998.
(30) Diez-Roux AV. Multilevel analysis in public health research. Annu Rev Public Health 2000;21:171–92.
(31) Sixma HJ, Spreeuwenberg PM, Pasch MAvd. Patient satisfaction with the general practitioner: a two-level analysis. Med Care 1998;36:212–29.
(32) Hedeker D, Gibbons R, Davis J. Random regression models for multicenter clinical trials data. Psychopharmacol Bull 1991;27:73–7.
(33) Rutter C, Elashoff R. Analysis of longitudinal data: random coefficient regression modelling. Stat Med 1994;13:1211–31.
(34) Cnaan A, Laird NM, Slasor P. Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Stat Med 1997;16:2349–80.
(35) DiPrete T, Grusky D. The multi-level analysis of trends with repeated cross-sectional data. Sociol Methodol 1990;20:337–68.
(36) Langford I, Bentham G, McDonald A. Multi-level modelling of geographically aggregated health data: a case study of malignant melanoma mortality and UV exposure in the European Community. Stat Med 1998;17:41–57.
(37) Hox JP, de Leeuw ED, Kreft IGG. The effect of interviewer and respondent characteristics on the quality of survey data: a multilevel model. In: Biemer PP, Lyberg LE, Mathiowetz NA, et al, eds. Measurement errors in surveys. New York: Wiley, 1991.
(38) Goldstein H. Multilevel cross-classified models. Sociol Methods Res 1994;22:364–75.

Source: Published initially as “A glossary for multilevel analysis” in the Journal of Epidemiology and Community Health, 56:588-594, 2002.


