from Epidemiological Bulletin,
Vol. 24 No. 4, December 2003
A Glossary for Multilevel Analysis
Ana V. Diez Roux
Divisions of Medicine and Epidemiology, Columbia University
New York, New York, United States
PART II
EMPIRICAL BAYES ESTIMATES
Estimates of parameters for a given group or higher level unit (for example,
estimates of group specific intercepts or slopes, such as b0j
and b1j in equation (1), under multilevel models) obtained
by combining information from the group itself with information from other similar
groups investigated.(10, 19, 20) This is particularly useful when estimating
parameters for a group with few within group observations. These estimates are
optimally weighted averages that combine information derived from
the group itself with the mean for all similar groups. The weighted average
shifts the group specific estimate (derived using data only for that particular
group) towards the mean for similar groups. The less precise the group specific
estimate and the less the variability observed across groups, the greater the
shift towards the overall group mean. Thus, the estimate for a given group is
based not only on its own data but also takes into account estimates for other
groups and the characteristics groups share.(20) Empirical Bayes estimates of
parameters for a given group can be derived from multilevel models using estimates
of the group level errors (for example, U0j and U1j,
see multilevel models) for that particular group. Empirical Bayes estimates
are also sometimes referred to as shrinkage estimates because they
shrink the group specific estimate towards the overall mean (although
in fact when the overall mean is greater than the group specific estimate, the
shrunken or empirical Bayes estimate may actually be greater than
the group specific estimate). In public health, empirical Bayes estimation can
be used, for example, to derive improved estimates of rates of death or diseases
for small areas with few observations, (21) or to estimate rates of different
health outcomes for individual providers (hospitals, physicians, etc.).(22)
In other applications (which do not involve the structure of individuals within
groups although they are analogous to it), empirical Bayes estimates of regression
coefficients have been used to obtain improved estimates of associations in
studies investigating the role of multiple exposures.(23)
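As a minimal sketch of the weighting described above, assume the simplest case: a random intercept model (see multilevel models) in which the overall mean $\gamma_{00}$, the between-group variance $\tau_{00}$ and the within-group variance $\sigma^2$ are already known or have been estimated. The weight given to group $j$'s own mean is then commonly written as the reliability $\lambda_j = \tau_{00} / (\tau_{00} + \sigma^2 / n_j)$, and the empirical Bayes estimate is $\lambda_j \bar{y}_j + (1 - \lambda_j)\gamma_{00}$. The function name `eb_intercepts` and its arguments are hypothetical; real multilevel software estimates the variance components and the group level errors jointly rather than taking them as given.

```python
def eb_intercepts(group_means, group_sizes, grand_mean, tau00, sigma2):
    """Empirical Bayes (shrinkage) estimates of group intercepts.

    Hypothetical helper for a random intercept model in which grand_mean,
    tau00 (between-group variance) and sigma2 (within-group variance) are
    treated as known.
    """
    estimates = []
    for ybar_j, n_j in zip(group_means, group_sizes):
        # Reliability of the group mean: near 1 for large groups or large tau00,
        # near 0 for small groups or small tau00
        reliability = tau00 / (tau00 + sigma2 / n_j)
        # Weighted average that shifts the group mean toward the overall mean
        estimates.append(reliability * ybar_j + (1 - reliability) * grand_mean)
    return estimates


# Two groups with the same observed mean but very different sizes
shrunk = eb_intercepts([12.0, 12.0], [2, 200], grand_mean=10.0, tau00=1.0, sigma2=4.0)
print(shrunk)
```

The small group ($n_j = 2$) is pulled most of the way toward the overall mean of 10, while the large group ($n_j = 200$) keeps almost all of its own mean; lowering `tau00` (less variability across groups) would increase the shift for both. Because the overall mean here is below the group means, the shrunken estimates are smaller than the group specific ones; with a higher overall mean they would be larger.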
ENVIRONMENTAL VARIABLES
In the context of ecological studies and multilevel analysis, the term environmental
variables has sometimes been used to refer to group level measures of
physical or chemical exposures. Environmental variables, so defined, have been
proposed as a type of group level variable,
distinct from derived variables and integral variables.(11)
These variables are not derived by aggregating the characteristics of individuals
but they do have group level and individual level analogues (for example, days
of sunlight in the community and individual level sunlight exposure information).
In contrast with derived and integral variables, which may be used as indicators
of group level constructs, group level environmental variables are used exclusively
as proxies for individual level exposures (which may be more difficult to measure
for logistic or methodological reasons), rather than as indicators of a group
level property, which is conceptually different from the analogous measure at
the individual level.
FIXED EFFECTS / FIXED COEFFICIENTS
Regression coefficients (intercepts or covariate effects) that are not allowed
to vary randomly across higher level units (see multilevel models).
For example, in the case of persons nested within neighborhoods, two options
are available for modelling the effects of neighborhood. One option is to include
a dummy variable for each neighborhood. In this case the neighborhood coefficients
are modelled as fixed (sometimes called fixed effects). Another
option is to assume that the neighborhoods in the sample are a random sample
of a larger population of neighborhoods and that the coefficients for the neighborhood
effect vary randomly around an overall mean (for example, as reflected
by U0j in equation (2) under the entry for multilevel models).
In this case, the neighborhood effects are modelled as random (sometimes called
random effects, see random effects models). In the same example,
the coefficients for individual level covariates can also be modelled as fixed
or random. For example, if the relation between individual level income and
blood pressure is not allowed to vary randomly across neighborhoods, the coefficient
for individual level income is fixed (fixed coefficient). On the
other hand, if the coefficient for individual level income is allowed to vary
randomly across neighborhoods around an overall mean effect (as reflected by
U1j in equation (3) under the entry for multilevel models),
the coefficient for income is modelled as random (sometimes called a random
coefficient, see random coefficient models). Although the terms fixed
effects and fixed coefficients are sometimes distinguished
as noted above, they are often used interchangeably. Fixed effects models or
fixed coefficient models are models in which all effects or coefficients are
fixed. See also random effects/random coefficients.
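The dummy-variable option can be sketched for a model with one individual level covariate $x$ and a separate intercept for each neighborhood, $y_{ij} = a_j + b x_{ij}$. The sketch relies on the standard result that ordinary least squares with a full set of group dummies gives the same slope as the pooled within-group regression (deviations from group means), so no matrix algebra is needed. Each $a_j$ is estimated from neighborhood $j$'s own data alone, with no information borrowed from other neighborhoods, which is the key contrast with treating the intercepts as random. The function name is hypothetical, and the sketch assumes $x$ varies within at least one group.

```python
from collections import defaultdict


def dummy_variable_fit(groups, x, y):
    """Least-squares fit of y = a_j + b * x with one intercept (dummy) per group.

    Returns (intercepts, slope): a dict of group specific fixed intercepts
    and the common (fixed) coefficient of x.
    """
    members = defaultdict(list)
    for g, xi, yi in zip(groups, x, y):
        members[g].append((xi, yi))
    # Group means of x and y
    means = {
        g: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for g, pts in members.items()
    }
    # Pooled within-group slope (identical to the OLS slope with group dummies)
    sxy = sum((xi - means[g][0]) * (yi - means[g][1]) for g, xi, yi in zip(groups, x, y))
    sxx = sum((xi - means[g][0]) ** 2 for g, xi in zip(groups, x))
    slope = sxy / sxx
    # Fixed neighborhood effects: each uses only that neighborhood's own data
    intercepts = {g: my - slope * mx for g, (mx, my) in means.items()}
    return intercepts, slope


intercepts, slope = dummy_variable_fit(["A", "A", "B", "B"], [0, 1, 0, 1], [1, 3, 5, 7])
print(intercepts, slope)
```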
GROUP LEVEL VARIABLES
Term used to refer to variables that characterize groups. The terms group level
variables, macro variables and ecological variables are often used interchangeably.
(2, 6, 11, 14, 24) Group level variables may be used as proxies for unavailable
or unreliable individual level data (for example, when neighborhood mean income
is used as a proxy for the individual level income of individuals living in
the neighborhood) or as indicators of group level constructs (for example, when
mean neighborhood income is used as an indicator of neighborhood characteristics
that may be related to individual level outcomes independently of individual
level income). It is the second usage (as indicators of group level constructs)
that is of particular interest in multilevel analysis. Group level variables
have been classified into two basic types,(11, 13, 24) derived
variables and integral variables. Two additional
types of group level variables, structural variables (13) and environmental
variables (11), are sometimes distinguished. The term contextual variables
has been used as a synonym for group level variables generally (6, 13) although
it is sometimes reserved for derived group level variables.(11, 14)
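The neighborhood income example can be made concrete by showing how a group level variable of this kind is typically constructed: individual records are aggregated by group, and the group mean is attached back to each individual, so that every record carries both its own value and its neighborhood's. The function and field names below are hypothetical. The computation is the same whether the mean is later interpreted as a proxy for individual income or as an indicator of a neighborhood construct; that distinction lies in the interpretation and the model, not in the arithmetic.

```python
from collections import defaultdict


def add_group_mean(records, group_key, value_key, new_key):
    """Attach the mean of value_key within each group to every record as new_key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r[group_key]] += r[value_key]
        counts[r[group_key]] += 1
    for r in records:
        r[new_key] = totals[r[group_key]] / counts[r[group_key]]
    return records


people = [
    {"id": 1, "neighborhood": "N1", "income": 20.0},
    {"id": 2, "neighborhood": "N1", "income": 40.0},
    {"id": 3, "neighborhood": "N2", "income": 50.0},
]
add_group_mean(people, "neighborhood", "income", "nbhd_mean_income")
print([(p["id"], p["income"], p["nbhd_mean_income"]) for p in people])
```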
HIERARCHICAL (LINEAR) MODELS
See multilevel models
INDIVIDUAL LEVEL VARIABLES
Term used to refer to variables that characterize individuals and refer
to individual level constructs (for example, age or personal income).
INDIVIDUALISTIC FALLACY
Term used as a synonym for the atomistic fallacy. May sometimes also
be used as a synonym for the psychologistic fallacy.
INTEGRAL VARIABLES
A type of group level variable. Integral variables
differ from derived variables (another type of group level variable) in that
they are not summaries of the characteristics of individuals in the group. Integral
variables have no individual level analogues and necessarily refer to group
level constructs. Examples of integral variables include the existence of certain
types of laws, political or economic system, social disorganization, or population
density.(11, 13) Integral variables have also been referred to as primary or
global variables.
INTRACLASS CORRELATION
A measure of the degree of resemblance between lower level units belonging
to the same higher level unit or cluster.(25) In the case of individuals nested
within groups (for example, neighborhoods), the intraclass correlation measures
the extent to which values of the dependent variable are similar for individuals
belonging to the same group. It can be thought of as the average correlation
between values of two randomly drawn lower level units (for example, individuals)
in the same, randomly drawn higher level unit (for example, neighborhood). It
can also be defined as the proportion of the variance in the outcome that is
between the groups or higher level units. In the case of a simple random intercept
model, the intraclass correlation coefficient is estimated by the ratio of the population
variance between groups ($\tau_{00}$) to the total variance ($\tau_{00} + \sigma^2$)(25) (see multilevel
models):

$$\text{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2}$$

The estimation of the intraclass correlation coefficient in
models including random covariate effects, or in the case of non-normally distributed
dependent variables, is more complex and not always straightforward.
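The ratio $\tau_{00} / (\tau_{00} + \sigma^2)$ is simple to compute once the variance components are available. As a sketch, the second function below obtains them with the classical one-way random-effects ANOVA (method of moments) estimator for balanced data, which is swapped in here for illustration; multilevel software usually estimates the components by (restricted) maximum likelihood instead. Both function names are hypothetical, and the sketch assumes at least two groups of equal size with at least two observations each.

```python
def icc_from_components(tau00, sigma2):
    """Intraclass correlation in a random intercept model."""
    return tau00 / (tau00 + sigma2)


def icc_anova(groups):
    """ANOVA (method of moments) estimate of the ICC for balanced data.

    groups: list of equal-length lists of outcome values, one list per group.
    """
    k = len(groups)          # number of groups (higher level units)
    n = len(groups[0])       # observations per group, assumed equal
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    # Between-group and within-group mean squares
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    tau00_hat = max((msb - msw) / n, 0.0)  # negative estimates truncated at zero
    return icc_from_components(tau00_hat, msw)


print(icc_from_components(1.0, 3.0))
print(icc_anova([[1, 3], [5, 7]]))
```

In the toy data the between-group variance estimate is 7 and the within-group estimate is 2, giving an intraclass correlation of 7/9: most of the variation in this outcome lies between the two groups.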
MARGINAL MODELS
See population-average models.
MIXED MODELS
Term used to refer to models that contain a mixture of fixed
effects (or fixed coefficients) and random effects (or random
coefficients). In mixed models some of the regression coefficients (intercepts
or covariate effects) are allowed to vary randomly across higher level units
but others are not (see multilevel models). Thus mixed models can be
thought of as a particular case of the more general multilevel models (although
the term is also occasionally used as a synonym of multilevel models generally).
Sometimes the term mixed models is also used to encompass models that account
for correlation between lower level units (for example, individuals) within
higher level units (for example, neighborhoods) in other ways, that is,
by modelling the correlations or covariances themselves rather than by allowing
for random effects or random coefficients.(26) These models (which are not multilevel
models) have also been called covariance pattern models,(26) marginal models,
or population average models.
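The link between the two approaches can be sketched for the simplest case. A random intercept model ($Y_{ij} = \gamma_{00} + U_{0j} + e_{ij}$, with $U_{0j}$ and $e_{ij}$ independent) implies that any two individuals in the same group have covariance $\tau_{00}$ and that each has variance $\tau_{00} + \sigma^2$, a pattern often called compound symmetry. A covariance pattern model would instead specify a within-group covariance matrix such as this one (or a less restrictive one) directly, without random effects. The function below, whose name is hypothetical, only builds the implied matrix for one group of size $n$.

```python
def random_intercept_covariance(n, tau00, sigma2):
    """Within-group covariance matrix of n outcomes implied by a random intercept model."""
    # Diagonal: total variance; off-diagonal: variance shared through U0j
    return [[tau00 + sigma2 if i == k else tau00 for k in range(n)] for i in range(n)]


cov = random_intercept_covariance(3, tau00=1.0, sigma2=3.0)
for row in cov:
    print(row)
# The implied correlation between two members of a group equals the
# intraclass correlation tau00 / (tau00 + sigma2)
print(cov[0][1] / cov[0][0])
```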
MULTILEVEL ANALYSIS
An analytical approach that is appropriate for data with nested sources of variability, that
is, involving units at a lower level or micro units (for example, individuals)
nested within units at a higher level or macro units (for example, groups such
as schools or neighborhoods).(5, 10, 19, 24, 25, 27–30) Multilevel analysis
allows the simultaneous examination of the effects of group level and individual
level variables on individual level outcomes while accounting for the non-independence
of observations within groups. Multilevel analysis also allows the examination
of both between group and within group variability as well as how group level
and individual level variables are related to variability at both levels. Thus,
multilevel models can be used to draw inferences regarding the causes of inter-individual
variation (or the relation of group and individual level variables to individual
level outcomes) but inferences can also be made regarding inter-group variation,
whether it exists in the data, and to what extent it is accounted for by group
and individual level characteristics. In multilevel analysis, groups or contexts
are not treated as unrelated but are conceived as coming from a larger population
of groups about which inferences are to be made. Multilevel analysis thus allows
researchers to deal with the micro-level of individuals and the macro-level
of groups or contexts simultaneously.(5)
Multilevel analysis has a broad range of applications in many
situations involving nested sources of random variability such as persons nested
within neighborhoods,(5, 30) patients nested within providers,(31) meta-analysis
(observations nested within sites),(19, 32) longitudinal data analysis (repeat
measurements over time nested within persons),(28, 33, 34) multivariate responses
(multiple outcomes nested within individuals),(5) the analysis of repeat cross
sectional surveys (multiple observations nested within time periods),(35) the
examination of geographical variations in rates (rates for smaller areas nested
within regions or larger areas)(36) and the examination of interviewer effects
(respondents nested within interviewers).(37) Multilevel analysis can also be
used in situations involving multiple nested contexts(19, 28) (for example, multiple
measures over time on individuals nested within neighborhoods) as well as overlapping
or cross classified contexts (for example, children nested within neighborhoods
and schools).(38) The statistical models used in multilevel analysis are referred
to as multilevel models (25, 28, 29) or hierarchical linear models.(19,
39)
MULTILEVEL MODELS
The statistical models used in multilevel analysis.(19,
25, 28, 29) The terms hierarchical models and multilevel models
are often used synonymously. These models (or variants of them) have previously
appeared in different literatures under a variety of names including random
effects models or random coefficient models,(40–42) covariance
components models or variance components models,(43, 44) and
mixed models.(26) A simplified example for the case of a normally distributed
dependent variable, a single individual level (lower level unit) predictor and
a single group level (higher level unit) predictor is provided below. Analogous
models can be formulated for non-normally distributed dependent variables.(10,
28, 39, 45)
In the case of multilevel analysis involving two levels (for example,
individuals nested within groups), the multilevel model can be conceptualized
as a two stage system of equations.
In the first stage (level 1), a separate individual level regression
is defined for each group or higher level unit.
$$Y_{ij} = b_{0j} + b_{1j} I_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma^2) \qquad (1)$$

where
Yij = outcome variable for ith individual in jth group
Iij= individual level variable for ith individual in
jth group
b0j is the group specific intercept
b1j is the group specific effect of the individual level
variable
Individual level errors ($e_{ij}$) are assumed
to be independent and identically distributed with a mean of 0 and a variance
of $\sigma^2$. The same regressors
are generally used in all groups, but the regression coefficients ($b_{0j}$ and $b_{1j}$) are
allowed to vary from one group to another.
In a second stage (level 2), each of the group or context specific regression
coefficients defined in equation (1) (b0j and b1j
in this example) is modelled as a function of group level (or higher level)
variables.
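$$b_{0j} = \gamma_{00} + \gamma_{01} G_j + U_{0j}, \qquad U_{0j} \sim N(0, \tau_{00}) \qquad (2)$$

$$b_{1j} = \gamma_{10} + \gamma_{11} G_j + U_{1j}, \qquad U_{1j} \sim N(0, \tau_{11}) \qquad (3)$$

$$\operatorname{cov}(U_{0j}, U_{1j}) = \tau_{01}$$

where
$G_j$ is the group level variable
$\gamma_{00}$ is the common intercept across groups
$\gamma_{01}$ is the effect of the group level predictor on the group specific intercepts
$\gamma_{10}$ is the common slope associated with the individual level variable across groups
$\gamma_{11}$ is the effect of the group level predictor on the group specific slopes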
The errors in the level 2 equations ($U_{0j}$
and $U_{1j}$), sometimes called macro errors,
are assumed to be normally distributed with mean 0 and variances $\tau_{00}$
and $\tau_{11}$, respectively. $\tau_{01}$ represents the covariance between intercepts and slopes. Thus, multilevel analysis
summarizes the distribution of the group specific coefficients in terms of two
parts: a fixed part that is common across groups ($\gamma_{00}$ and $\gamma_{01}$
for the intercept, and $\gamma_{10}$ and $\gamma_{11}$
for the slope) and a random part ($U_{0j}$ for
the intercept and $U_{1j}$ for the slope) that is allowed
to vary from group to group (see also fixed coefficients and random coefficients).
By including an error term in the group level equations (equations
(2) and (3)), these models allow for sampling variability in the group specific
coefficients (b0j and b1j) and
also for the fact that the group level equations are not deterministic (that
is, the possibility that not all relevant macro-level variables have been included
in the model). The underlying assumption is that group specific intercepts and
slopes are random samples from a normally distributed population of group specific
intercepts and slopes, or alternatively, that the macro errors are exchangeable, that
is, that the residual variation in group specific coefficients across groups
is unsystematic.(10)
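The two-stage conceptualization can be illustrated with a deliberately naive procedure: fit equation (1) by ordinary least squares separately in each group, then regress the estimated intercepts and slopes on $G_j$ as in equations (2) and (3). This is only a didactic sketch under stated assumptions (at least two distinct values of $I_{ij}$ per group, at least two distinct values of $G_j$); unlike a multilevel model it ignores the differing precision of the group specific estimates, does not estimate $\tau_{00}$, $\tau_{11}$ or $\tau_{01}$, and is not how the model's parameters are actually estimated. Function names are hypothetical.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage(groups, G):
    """Naive two-stage fit of equations (1) to (3).

    groups: list of (I_values, Y_values) pairs, one per group.
    G: value of the group level predictor for each group.
    """
    b0, b1 = [], []
    # Stage 1: a separate individual level regression in each group, equation (1)
    for I_vals, Y_vals in groups:
        a, b = ols_line(I_vals, Y_vals)
        b0.append(a)
        b1.append(b)
    # Stage 2: regress group specific intercepts and slopes on G, equations (2) and (3)
    g00, g01 = ols_line(G, b0)
    g10, g11 = ols_line(G, b1)
    return {"g00": g00, "g01": g01, "g10": g10, "g11": g11}


data = [([0, 1, 2], [1, 1.5, 2]), ([0, 1, 2], [3, 4.5, 6]), ([0, 1, 2], [5, 7.5, 10])]
print(two_stage(data, [0, 1, 2]))
```

With these noise-free data, generated from $\gamma_{00} = 1$, $\gamma_{01} = 2$, $\gamma_{10} = 0.5$ and $\gamma_{11} = 1$, the procedure recovers the values exactly; with real data the group specific estimates scatter around the level 2 regressions, and that scatter mixes true between-group variation ($U_{0j}$, $U_{1j}$) with sampling error.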
An alternative way to present the model fitted in multilevel analysis
is to substitute equations (2) and (3) in (1) to obtain:
$$Y_{ij} = \gamma_{00} + \gamma_{01} G_j + \gamma_{10} I_{ij} + \gamma_{11} G_j I_{ij} + U_{0j} + U_{1j} I_{ij} + e_{ij}$$
The model includes the effects of group level variables ($\gamma_{01}$),
individual level variables ($\gamma_{10}$) and their interaction ($\gamma_{11}$) on the individual
level outcome $Y_{ij}$. These coefficients ($\gamma_{01}$, $\gamma_{10}$ and $\gamma_{11}$),
which are common to all individuals regardless of the group to which they belong,
are often called the fixed coefficients (or fixed effects). The model also includes
a random intercept component ($U_{0j}$) and a random slope
component ($U_{1j}$). The values of these components vary
randomly across groups, and hence $U_{0j}$ and $U_{1j}$ are
referred to as the random coefficients (or random effects). The parameters of
the above equations (fixed effects, random effects, variances of the random
effects, and residual variance) are simultaneously estimated using iterative
methods. The level 1 and level 2 variances and covariance ($\sigma^2$, $\tau_{00}$, $\tau_{11}$
and $\tau_{01}$) are called the (co)variance components.
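Writing the combined equation out as a data-generating process can make the split between the fixed and random parts concrete. The sketch below assumes $I_{ij} \sim N(0, 1)$, $\tau_{00} > 0$ and a valid (positive semidefinite) level 2 covariance matrix, and draws $(U_{0j}, U_{1j})$ jointly through the Cholesky factor of that 2 by 2 matrix using Python's `random` module. The function name and all parameter values are hypothetical; setting `tau11` and `tau01` to zero gives a model in which only the intercepts are random.

```python
import math
import random


def simulate_two_level(G, n_per_group, g00, g01, g10, g11,
                       tau00, tau11, tau01, sigma2, seed=0):
    """Simulate (j, I_ij, Y_ij) rows from the combined two-level model.

    Assumes I_ij ~ N(0, 1), tau00 > 0 and a positive semidefinite
    level 2 covariance matrix [[tau00, tau01], [tau01, tau11]].
    """
    rng = random.Random(seed)
    # Cholesky factor of the level 2 covariance matrix
    l11 = math.sqrt(tau00)
    l21 = tau01 / l11
    l22 = math.sqrt(tau11 - l21 ** 2)
    rows = []
    for j, G_j in enumerate(G):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        U0, U1 = l11 * z1, l21 * z1 + l22 * z2  # random intercept and slope for group j
        for _ in range(n_per_group):
            I_ij = rng.gauss(0.0, 1.0)
            e_ij = rng.gauss(0.0, math.sqrt(sigma2))
            fixed = g00 + g01 * G_j + g10 * I_ij + g11 * G_j * I_ij
            random_part = U0 + U1 * I_ij + e_ij
            rows.append((j, I_ij, fixed + random_part))
    return rows


rows = simulate_two_level(G=[0.0, 1.0, 1.0, 0.0], n_per_group=20,
                          g00=1.0, g01=0.5, g10=2.0, g11=-0.3,
                          tau00=1.0, tau11=0.25, tau01=0.1, sigma2=4.0, seed=42)
print(len(rows), rows[0])
```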
Many variants of the more general model illustrated above are
possible. For example, only group specific intercepts (b0j)
may be modelled as random (these models have also been called random effects
models). When covariate effects (b1j in the example
above) are modelled as random these models have also been called random coefficient
models. When some of the coefficients are fixed and others are random, these
models have also been called mixed effects models or simply mixed
models. When all coefficients are modelled as fixed (no random errors
are included in level 2 equations), these models reduce to traditional
contextual effects models.
Multilevel models can also account for multiple nested contexts (or levels)
(19, 28) allowing fixed and random coefficients to be associated with variables
measured at different levels of the data hierarchy being analyzed. Multilevel
models can also be modified to allow for non-hierarchical, overlapping or cross
classified contexts (for example, children simultaneously nested within neighborhoods
and schools).(38)
References:
NOTE: References 1-18 were included in Part
I of the Glossary, in Vol. 24, No. 3 (2003) of the Epidemiological Bulletin.
(19) Bryk AS, Raudenbush SW. Hierarchical linear models: applications and data
analysis methods. Newbury Park: Sage, 1992.
(20) Rice N, Jones A. Multilevel models and health economics. Health Econ 1997;6:561–75.
(21) Clayton D, Kaldor J. Empirical Bayes estimates of age-standardized relative
risks for use in disease mapping. Biometrics 1987;43:671–81.
(22) Thomas N, Longford N, Rolph J. Empirical Bayes methods for estimating hospital-specific
mortality rates. Stat Med 1994;13:889–903.
(23) Witte JS, Greenland S, Haile RW, et al. Hierarchical regression analysis
applied to a study of multiple dietary exposures and breast cancer. Epidemiology
1994;5:612–21.
(24) Von Korff M, Koepsell T, Curry S, et al. Multi-level analysis in epidemiologic
research on health behaviors and outcomes. Am J Epidemiol 1992;135:1077–82.
(25) Snijders TAB, Bosker RJ. Multilevel analysis: an introduction to basic
and advanced multilevel modeling. London: Sage, 1999.
(26) Brown H, Prescott R. Applied mixed models in medicine. New York: Wiley,
2000.
(27) Mason W, Wong G, Entwisle B. Contextual analysis through the multilevel
linear model. In: Leinhardt S, ed. Sociological methodology. San Francisco:
Jossey-Bass, 1983–1984: 72–103.
(28) Goldstein H. Multilevel statistical models. New York: Halsted Press, 1995.
(29) Kreft I, de Leeuw J. Introducing multilevel modeling. London: Sage, 1998.
(30) Diez-Roux AV. Multilevel analysis in public health research. Annu Rev Public
Health 2000;21:171–92.
(31) Sixma HJ, Spreeuwenberg PM, Pasch MAvd. Patient satisfaction with the general
practitioner: a two-level analysis. Med Care 1998;36:212–29.
(32) Hedeker D, Gibbons R, Davis J. Random regression models for multicenter
clinical trials data. Psychopharmacol Bull 1991;27:73–7.
(33) Rutter C, Elashoff R. Analysis of longitudinal data: random coefficient
regression modelling. Stat Med 1994;13:1211–31.
(34) Cnaan A, Laird NM, Slasor P. Using the general linear mixed model to analyse
unbalanced repeated measures and longitudinal data. Stat Med 1997;16:2349–80.
(35) DiPrete T, Grusky D. The multi-level analysis of trends with repeated cross-sectional
data. Sociol Methodol 1990;20:337–68.
(36) Langford I, Bentham G, McDonald A. Multi-level modelling of geographically
aggregated health data: a case study of malignant melanoma mortality and UV
exposure in the European Community. Stat Med 1998;17:41–57.
(37) Hox JP, de Leeuw ED, Kreft IGG. The effect of interviewer and respondent
characteristics on the quality of survey data: a multilevel model. In: Biemer
PP, Lyberg LE, Mathiowetz NA, et al, eds. Measurement errors in surveys. New
York: Wiley, 1991.
(38) Goldstein H. Multilevel cross-classified models. Sociol Methods Res 1994;22:364–75.
Source: Published initially as "A glossary for multilevel
analysis" in the Journal of Epidemiology and Community Health, 56:588-594,
2002.