Journal of Educational and Behavioral Statistics

Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression

D. Roland Thomas and PengCheng Zhu

Carleton University

Yves J. Decady

Statistics Canada


    Abstract
 TOP
 Abstract
 Deficiencies in Traditional...
 Pratt's Measure
 Point Estimates and Confidence...
 The Simulation Experiment
 Results
 Example
 Summary and Conclusions
 Appendix
 References
 
The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an example is provided.

Keywords: relative importance • normalized Pratt measures • asymptotic variances • Bonferroni intervals • simultaneous confidence intervals

The purpose of this article is to review and to present some new results on a particular measure of relative importance for multiple linear regression for the general case of p correlated explanatory variables. This measure defines the importance of the jth independent variable as the product of two terms, namely its simple correlation with the dependent variable, $\rho_j$, and its standardized regression coefficient in the multiple regression, $\beta_j$. The relative importance of two variables, $x_j$ and $x_k$, is then defined as the ratio of their importance measures: namely, $\beta_j\rho_j / \beta_k\rho_k$. The measure itself is far from new. For example, it was considered (and criticized) by Darlington (1968), and it has been discussed more recently by Bring (1996), who called it the “product measure,” and by Thomas, Hughes, and Zumbo (1998), who responded to some of Bring’s criticisms of it. The justification for this measure had always been heuristic, at best, until Pratt (1987) provided a theoretical derivation based on a set of natural assumptions or axioms that should be satisfied by a measure of variable importance. In light of this justification of what had hitherto been a controversial measure (see Pratt, 1987), the quantity $\beta_j\rho_j$, corresponding to an individual explanatory variable, $x_j$, will be referred to in this article as “Pratt’s measure” of variable importance.

The main contribution reported in this article is the development of individual and simultaneous confidence intervals for the p Pratt measures, which should increase their practical usefulness by providing an inferential, rather than merely descriptive, method of deciding whether one explanatory variable is more important than another. The overall goal of the article is thus similar, in a particular sense, to a recent article by Azen and Budescu (2003), who extended Budescu’s (1993) ‘‘dominance’’ approach by introducing new measures of dominance and by adding bootstrapped inference procedures to assess the stability of the dominance ranking. Azen, Budescu, and Reiser (2001) suggested another inferential approach to the assessment of variable importance in regression. These authors introduced the concept of the ‘‘criticality’’ of a variable, derived from the probability that an individual variable will be included in the best subset model fitted to a randomly drawn sample containing responses for one dependent and p explanatory variables. Probabilities were again assessed by means of bootstrap samples, with confidence intervals based on large sample multinomial theory. Although similar in that it provides inferential methods for assessing importance, the Pratt approach described in this article is very different conceptually from the dominance analysis and criticality approaches. Pratt’s derivation of his importance measure is axiomatic and theoretical, whereas the dominance and criticality approaches are based on an intuitive definition of variable importance. It should be noted that Pratt’s measure also has an intuitive definition as discussed by Bring (1996) and by Thomas et al. (1998). This interpretation of Pratt’s measures will be described later.

The problem of assessing variable importance, particularly in multiple linear regression, has been extensively documented since the 1960s, and the articles cited above—in particular, those by Bring (1996), Budescu (1993), and Azen and Budescu (2003)—together provide an extensive review of the literature. In this article, therefore, only specific points that justify the need for importance measures such as Pratt’s (or such as dominance analysis and criticality) will be reviewed.


    Deficiencies in Traditional Measures of Importance
Kruskal and Majors (1989) carried out a broad survey of the scientific literature to develop a taxonomy of the various measures that have been used to quantify variable importance. In the context of regression analysis, they found that numerous measures of variable importance were in use, including simple and partial correlation coefficients, regression coefficients (standardized and unstandardized), t statistics, and p values. They also noted Pratt’s (1987) measure and recommended that it be studied further. To the above list can be added a variety of measures based on reduction in variance or on decompositions of variance explained, as documented by Azen and Budescu (2003). For the most part, the primary focus of the literature has been on measures of variable importance, rather than on basic concepts. Thus, the concept of relative importance has been termed ‘‘ambiguous’’ (Kruskal, 1987; Kruskal & Majors, 1989), ‘‘not at all well defined’’ (Healy, 1990), and ‘‘vague’’ and ‘‘diffuse’’ (Bring, 1994). In most applications and discussions of variable importance, the meaning of the concept is implied by the characteristics of the measure selected. Given this general confusion, Azen and Budescu (2003) argued in favor of an intuitive measure of importance that can be readily understood and interpreted by practitioners, and proposed their modified version of dominance analysis. Pratt’s approach (1987), on the other hand, was to define the concept of variable importance in terms of a number of axioms that should be satisfied by any measure of variable importance and to then deduce a unique measure satisfying them.

Some of the problems associated with the traditional measures of variable importance can be illustrated by considering standardized regression coefficients, where the standardization is motivated by the difficulties inherent in interpreting unstandardized coefficients for explanatory variables that are measured on different scales (Achen, 1982; Greenland, Schlesselman, & Criqui, 1986; Healy, 1990). Although standardized coefficients are commonly used as measures of importance, Healy (1990) and Bring (1994) noted that because regression coefficients measure the conditional effect of a variable, standardization should be based on conditional standard errors, rather than the customary unconditional standard errors. Bring (1994) went on to show that using conditionally standardized regression coefficients as measures of variable importance is equivalent to comparing individual t statistics. Unfortunately, a comparison of $t_j$ and $t_k$, corresponding to variables $x_j$ and $x_k$, provides an invalid assessment of relative importance because, for the full p-variable model, each t statistic measures that variable’s contribution to the model over and above the effect of all $(p - 1)$ other variables. Thus, the reference subsets for $t_j$ and $t_k$ are different, making a direct comparison invalid (Azen & Budescu, 2003), a problem that is also associated with other traditional “partial” measures such as partial correlations, semipartial correlations, and so forth. In designing his dominance approach, Budescu (1993) avoided this problem by comparing the contribution of variables $x_j$ and $x_k$ in turn to a common subset of variables that omits both of them. Because t statistics for individual explanatory variables are related to increments in $R^2$, Budescu’s original definition of dominance can be stated as follows: One variable dominates another if it generates a greater increment in $R^2$ than the other, when added to all the equations formed from all possible subsets of the remaining $(p - 2)$ variables.
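
The dominance definition just stated can be illustrated with a short sketch. The code below is only an illustration under stated assumptions: the function names `r_squared` and `dominates`, and the correlation matrix `corr`, are hypothetical and are not taken from Budescu (1993). It checks whether one variable yields a larger increment in $R^2$ than another for every subset of the remaining predictors.

```python
from itertools import combinations

import numpy as np


def r_squared(corr, y_idx, x_idx):
    """R^2 of y on the predictors in x_idx, computed from a correlation matrix."""
    x_idx = list(x_idx)
    if not x_idx:
        return 0.0
    r_xx = corr[np.ix_(x_idx, x_idx)]
    r_xy = corr[x_idx, y_idx]
    return float(r_xy @ np.linalg.solve(r_xx, r_xy))


def dominates(corr, y_idx, j, k, predictors):
    """True if x_j adds more R^2 than x_k to every subset of the other predictors."""
    others = [v for v in predictors if v not in (j, k)]
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            base = r_squared(corr, y_idx, subset)
            gain_j = r_squared(corr, y_idx, subset + (j,)) - base
            gain_k = r_squared(corr, y_idx, subset + (k,)) - base
            if gain_j <= gain_k:
                return False
    return True


# Hypothetical correlation matrix: index 0 is y, indices 1..3 are x1..x3.
corr = np.array([
    [1.0, 0.6, 0.3, 0.2],
    [0.6, 1.0, 0.2, 0.1],
    [0.3, 0.2, 1.0, 0.1],
    [0.2, 0.1, 0.1, 1.0],
])
x1_over_x2 = dominates(corr, 0, 1, 2, predictors=[1, 2, 3])
x2_over_x1 = dominates(corr, 0, 2, 1, predictors=[1, 2, 3])
```

Both variables are compared against the same reference subsets, which is the feature that distinguishes this comparison from a direct comparison of t statistics.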

Bring’s (1994) conclusion also leads directly to a consideration of importance measures based on partitions of $R^2$, where a portion of the overall $R^2$ is attributed to each predictor. If the independent variables are uncorrelated, there is a unique partition of $R^2$ in which each variable’s contribution is equal to the variance it explains when the dependent variable is regressed on it alone. When the independent variables are correlated, however, it is difficult to find partitions of $R^2$ where each component of variance explained can be logically assigned to an individual variable. If a prior ordering of the variables is available, a sequential orthogonalization of the independent variables leads to an assignable partition. On the other hand, the data-dependent ordering provided by sequential variable selection algorithms should not be used, because different orders of selection leading to the same final model can lead to dramatically different conclusions regarding the importance of a given variable (see, e.g., Stevens, 2002, p. 119). Several authors have attempted to overcome this problem by averaging various partitions of $R^2$ over all possible orderings (for details, see Azen & Budescu, 2003), but this process results in measures that are hard to interpret. The advantage of Pratt’s (1987) approach is that it offers a very simple and theoretically justifiable way of partitioning $R^2$ and assigning the components, namely, sample estimates of the $\beta_j\rho_j$s, which sum to $R^2$, to individual independent variables as measures of variable importance.


    Pratt’s Measure
Pratt (1987) based his axiomatic development on a population model of the form


$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p + u \tag{1}$$

where the error u is uncorrelated with each of the xs, and is distributed with mean zero and variance $\sigma^2$. The first axioms that he adopted were

1. relative importance depends only on the means, variances, and correlations of y, x1, x2, . . . , xp, (i.e., only on first- and second-order moments);

2. relative importance is not affected by linear transformations of any variable.

He also standardized the (random) xs to have variance one, so that for the two-variable case (p = 2), it follows from assumptions 1 and 2 that relative importance depends only on $\beta_1$, $\beta_2$, $\rho_{12}$, and $\sigma^2$, where the $\beta$s are standardized regression coefficients corresponding to $x_1$ and $x_2$, and $\rho_{12}$ is the correlation between $x_1$ and $x_2$. Pratt (1987) used a slightly different definition of standardized regression coefficients than is customary, but because these differences do not affect the application of his results, the $\beta$s will be treated as standard in this abbreviated account.

Pratt’s (1987) basic approach was to use a symmetry argument to establish his importance measure for a two-variable regression and then to invoke additional axioms to generalize his measure to the p-variable case. He postulated that the two independent variables $x_1$ and $x_2$ could be constructed from a symmetric set of M variables, $X_1, \ldots, X_M$, all having mean zero and variance $s^2$, and all equally correlated with correlation coefficient r, such that the regression of y on these Xs was given by $X_1 + X_2 + \cdots + X_M$, with residual variance $\sigma^2$. He further defined $x_1 = X_1 + X_2 + \cdots + X_m$ and $x_2 = X_{m+1} + \cdots + X_{m+n}$, where $m + n = M$, and invoked a third axiom, namely, that in this symmetric two-variable situation,

3. the relative importance of x1 to x2 is as m to n.

Under the above setup, the standardized coefficients $\beta_1$ and $\beta_2$ can be expressed as functions of m, n, r, and s, and Pratt (1987) solved these equations to get an expression for the relative importance of $x_1$ to $x_2$; namely, m/n, given by


$$\frac{m}{n} = \frac{\beta_1\rho_1}{\beta_2\rho_2} \tag{2}$$

where $\rho_1$ and $\rho_2$ denote standardized regression coefficients (i.e., correlations) in the simple regressions of y on $x_1$ and y on $x_2$. A necessary and sufficient condition for the solution to exist (i.e., for the above construction of $x_1$ and $x_2$ to be feasible) is that $\beta_1\rho_1$ and $\beta_2\rho_2$ both be positive. For the standardized two-variable case, the variance explained by the regression of y on $x_1$ and $x_2$ is given by $\beta_1\rho_1 + \beta_2\rho_2$, so that Pratt’s derivation of Equation 2 justifies the use of variance explained as a measure of relative importance for the case p = 2, provided that both of the $\beta_j\rho_j$ are positive and provided that


Formula(3)

To extend his rule given in Equation 3 to the case p > 2, Pratt first defined the importance of a subset of variables, (x1, . . . , xq), q < p, to be the sum of their individual importances. He then adopted two more axioms; namely, the extension of his second assumption to

4. the nonsingular linear transformation of a subset (x1, . . . , xq) into the subset (x1', . . . , xq') does not affect its importance relative to the other variables; and

5. the addition of a pure noise variable, independent of y and x1, . . . , xp, to a subset of variables does not affect the importance of the subset relative to the other variables.

With these, together with one additional minor assumption, or axiom, for which several alternatives exist, Pratt established the validity of the rule of Equation 3 for assigning relative importance in the general case of p > 2 independent variables.
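
As a numerical check on the symmetric two-variable construction described above, the following sketch builds the covariance structure of $x_1$ and $x_2$ from m and n equicorrelated components and confirms that the ratio of the two products $\beta_j\rho_j$ equals m/n. The function name `symmetric_construction` and the parameter values are hypothetical, and the customary standardization is used rather than Pratt’s slightly different one (the ratio does not depend on that choice).

```python
import numpy as np


def symmetric_construction(m, n, r, s2, sigma2):
    """Return (beta_1*rho_1, beta_2*rho_2) for x1 = sum of m X's, x2 = sum of n X's."""
    M = m + n
    # Covariance of X_1..X_M: common variance s2 and common correlation r.
    cov_X = s2 * ((1 - r) * np.eye(M) + r * np.ones((M, M)))
    # Aggregation matrix: row 0 sums the first m X's, row 1 the remaining n.
    A = np.zeros((2, M))
    A[0, :m] = 1.0
    A[1, m:] = 1.0
    cov_x = A @ cov_X @ A.T                   # 2 x 2 covariance of (x1, x2)
    ones = np.ones(2)
    cov_xy = cov_x @ ones                     # y = x1 + x2 + u, u uncorrelated with x
    var_y = ones @ cov_x @ ones + sigma2
    b = np.linalg.solve(cov_x, cov_xy)        # unstandardized slopes (both equal 1)
    sd_x = np.sqrt(np.diag(cov_x))
    beta = b * sd_x / np.sqrt(var_y)          # standardized regression coefficients
    rho = cov_xy / (sd_x * np.sqrt(var_y))    # simple correlations with y
    return beta * rho


pratt = symmetric_construction(m=3, n=2, r=0.3, s2=1.5, sigma2=2.0)
ratio = pratt[0] / pratt[1]                   # should reproduce m / n
```

With m = 3 and n = 2 the ratio is 1.5 regardless of the values chosen for r, $s^2$, and $\sigma^2$, provided both products are positive.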

The Geometric Interpretation of Pratt’s Measure
Bring (1996) and Thomas et al. (1998) used the geometry of least squares to interpret Pratt’s measure. For a sample of size N, the observed values of the variables are represented as vectors $\mathbf{y}, \mathbf{x}_1, \ldots, \mathbf{x}_p$ in an N-dimensional vector space. The least squares solution is represented by the orthogonal projection of the observed dependent variable, $\mathbf{y}$, onto the space spanned by the observed independent variables, $\mathbf{x}_1, \ldots, \mathbf{x}_p$. When the observed variables are standardized to have zero mean values and variance one (denoted by a tilde over the character representing the variable), the projection of $\tilde{\mathbf{y}}$, which in nongeometric terms is the vector of fitted values, $\hat{\tilde{\mathbf{y}}}$, is represented algebraically by the weighted vector sum

$$\hat{\tilde{\mathbf{y}}} = \hat{\beta}_1\tilde{\mathbf{x}}_1 + \hat{\beta}_2\tilde{\mathbf{x}}_2 + \cdots + \hat{\beta}_p\tilde{\mathbf{x}}_p \tag{4}$$

where the $\hat{\beta}_j$s denote sample estimates of the standardized regression coefficients. Thomas et al. (1998) developed a measure of relative importance by projecting each component $\hat{\beta}_j\tilde{\mathbf{x}}_j$ of $\hat{\tilde{\mathbf{y}}}$ onto $\hat{\tilde{\mathbf{y}}}$ itself, and then defining the importance of each variable as the ratio of the signed length of each projection to the total length of $\hat{\tilde{\mathbf{y}}}$. They showed that for the jth variable, their measure (denoted by $\hat{\delta}_j$) could be expressed as

$$\hat{\delta}_j = \frac{\hat{\beta}_j\hat{\rho}_j}{R^2} \tag{5}$$

that is, as a sample estimate of Pratt’s measure, divided by the $R^2$ corresponding to the full p-variable regression equation. The $\hat{\delta}_j$s sum to one by virtue of their geometric construction. This feature is immediately evident from their algebraic representations in Equation 5, which must sum to one given that the sample Pratt measures sum to $R^2$. Thus, the $\hat{\delta}_j$s discussed by Thomas et al. (1998) will henceforth be referred to as normalized Pratt measures. Thomas (1992) and Thomas and Zumbo (1996) earlier defined similar measures for the MANOVA case, where they were referred to as “discriminant ratio coefficients.” For a practitioner familiar with the geometry of least squares, the notion of dividing up the total $R^2$ in proportion to the projected lengths of coefficient-weighted variables is an intuitively appealing approach to variable importance. Other interpretations can also be provided, as discussed in Pratt’s (1987) original paper.
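
A minimal sketch of how sample normalized Pratt measures might be computed is given here, assuming NumPy is available. The function name `normalized_pratt` and the simulated data are hypothetical illustrations, not material from the article. The sketch standardizes the variables, fits the regression by least squares, and forms the products of standardized coefficients and simple correlations, which sum to $R^2$.

```python
import numpy as np


def normalized_pratt(X, y):
    """Return (normalized Pratt measures, R^2) for a regression of y on the columns of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Standardize to mean zero and (sample) variance one.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    # Standardized regression coefficients (no intercept needed after centering).
    beta_hat, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    # Simple correlations of each predictor with y.
    rho_hat = Xs.T @ ys / (n - 1)
    pratt = beta_hat * rho_hat        # sample Pratt measures
    r2 = pratt.sum()                  # they sum to R^2
    return pratt / r2, r2


# Hypothetical data: three correlated predictors and a noisy linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.5 * X[:, 0]
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=200)
d_hat, r2 = normalized_pratt(X, y)
```

The returned measures sum to one by construction, and the returned `r2` agrees with the usual $R^2$ from a regression that includes an intercept.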

Two Criticisms of Pratt’s Measure
The first criticism is that one or more of the sample quantities $\hat{\beta}_j\hat{\rho}_j$ can be negative, which some practitioners interpret as negative importance, a meaningless concept. However, as noted in the text following Equation 2, Pratt’s importance rule is valid only when the population quantities $\beta_j\rho_j$ are all positive. Thus, negativity of any one of these quantities does not signify negative importance but instead signifies a regression situation that is “too complex for a single measure” (Pratt, 1987, p. 245). For the sample estimates, $\hat{\beta}_j\hat{\rho}_j$, false negatives as well as false positives can occur (as noted by Pratt, 1987, p. 259), which is a strong argument for the construction of simultaneous confidence intervals on the normalized importance measures of Equation 5, the main goal of this article. If the confidence interval for a particular normalized Pratt measure lies entirely below zero, a real negative is indicated. If a point estimate is negative but the corresponding confidence interval also includes positive values large enough to be of interest, this is an inconclusive situation in which the only remedy may be a larger sample. A situation that is particularly prone to false positives and negatives is that of multicollinearity. Thomas et al. (1998) developed a lower bound on individual normalized Pratt measures that shows, for example, that whenever an explanatory variable exhibits a variance inflation factor of 10 or more, indicative of multicollinearity (see Stevens, 2002, p. 92), at least one normalized Pratt measure can exhibit a value of $-1$ or lower. They gave an example in which a negative normalized Pratt measure of large magnitude (close to 1) was reduced to a small positive value by the application of ridge regression (Hoerl & Kennard, 1970), suggesting that the original “negative importance” was false.
As noted above, not all negative importances will be false, and the fact must be faced that some regression modeling situations are so complex that there is no single measure of variable importance that satisfies Pratt’s axioms. In such cases, the practitioner would be well advised to perform additional analyses. The matrix of explanatory variable correlations should be carefully inspected to provide insight into the relationships among the variables; also, some of the alternative measures that Azen and Budescu (2003) described, for example, might be examined despite their documented shortcomings. Finally, Budescu’s (1993) approach and its recent extensions (Azen & Budescu, 2003) could be used to provide additional detailed information regarding the explanatory variables’ relative contributions to R2.

The second criticism was advanced by Bring (1996), who geometrically illustrated the case of an independent variable, $x_1$, that was orthogonal to y (i.e., uncorrelated with y) but which nevertheless increased $R^2$ when it was added to a regression model originally containing only $x_2$. Such variables are often referred to as suppressor variables (see, e.g., Stevens, 2002, p. 124): namely, variables that make no direct contribution to the regression sum of squares, contributing only by reducing (or suppressing) the overall error sum of squares (or equivalently, the error variance) through their relationship with the other independent variables. In the example, because $\hat{\rho}_1$ is zero, the normalized Pratt index for $x_1$ is zero, which Bring (1996) considered to be counterintuitive. Thomas et al. (1998) argued that because suppressor variables and nonsuppressor variables contribute to the regression in entirely different ways, it is actually intuitive (rather than counterintuitive) to assess the relative importance of the nonsuppressors using the normalized Pratt measure and to separately assess the relative importance of the suppressors to the nonsuppressors using the measure $(R^2 - R^2_{NS})/R^2$, where $R^2_{NS}$ denotes the variance explained by the nonsuppressor variables alone.
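
The suppressor situation can be made concrete with a small worked example. The correlations below are hypothetical values chosen for illustration (they are not Bring’s): $x_1$ is uncorrelated with y, $x_2$ has correlation .5 with y, and the two predictors have correlation .6. The sketch computes the normalized Pratt measures and the separate suppressor measure $(R^2 - R^2_{NS})/R^2$.

```python
import numpy as np

# Hypothetical population correlations (illustrative values only).
r_xx = np.array([[1.0, 0.6],
                 [0.6, 1.0]])          # correlation between x1 and x2
r_xy = np.array([0.0, 0.5])            # x1 is uncorrelated with y

beta = np.linalg.solve(r_xx, r_xy)     # standardized regression coefficients
r2 = float(beta @ r_xy)                # R^2 of the two-variable model
delta = beta * r_xy / r2               # normalized Pratt measures

r2_ns = r_xy[1] ** 2                   # R^2 from the nonsuppressor x2 alone
suppressor_share = (r2 - r2_ns) / r2   # relative contribution of the suppressor x1
```

In this configuration the normalized Pratt measure of $x_1$ is zero and that of $x_2$ is one, while the suppressor measure shows that adding $x_1$ accounts for 36% of the full-model $R^2$.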

To this point, the relationship of Pratt’s measure to other measures of variable importance has been described, and some recent criticisms have been put into context. The remainder of the article will address the construction and evaluation of confidence intervals.


    Point Estimates and Confidence Intervals for the $\delta_j$s
The finite sample distributional properties of the normalized Pratt measures are technically complicated, even under the assumption of normal errors. The theoretical aspects of the current investigation are therefore based on an asymptotic analysis of the properties of the $\hat{\delta}_j$s, in the limit $N \to \infty$, where N denotes sample size. A simulation study has also been carried out to determine how well the asymptotic confidence intervals perform for the sample sizes that might be encountered in practice. This section will present a summary of the asymptotic results; the results of the simulation study will be reported in a later section.

It will be assumed that the dependent and explanatory variables in Pratt’s model given by Equation 1 are independently drawn from a multivariate normal distribution; that is, $(y, x_1, \ldots, x_p)' \sim N(\mu, \Sigma)$, with positive definite covariance matrix $\Sigma$ of dimension $(p + 1) \times (p + 1)$ given by

$$\Sigma = \begin{pmatrix} \sigma_y^2 & \sigma'_{xy} \\ \sigma_{xy} & \Sigma_{xx} \end{pmatrix} \tag{6}$$

and mean vector $\mu = (\mu_y, \mu'_x)'$ of dimension $(p + 1) \times 1$. Normality is a stronger assumption than that adopted by Pratt (1987), who assumed only that the disturbance term u and the independent variables x are uncorrelated. The normality assumption has been made to simplify the derivation of the asymptotic variances of the $\hat{\delta}_j$s; evaluating these variances under weaker assumptions will be the focus of future research. In the covariance matrix of Equation 6, $\Sigma_{xx}$ represents the $p \times p$ population covariance matrix of the predictor variables, and $\sigma_{xy}$ represents the $p \times 1$ vector of population covariances between y and the $p \times 1$ vector of explanatory variables x. In terms of these distributional parameters, the variance of the disturbance term u of Equation 1 can be written as $\sigma^2 = \sigma_y^2 - \sigma'_{xy}\Sigma_{xx}^{-1}\sigma_{xy}$, and the regression coefficients of Equation 1 can be written as $b_0 = \mu_y - \mu'_x\Sigma_{xx}^{-1}\sigma_{xy}$ and $b = (b_1, \ldots, b_p)' = \Sigma_{xx}^{-1}\sigma_{xy}$. It can be shown that the population value of the normalized Pratt index defined in Equation 5 can be expressed in terms of these variances and covariances as

$$\delta_j = \frac{[\Sigma_{xx}^{-1}\sigma_{xy}]_j\,[\sigma_{xy}]_j}{\sigma'_{xy}\Sigma_{xx}^{-1}\sigma_{xy}} \tag{7}$$

where $[\cdot]_j$ denotes the jth element of the vector. It is clear from Equation 7 that the means $\mu_y$ and $\mu_x$ can be set to zero with no loss of generality. The estimate $\hat{\delta}_j$ defined in Equation 5 can also be evaluated directly from Equation 7 by replacing the population variances and covariances by their sample counterparts obtained from the sample covariance matrix $\hat{\Sigma}$, estimated from a sample of N independent and identically distributed observations of y and $x_1, \ldots, x_p$. Results of the asymptotic analyses are given below. Proofs are in the appendix.
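
The covariance-based expression for the normalized Pratt index can be sketched in a few lines. The code assumes the index is the jth element of $\Sigma_{xx}^{-1}\sigma_{xy}$ times the jth element of $\sigma_{xy}$, divided by $\sigma'_{xy}\Sigma_{xx}^{-1}\sigma_{xy}$, with y stored in the first row and column of the covariance matrix. The function name `population_pratt` and the example matrix are hypothetical. Passing a sample covariance matrix instead gives the plug-in estimate.

```python
import numpy as np


def population_pratt(sigma):
    """Normalized Pratt indices from a (p+1) x (p+1) covariance matrix, y first."""
    sigma = np.asarray(sigma, dtype=float)
    sigma_xy = sigma[1:, 0]                  # covariances between y and each x_j
    sigma_xx = sigma[1:, 1:]                 # covariance matrix of the predictors
    b = np.linalg.solve(sigma_xx, sigma_xy)  # slope vector Sigma_xx^{-1} sigma_xy
    return b * sigma_xy / (sigma_xy @ b)     # elementwise product over explained variance


# Hypothetical covariance matrix with unequal variances (illustrative values only).
sigma = np.array([
    [4.00, 1.20, 0.90],
    [1.20, 1.00, 0.30],
    [0.90, 0.30, 2.25],
])
delta = population_pratt(sigma)

# Rescaling the predictors leaves the indices unchanged.
D = np.diag([1.0, 2.0, 0.5])
delta_rescaled = population_pratt(D @ sigma @ D)
```

Because the means do not enter the calculation, and rescaling any variable leaves the result unchanged, the indices depend only on the correlation structure.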

Consistency
The normalized Pratt measure $\hat{\delta}_j$ of Equation 5 is a consistent estimator of the population version of the measure given in Equation 7, that is,

$$\hat{\delta}_j \xrightarrow{\;p\;} \delta_j \quad \text{as } N \to \infty \tag{8}$$

where $\xrightarrow{\;p\;}$ denotes convergence in probability. That is, when the sample size is large, the difference between the estimate $\hat{\delta}_j$ and the population parameter $\delta_j$ will be small, with probability approaching 1.

Asymptotic Normality
The normalized Pratt measure $\hat{\delta}_j$ has an asymptotically normal distribution as $N \to \infty$. Formally, it can be shown that

$$\sqrt{N}\,(\hat{\delta}_j - \delta_j) \xrightarrow{\;d\;} N\big(0, AV(\hat{\delta}_j)\big) \tag{9}$$

where $\xrightarrow{\;d\;}$ denotes convergence in distribution, and $N(0, AV(\hat{\delta}_j))$ is a normal distribution with mean zero and variance $AV(\hat{\delta}_j)$. The asymptotic variance can be expressed as


Formula(10)

where $B = b'\Sigma_{xx}b = \sigma'_{xy}\Sigma_{xx}^{-1}\sigma_{xy}$, and as noted earlier, $\sigma^2$ is the variance of the error term u in Equation 1, given by $\sigma_y^2 - B$. The notation $[\cdot]_{jj}$ denotes the jth diagonal element of the matrix, while $[\cdot]_j$ denotes the jth row or the jth column of the matrix, depending on the context. The last term on the right-hand side of Equation 10 represents the variance with respect to x of the conditional mean of $\hat{\delta}_j$, whereas the remaining terms arise from the expectation of the conditional variance of $\hat{\delta}_j$ (see appendix for details).

Variance Estimates
In practice, results of the form of Equations 9 and 10 are interpreted to mean that for large samples, $\hat{\delta}_j$ has approximate variance $N^{-1}AV(\hat{\delta}_j)$, where $AV(\hat{\delta}_j)$ represents an estimate of the asymptotic variance given by Equation 10. The asymptotic variance can be consistently estimated using the standard least squares regression estimate $\hat{b}$ in place of $b$, and the estimated covariance matrix of the $x_j$s, obtained as a submatrix of $\hat{\Sigma}$, in place of $\Sigma_{xx}$. The resulting large sample variance estimate will be denoted $V(\hat{\delta}_j)$. After some algebraic manipulation, $V(\hat{\delta}_j)$ can be expressed entirely in terms of quantities routinely displayed in, or easily calculated from, the output of standard regression software. Thus,


Formula(11)

where $t_j$ is the value of the t statistic corresponding to $b_j$, and $\hat{\beta}_j$ estimates the standardized regression coefficient defined previously. The accuracy of the large sample variance estimator given by Equation 11 and the coverage properties of the confidence intervals based on it (see next section) have been assessed through simulation. Results of this simulation study will be reported later.

Individual and Simultaneous Confidence Intervals
A large sample $100(1 - \alpha)\%$ confidence interval (or interval estimate) for a specific normalized Pratt measure $\delta_j$ corresponding to independent variable $x_j$ is given by the interval $(\hat{\delta}_{jL}, \hat{\delta}_{jU})$, where the upper and lower limits are given by

$$\hat{\delta}_{jU} = \hat{\delta}_j + Z_{\alpha/2}\sqrt{V(\hat{\delta}_j)}, \qquad \hat{\delta}_{jL} = \hat{\delta}_j - Z_{\alpha/2}\sqrt{V(\hat{\delta}_j)} \tag{12}$$

Here $Z_{\alpha/2}$ is the upper $\alpha/2$ percentage point of the standard normal distribution. This interval corresponds to the following probability statement:

$$P\big(\hat{\delta}_{jL} \le \delta_j \le \hat{\delta}_{jU}\big) \approx 1 - \alpha \tag{13}$$

that is, the interval will cover the true value of the jth normalized Pratt importance measure with approximate probability (or confidence) $1 - \alpha$. Individual confidence intervals can be defined for the importances of each of the p independent variables. However, in this case, the overall coverage, that is, the probability that all p intervals will simultaneously cover their respective population importances, will be considerably less than $1 - \alpha$. This is the confidence interval equivalent of the problem of cumulation of Type I error associated with multiple tests of hypothesis (see, e.g., Stevens, 2002, p. 6). The Bonferroni inequality can be used to define asymptotically conservative simultaneous confidence intervals by constructing the p individual intervals with confidence level $1 - \alpha'$, where $\alpha' = \alpha/p$. The large sample simultaneous confidence interval statement then represents the application of Equation 13 to all p importances but with $\alpha$ replaced by $\alpha' = \alpha/p$, that is,


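$$P\Big(\hat{\delta}_j - Z_{\alpha'/2}\sqrt{V(\hat{\delta}_j)} \le \delta_j \le \hat{\delta}_j + Z_{\alpha'/2}\sqrt{V(\hat{\delta}_j)},\ \ j = 1, \ldots, p\Big) \gtrsim 1 - \alpha \tag{14}$$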

The above distinction between individual and simultaneous confidence intervals is often a source of confusion to practitioners. The inspection of an individual confidence interval will be appropriate if the importance of a specific variable is of prior interest—that is, the variable is being studied for reasons unconnected to the results of the data analysis. On the other hand, if a variable is of interest because its Pratt index is the largest, say, then inspection of an individual confidence interval for that variable would not be appropriate. Selection of the largest index implies a comparison with all other variables so that appropriate inference must be based on a set of intervals that accounts for variation in all Pratt indices simultaneously. This distinction will be further illustrated later.
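
The difference between individual and Bonferroni simultaneous intervals can be sketched as follows. This is an illustration only: it assumes each interval has the usual large sample form (estimate plus or minus a normal percentage point times the square root of its estimated variance), and it takes the variance estimates as given inputs rather than computing them. The function name `pratt_intervals` and all numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def pratt_intervals(d_hat, var_hat, alpha=0.05, simultaneous=False):
    """Large sample intervals; Bonferroni-adjusted when simultaneous=True."""
    p = len(d_hat)
    level = alpha / p if simultaneous else alpha     # alpha' = alpha / p
    z = NormalDist().inv_cdf(1 - level / 2)          # upper level/2 percentage point
    return [
        (d - z * math.sqrt(v), d + z * math.sqrt(v))
        for d, v in zip(d_hat, var_hat)
    ]


# Hypothetical estimates and variance estimates for p = 3 predictors.
d_hat = [0.55, 0.35, 0.10]
var_hat = [0.004, 0.003, 0.002]

individual = pratt_intervals(d_hat, var_hat)
bonferroni = pratt_intervals(d_hat, var_hat, simultaneous=True)
```

The Bonferroni intervals are wider than the individual ones, which is the price paid for a statement that holds for all p measures at once, such as a claim about which variable has the largest index.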


    The Simulation Experiment
The simulation experiment was primarily designed to explore the performance of the individual and simultaneous confidence interval procedures specified by Equations 13 and 14. Because the confidence intervals depend on point estimates of the normalized Pratt measures, $\delta_j$, and on their variances, the biases in the estimates $\hat{\delta}_j$ and in their estimated variances were also examined. The simulation explored confidence interval performance for three-variable and five-variable regressions under a range of configurations of the $\delta_j$s, for sample sizes ranging from 50 to 1,000. The reported results correspond to regressions with population $R^2$ equal to .5 only. Other simulation results not reported here indicate that the conclusions of the study are not sensitive to variations in $R^2$. The elements of the covariance matrix of Equation 6 were manipulated to yield the desired configurations of Pratt measures; in all cases (and without loss of generality), the diagonals of $\Sigma$, namely, the variances of y and the independent xs, were set to 1. For three variables, six cases involving different configurations of normalized Pratt measures were studied, as follows:

  A. three measures corresponding to a case of model multicollinearity;
  B. one big measure and two small measures (recall that the normalized Pratt measures must sum to 1);
  C. three equally sized measures;
  D. one big, one medium, and one small measure;
  E. a case involving one suppressor variable;
  F. a case involving a nonignorable negative index.

Similarly structured configurations were examined for five explanatory variables, with the exception that cases E and F were combined for the five variable model. Values of the true normalized Pratt measures are shown in Table 1 for each case. In cases B, C, and D, for both three and five explanatory variables, all Pratt measures are positive and can therefore be meaningfully interpreted as measures of importance. In case E, there is one negative normalized measure of relatively small magnitude for both the three- and five-variable situations. As shown by Thomas et al. (1998), the normalized measures corresponding to the nonsuppressors still provide approximate measures of relative importance in such situations, although the recommended procedure is to partial out the contribution of the suppressor variable prior to assessing importance. This aspect of importance analysis has not been considered in the present study. Cases A (for three and five variables) and F (for three variables) all feature negative normalized Pratt measures of substantial magnitude. These measures cannot, therefore, be interpreted in terms of importance and represent cases in which confidence intervals could profitably be used to confirm the difficulty. The simulation experiment consisted of 500 independent replications of the following five-step process for the three- and five-variable cases in turn:


TABLE 1 True Values and Simulation Estimates of the Expected Values of the Normalized Pratt Measures

 
  1. A sample of 1,000 observations was drawn from the multivariate normal distribution specified in Equation 6 corresponding to each configuration of Pratt indices.
  2. The normalized Pratt measures of Equation 5 and their large sample variance estimates for Equation 11 were computed.
  3. The individual and simultaneous confidence intervals given in Equations 13 and 14 were then computed for each variable and tested against the known values of the $\delta_j$s. A counter was updated each time an interval covered its corresponding parameter. A nominal confidence level of 95% was used in all cases, corresponding to an alpha level in Equations 13 and 14 of 5%.
  4. The process was repeated for sample sizes 500, 250, 100, and 50. To induce a correlation between estimates at different sample sizes, and hence reduce the simulation error for comparisons across sample sizes, the smaller sample sizes were selected as subsets of the original 1,000 (see Thomas & Rao, 1987).
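
The replication scheme above can be sketched in code. The following is a minimal illustration, not the authors' program: it covers only the sampling, the point estimates, and the nested subsamples (steps 1, 2, and 4), because the variance estimate of Equation 11 and the intervals of Equations 13 and 14 are not reproduced in this section. The covariance matrix `sigma`, the seed, and the function names are hypothetical choices made for the example.

```python
import numpy as np

SIZES = (1000, 500, 250, 100, 50)

def normalized_pratt(y, X):
    """Sample normalized Pratt measures: (standardized beta_j * r_yj) / R^2."""
    corr = np.corrcoef(np.column_stack([y, X]), rowvar=False)
    r_xy, r_xx = corr[0, 1:], corr[1:, 1:]
    beta_std = np.linalg.solve(r_xx, r_xy)      # standardized regression coefficients
    return beta_std * r_xy / (beta_std @ r_xy)  # beta_std @ r_xy equals R^2

def one_replication(sigma, rng, sizes=SIZES):
    """Draw max(sizes) observations of (y, x_1..x_p); estimate on nested subsets."""
    data = rng.multivariate_normal(np.zeros(sigma.shape[0]), sigma, size=max(sizes))
    # The smaller samples are the first n rows of the largest sample,
    # which induces correlation between estimates across sample sizes.
    return {n: normalized_pratt(data[:n, 0], data[:n, 1:]) for n in sizes}

# Hypothetical covariance matrix with unit diagonal; y is the first variable.
sigma = np.array([
    [1.0, 0.5, 0.3, 0.2],
    [0.5, 1.0, 0.2, 0.2],
    [0.3, 0.2, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
])

rng = np.random.default_rng(12345)
reps = [one_replication(sigma, rng) for _ in range(500)]
mean_estimates = {n: np.mean([rep[n] for rep in reps], axis=0) for n in SIZES}
```

Taking the smaller samples as leading rows of the largest one is only one way to form nested subsets; any fixed subset rule would serve the same purpose.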

Based on the 500 replications, a number of summary measures were computed for the three- and five-variable models, namely:

  1. Simulation estimates of the expected values of the $\hat{\delta}_j$s, denoted $\hat{E}_s(\hat{\delta}_j)$. For each experimental condition, these were computed as means of the individual Pratt indices over the 500 replications.
  2. Simulation estimates of the expected values of the estimated asymptotic variances $V(\hat{\delta}_j)$. Denoted $\hat{E}_s V(\hat{\delta}_j)$, these were computed for each experimental condition as means of the variance estimates of Equation 11 over the 500 replications.
  3. Simulation estimates of the true variances of the $\hat{\delta}_j$s, denoted $V_s(\hat{\delta}_j)$. These were computed directly from the 500 replications of the $\hat{\delta}_j$s corresponding to each experimental condition.
  4. Coverage rates of the individual confidence intervals of Equation 13.
  5. Coverage rates of the simultaneous confidence intervals of Equation 14.
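
The five summary measures can be computed along the following lines. This is a sketch under stated assumptions: Equations 13 and 14 are not reproduced in this section, so the intervals are taken here to be normal-theory intervals of the form estimate ± z × standard error, with z based on α/2 for individual intervals and on α/(2p) for Bonferroni simultaneous intervals. The function `summarize`, its argument names, and the synthetic inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def summarize(est, var_est, delta, alpha=0.05):
    """est, var_est: (replications x p) arrays; delta: true values (length p)."""
    est, var_est, delta = map(np.asarray, (est, var_est, delta))
    p = est.shape[1]
    se = np.sqrt(var_est)
    z_ind = NormalDist().inv_cdf(1 - alpha / 2)        # individual intervals
    z_sim = NormalDist().inv_cdf(1 - alpha / (2 * p))  # Bonferroni intervals
    covered_ind = np.abs(est - delta) <= z_ind * se
    covered_sim = np.abs(est - delta) <= z_sim * se
    return {
        "mean_estimate": est.mean(axis=0),
        "mean_variance_estimate": var_est.mean(axis=0),
        "empirical_variance": est.var(axis=0, ddof=1),
        "individual_coverage": covered_ind.mean(axis=0),
        # A replication counts only if every interval covers its parameter.
        "simultaneous_coverage": covered_sim.all(axis=1).mean(),
    }

# Synthetic inputs with made-up values (not results from the study).
rng = np.random.default_rng(0)
delta = np.array([0.6, 0.3, 0.1])
true_var = np.array([0.004, 0.003, 0.002])
est = delta + rng.normal(size=(500, 3)) * np.sqrt(true_var)
var_est = np.tile(true_var, (500, 1))
summary = summarize(est, var_est, delta)
```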

Results are discussed in the following section.


    Results
 
Simulation Estimates of the Expected Values
Values of the $\hat{E}_s(\hat{\delta}_j)$s, namely, the simulation estimates of the expected values of the $\hat{\delta}_j$s, are shown in Table 1 for all cases, for both three- and five-variable situations, and for sample sizes of 50 and 250. Asymptotic values are shown in bold type. It is clear from the table that the expected values of the normalized Pratt measures converge rapidly to their true (and asymptotic) values given by Equation 7 as sample size increases. Thus, the bias in the estimated normalized Pratt index of Equation 5 converges rapidly to zero with increasing sample size, and the estimator can be considered effectively unbiased for sample sizes of 250 or more. The correspondence between estimates and asymptotic values is close even for a sample size as low as 50.

Asymptotic Variances and True Variances
To simplify the assessment of the variance estimates, which depend inversely on N, all numerical results will be reported as N times the variance under consideration. Simulation estimates of N times the expected values of the asymptotic variance estimates given by Equation 11, denoted $N \times \hat{E}_s V(\hat{\delta}_j)$, behave in similar fashion to that described above for the $\hat{E}_s(\hat{\delta}_j)$s. That is, they approach their asymptotic value quite rapidly, the convergence being slowest for case A, which features appreciable multicollinearity. Even for case A, all estimated expected values of $N \times V(\hat{\delta}_j)$ are within 10% of the asymptotic variance $AV(\hat{\delta}_j)$ for sample sizes of 250 or more. For the other cases, the divergence between expected value and asymptotic variance is much less, well within 5% in most cases. Tabular details are omitted in the interests of space.

Simulation estimates of N times the true variances, denoted $N \times V_s(\hat{\delta}_j)$, are shown in Table 2, for sample sizes of 50, 250, and 1,000, along with the asymptotic variances (in bold type). It can be seen that the simulation estimates of Table 2 exhibit greater variation than those of Table 1, which is not surprising given that variance estimates necessarily have greater variability than mean estimates. Nevertheless, it can be seen that the simulation estimates of the true variance (multiplied by N) are generally within 10% of the asymptotic variance for sample sizes of 250 or more. Deviations are not much larger even for sample sizes as low as 50. For case A of Table 2, the case featuring a multicollinear model, it can be seen that the variables involved in the multicollinearity have very high estimated and asymptotic variances. Even for these rather extreme cases, the asymptotic variance provides a close match to N times the true variance. In summary, the results of Tables 1 and 2 suggest that good coverage properties will be exhibited by the individual and simultaneous confidence intervals.


TABLE 2 Asymptotic Variances and Simulation Estimates of the True Variances of the Normalized Pratt Measures

 
Coverage Properties of the Individual Intervals
The coverage properties of the individual confidence intervals given in Equation 13 are shown in columns 3 to 5 and 8 to 10 of Table 3, and in columns 3 to 7 of Table 4, for three explanatory variables and five explanatory variables, respectively. All configurations of the $\delta_j$s are shown for sample sizes 50, 250, and 1,000. From Tables 3 and 4, the effect of sample size can clearly be seen. For a sample size of 50, the coverage in some cases is as low as 86%, at a nominal level of 95%. This is equivalent to having a test of hypothesis exhibit a true test level of 14% at a nominal test level of 5%. Such a low coverage (equivalent to a liberal test level) is not acceptable in practice. However, from the tables, it can be seen that for a sample size of 250, the actual coverage rates are within ±2% of the nominal level with only one exception. This is consistent with the assertion that the true coverage of the individual asymptotic intervals (for a sample size of 250) is close to 95%, given that the binomial confidence interval on an individual coverage estimate is ±2% for 500 replications when the true coverage level is 95%. There is no material difference in confidence interval performance for the three- and five-variable models, nor do there appear to be any material differences between the individual cases exhibited in Tables 3 and 4. This is an important finding as it would not be wise in practice to decide for or against using the confidence interval procedure based only on a sample estimate of the configuration of the $\delta_j$s.
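
The ±2% figure follows from the usual normal approximation to the binomial. As a worked check, treating each of the 500 replications as an independent Bernoulli trial with success probability .95 gives an approximate 95% half-width of

$$1.96\sqrt{\frac{0.95 \times 0.05}{500}} \approx 1.96 \times 0.0097 \approx 0.019,$$

that is, about ±2 percentage points.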


TABLE 3 Coverage of Individual and Simultaneous Confidence Intervals for the $\delta_j$s: Three Explanatory Variables

 

TABLE 4 Coverage of Individual and Simultaneous Confidence Intervals for the $\delta_j$s: Five Explanatory Variables

 
Coverage Properties of the Simultaneous Intervals
The coverage properties of the simultaneous confidence intervals given in Equation 14 are shown in columns 6 and 11 of Table 3, and in column 8 of Table 4, for three explanatory variables and five explanatory variables, respectively. Results are again shown for all configurations of the $\delta_j$s for sample sizes of 50, 250, and 1,000. The coverage results for the simultaneous intervals are generally similar to those for the individual intervals. It is clear from Tables 3 and 4 that for a sample size of 50, coverage rates can be far too low: less than 80% in some cases at a nominal level of 95%. However, at a sample size of 250, simultaneous coverage rates are again close to the 95% ±2% range, which suggests a true coverage close to the nominal value. At a sample size of 1,000, coverage rates are even closer to the nominal level, as expected. Again, there is no material difference between coverage levels across the configurations of the $\delta_j$s or between the three-variable and five-variable models.

Overall Summaries of Individual and Simultaneous Coverage
Figures 1 and 2 provide an overall summary of the results for individual and simultaneous coverage levels, respectively. Both figures display results for all configurations of the normalized Pratt measures for both three- and five-variable models on the same graph. Moreover, coverage levels of individual variables are not identified separately in Figure 1. The two figures are useful in that they provide a gross overview of the performance of both individual and simultaneous confidence interval procedures as a function of sample size. Results for all sample-size settings are displayed, namely, 50, 100, 250, 500, and 1,000. Figure 1 displays 43 coverage level estimates for each sample size (6 cases with 3 variables each and 5 cases with 5 variables each), and if it is assumed that the true coverage level for each is equal to the nominal 95%, then a Bonferroni confidence band for the 43 coverage estimates at each sample size setting based on 500 Bernoulli replications should be 95% ±3.1%. Figure 1 thus illustrates the primary conclusion drawn from Tables 3 and 4 for the individual confidence intervals, that over a wide range of test situations, the estimated individual coverage rates are consistent with a true coverage level close to the nominal 95% whenever the sample size is 250 or greater. In fact, it can be seen that individual confidence intervals might still be usable for sample sizes down to 100 as long as individual coverage levels as low as 90% can be tolerated (at the nominal 95% level).
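
A Bonferroni band of this kind can be approximated as follows. This is a sketch only: the article does not state how the band was computed, so the normal approximation to the binomial and the two-sided split of α across the m estimates are assumptions, and the function name is hypothetical. Under these assumptions the half-width for m = 43 comes out near the ±3.1% quoted above; small differences can arise from the approximation used.

```python
from math import sqrt
from statistics import NormalDist

def bonferroni_half_width(m, coverage=0.95, replications=500, alpha=0.05):
    """Half-width of a Bonferroni band for m coverage estimates (normal approximation)."""
    se = sqrt(coverage * (1 - coverage) / replications)  # binomial standard error
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))        # two-sided, alpha split over m
    return z * se

half_width_43 = bonferroni_half_width(43)
```

Setting `m=1` recovers the single-estimate half-width of roughly two percentage points.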


FIGURE 1 Coverage levels of the individual normalized Pratt confidence intervals for three and five variables and for all configurations of normalized Pratt measures.

 

FIGURE 2 Coverage levels of the simultaneous normalized Pratt confidence intervals for the three- and five-variable models and for all configurations of normalized Pratt measures.

 
Figure 2 displays 11 coverage level estimates for each sample-size setting, with a corresponding Bonferroni confidence band with half-width 2.8%. Thus, Figure 2 confirms the primary conclusion drawn from Tables 3 and 4 for the simultaneous confidence intervals—namely, that the estimated simultaneous coverage rates are consistent with a true coverage level close to the nominal 95% whenever the sample size is 250 or greater. However, the results of Figure 2 show that simultaneous coverage rates at a sample size of 100 are markedly lower than for the individual coverage rates at the same sample size.

This is surprising at first sight because simultaneous Bonferroni confidence intervals based on exact statistics have a coverage level that is necessarily equal to or greater than $1 - \alpha$ (see, e.g., Miller, 1981, p. 67). There are two observations that help explain the above finding. First, Miller (1981, p. 253) comments favorably on the tightness of the Bonferroni bound when the individual $\alpha'$ are small. Because these are set in this article at $\alpha' = \alpha/p$ for $\alpha = .05$, with p = 3 and 5, the theoretical conservativeness of the Bonferroni bound is likely to be slight. Second, the Bonferroni intervals in this article are constructed using approximate, asymptotically based distribution theory, and it is typical for the coverage performance of asymptotic confidence intervals to deteriorate as the individual $\alpha$ levels decrease (see, e.g., Thomas, Singh, & Roberts, 1996). As described in the text preceding Equation 14, the construction of simultaneous Bonferroni intervals involves replacing the $\alpha$ of the individual intervals by the smaller value $\alpha' = \alpha/p$, which then results in a corresponding deterioration in the approximate simultaneous interval coverage. It is clear from the results of Figure 2 that this effect dominates the theoretical (but slight) conservativeness of the simultaneous Bonferroni intervals. In conclusion, asymptotic simultaneous confidence interval procedures for the normalized Pratt measures are not recommended for use with sample sizes much less than 250.

In view of the above results and recommendations, it is important to note that a decision on whether to use individual or simultaneous intervals should not be based on the differences in coverage performance to be expected at a given sample size. As discussed in the text following Equation 14, individual intervals should only be used when there is some a priori interest in the importance of a particular variable. Whenever variable importances are to be examined and compared a posteriori, simultaneous intervals are required, in which case the minimum sample size recommendation of 250 should be respected. Should a practitioner choose to use the individual 95% intervals in a situation requiring simultaneous inference, then the simultaneous coverage would be approximately 85% for a three-variable regression and 75% for a five-variable case—levels that are too low for useful inference.
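
The 85% and 75% figures are consistent with the Bonferroni lower bound on joint coverage when p individual intervals are each used at level α = .05. As a worked check (the second line, which assumes independent intervals, is added only for comparison and is not taken from the article):

$$1 - p\alpha:\quad 1 - 3(0.05) = 0.85, \qquad 1 - 5(0.05) = 0.75;$$

$$(1 - \alpha)^p:\quad 0.95^3 \approx 0.857, \qquad 0.95^5 \approx 0.774.$$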


    Example
 
This example is based on a subset of data taken from a large study on the work-life conflict experienced by Canadian employees, documented in a report to Health Canada by Duxbury and Higgins (2003). To illustrate the use of the normalized Pratt measures and their corresponding confidence intervals, a regression analysis featuring one dependent variable and six explanatory variables was investigated. The dependent variable consisted of a work-family conflict measure based on a five-item scale: work-to-family interference (Duxbury & Higgins, 2003). The predictor variables consisted of a theoretically linked set of indicators of the culture of each respondent’s organization with respect to work-life balance. These individual item predictors were measured on a 5-point scale, and represented degrees of agreement or disagreement with the following six statements:

ORG: Organization promotes environment supportive of work/personal balance

LEAVE: Have seriously thought about switching to another organization

COWSUP: Coworkers supportive of personal/family responsibilities

LONG: Inability to work long hours would limit career

ADVANCE: Family responsibilities hamper advancement

NONO: Not acceptable to say no to more work

The sample size was 3,320, large enough to guarantee the applicability of the proposed asymptotic confidence interval procedures, and the regression R² was .382. Given that this example features a 5-point measurement scale for the predictor variables, application of the proposed confidence interval procedures is a technical violation of the assumption of multivariate normality on which their derivation is based. Thus, it is implicitly assumed in the analysis of this specific example that the violation does not materially affect the conclusions. This is an important issue in general, given the importance of discrete Likert-type scales in behavioral research, and it will be discussed further below. Table 5 displays values of (a) the standardized regression coefficients; (b) the simple correlation between the dependent variable and each of the six explanatory variables; (c) the t statistics; (d) the partial correlation between the dependent variable and each of the explanatory variables, adjusted for the remaining five, denoted $r_{yj|(j)}$; (e) the normalized Pratt measures (the $\hat{\delta}_j$s) for each explanatory variable, shown in bold; and (f) the corresponding 95% simultaneous confidence intervals for the $\hat{\delta}_j$s. Figure 3 illustrates the simultaneous confidence intervals for the normalized Pratt measures.


TABLE 5 Point and Interval Estimates of Importance for the Work-Life Interference Example

 

FIGURE 3 Simultaneous 95% confidence intervals for the normalized Pratt measures in the work-life interference example.

Note: LEAVE = have seriously thought about switching to another organization; LONG = inability to work long hours would limit career; ORG = organization promotes environment supportive of work/ personal balance; ADVANCE = family responsibilities hamper advancement; NONO = not acceptable to say no to more work; COWSUP = coworkers supportive of personal/family responsibilities.

 
The Pratt indices of Table 5 are given by the products of the first two columns divided by R² = .382, as defined by Equation 5. However, although the standardized regression coefficients (first column) give the same ordering as the Pratt coefficients for this specific example, the orderings for variables LONG and ORG given by the magnitudes of the simple correlations (second column) and the Pratt indices are different. Furthermore, the simple correlations for LONG and ADVANCE are virtually identical, but their Pratt indices differ by more than a factor of two. The magnitudes of the t statistics and the partial correlations (shown for comparison) give identical orderings and are, in fact, monotonically related by definition. However, the orderings given by the magnitudes of these two statistics differ from those given by the Pratt indices, the latter indicating that LEAVE is more important than LONG, whereas the former two statistics suggest the reverse.
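
The arithmetic behind these columns can be illustrated with a short sketch. The numbers below are hypothetical and are not the Table 5 entries, and the function names are introduced only for this example. The first function forms the normalized Pratt measures from the standardized coefficients, the simple correlations, and R²; the second uses the standard identity r = t / sqrt(t² + df), with df the residual degrees of freedom, which links a coefficient's t statistic to the corresponding partial correlation and explains why the two give the same ordering.

```python
import numpy as np

def pratt_from_columns(beta_std, r_xy, r_squared):
    """Normalized Pratt measures: (standardized coefficient * simple correlation) / R^2."""
    return np.asarray(beta_std) * np.asarray(r_xy) / r_squared

def partial_corr_from_t(t, df_resid):
    """Partial correlation implied by a coefficient's t statistic and residual df."""
    t = np.asarray(t, dtype=float)
    return t / np.sqrt(t**2 + df_resid)

# Hypothetical values for three predictors (not the Table 5 entries).
beta_std = np.array([0.40, 0.25, 0.10])
r_xy = np.array([0.50, 0.35, 0.20])
r_squared = float(beta_std @ r_xy)  # in a fitted regression, R^2 = sum(beta_std * r_xy)
pratt = pratt_from_columns(beta_std, r_xy, r_squared)
```

Because R² equals the sum of the products, the resulting measures sum to 1; with rounded published values the sum will only be approximately 1.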

Quite apart from any similarities and differences in the orderings given by the above statistics, it should be recalled that the normalized Pratt measure is the unique measure of relative importance for the explanatory variables that satisfies Pratt’s axioms and definitions. Thus, from the $\hat{\delta}_j$s of Table 5, it can be stated that LEAVE is about 3 times more important than NONO, a relative comparison that cannot be justified using any of the other measures of importance. The interval estimates illustrated graphically in Figure 3 show more clearly the importance relationship between these variables. Although the point estimate of the normalized Pratt measure for LEAVE is larger than that for LONG and than that for ORG, it can be seen that there is a large overlap between their simultaneous interval estimates. Thus, it would not be valid to infer that LEAVE is more important than LONG, or ORG, based on point estimates alone. The results of Figure 3 suggest that the six variables should be ranked in importance by first grouping them into three hierarchies, with LEAVE, LONG, and ORG comprising the most important group of variables for explaining work-to-family interference. ADVANCE and NONO would rank second in importance, although it should be noted that this categorization of the five most important variables into two distinct groups of three and two, respectively, requires that the small overlap between NONO and COWSUP be ignored. This is a judgment call, which is based on the fact that the intervals are approximate and that an interpretation in strict hypothesis testing style would be inappropriate for this type of data analysis. Furthermore, it can be seen that the confidence interval for COWSUP includes small negative values, so that the most reasonable interpretation is that COWSUP has negligible importance and can be dropped from all importance considerations.

It should also be noted that specific pairwise comparisons, between COWSUP and NONO, for example, could be sharpened by extending the analysis described in the appendix to the construction of simultaneous confidence intervals for all p(p – 1)/2 pairwise comparisons. This would require further simulation to determine the performance of these asymptotic intervals as a function of sample size, a project that has yet to be implemented.

Likely Effect of Discrete Predictors
Discretely measured predictor variables of the Likert-type illustrated in the above example are very popular in the behavioral sciences, and it is commonly assumed that the discreteness will not adversely affect an analysis provided that at least five categories are used and provided that the discrete data do not exhibit excessive skewness or kurtosis. Although it is not the intent of this article to investigate in detail the robustness of the proposed intervals to discreteness and nonnormality, it is relevant to consider what the effects of discreteness are likely to be in the above example. A normal probability plot of the regression residuals (not shown) provides no evidence at all of nonnormality, indicating that the conditional distribution of WTFINT, given the predictor variables, is normal. Thus, the investigation of nonnormality and its effects can focus on the joint distribution of the six predictors. Summary statistics based on the sample of 3,320 observations are shown in Table 6, and from the values of univariate skewness and kurtosis, it is evident that the assumption of multivariate normality is violated (tests on Mardia’s 1970 measures of multivariate skewness and kurtosis also reject normality, with p < .0001). The question of interest, however, is what effect this violation will have on the Pratt indices and their corresponding confidence intervals.


TABLE 6 The Mean, Standard Error, Skewness, and Kurtosis of the Explanatory Variables

 
The effect of discreteness and nonnormality on the distribution of the explanatory variables will be manifested through the contribution of the second term in Equation A.10 of the appendix, the variance of the conditional mean of the Pratt indices. The latter is given explicitly in Equation A.17, from which it can be seen that the effect will depend critically on the variances and covariances of the elements of X'X/N, the estimated variance matrix of the explanatory variables. So, in investigating the likely effects of discreteness and nonnormality on the variances of the Pratt indices, and hence on their confidence intervals, the closest parallel is the effect of nonnormality and discreteness of indicator variables in structural equation modeling (SEM). Classical versions of SEM are based on the assumption of multivariate normality of the indicators, and the question of discreteness and nonnormality of indicator variables has received considerable attention. Reviews are provided by Bollen (1989) and by West, Finch, and Curran (1995), and some general conclusions are that for confirmatory factor analysis and simple SEM models, (a) parameter estimates are relatively unaffected by discreteness provided the distributions are approximately normal (skewness and kurtosis values less than 1 in magnitude being cited); (b) standard errors are more sensitive to nonnormality and can be markedly underestimated when variables are highly (and differentially) skewed or when they exhibit large excess kurtosis; (c) when the number of categories has an effect, the effect is strongest when the number of categories is less than 5. The results of a study by Chou and Bentler (1995) are of particular relevance to the current example. They performed a simulation study of the performance of three different SEM estimators for a confirmatory factor analysis, using six different settings of nonnormality. 
Among other things, they examined the coverage properties of confidence intervals on the model parameters, and it is clear from their results that for their normal theory estimator, coverage rates equaled or exceeded their nominal level for the nonnormal cases featuring negative kurtosis. This is likely because of the fact that assuming multivariate normality for distributions featuring negative kurtosis leads to an overestimate of the variances and covariances of the elements of X'X/N. The results of Muthen and Kaplan (1992), based on discrete five-category variables, show a similar trend for the biases in parameter standard errors. It can be seen from Table 6 that the predictor variables in the work-life interference example exhibit negative kurtosis except for COWSUP, which exhibits positive but small kurtosis, and it can also be seen that the skewness values are all much less than 1 in magnitude. Thus, if we take these SEM results as a possible guide to the effects of discreteness and nonnormality on Pratt indices and their confidence intervals, the above inferences for the work-life conflict example should be reliable. More theoretical and empirical work on this issue is clearly required.

The validity of the inferences for this specific example can also be investigated by regarding the discrete variables as observed versions of underlying multivariate normal variables that have been categorized with respect to a set of cutpoints (see, e.g., Bollen, 1989, p. 439). One effect of categorizing underlying variables is to reduce the magnitude of the Pearson correlation coefficients between them (see Krieg, 1999, and the references therein), an effect that will then generate bias in regression coefficients estimated using the categorized data. If the observed variables for the work-family life example are characterized in terms of underlying normal variables, an indication of the effect of their discreteness can be obtained by (a) estimating the underlying correlations between the dependent and predictor variables, and among the predictor variables, by means of polyserial and polychoric correlations, respectively; and (b) evaluating the Pratt indices and corresponding confidence intervals based on these underlying correlations. Note that these ‘‘Pratt indices’’ and ‘‘confidence intervals,’’ displayed in Figure 4, are not intended to provide an alternative analysis because the variances of the polyserial and polychoric correlations will not be accounted for. They are intended only to provide an indication of the sensitivity of the work-life interference example to variable discreteness and nonnormality. An initial indication of this sensitivity is given by the effect on the regression R², which increases by about 6% in relative terms when the polyserial and polychoric correlations are used, from 38.2% to 40.4%. From Figure 4, it can be seen that the corresponding point estimates and confidence intervals are very similar to those displayed in Figure 3. Thus, the indications are that the effect of variable discreteness on the Pratt confidence intervals in the work-life conflict example is slight, and that the original inferences can be trusted.


FIGURE 4 Simultaneous 95% confidence intervals based on polyserial and polychoric correlations.

Note: LEAVE = have seriously thought about switching to another organization; LONG = inability to work long hours would limit career; ORG = organization promotes environment supportive of work/ personal balance; ADVANCE = family responsibilities hamper advancement; NONO = not acceptable to say no to more work; COWSUP = coworkers supportive of personal/family responsibilities.

 

    Summary and Conclusions
 
The issue of variable importance in linear regression has been reviewed, and the importance measure justified theoretically by Pratt (1987) has been examined in detail. In this article, the utility of Pratt’s measure has been extended in two ways. First, approximate estimates of the variances of normalized Pratt measures corresponding to individual independent variables have been developed, based on an asymptotic analysis. Calculation of these estimates requires only information routinely printed in the output of standard regression programs. No new software is required. Simulation results have shown that the approximate variance estimate is suitably accurate for sample sizes frequently encountered in practice—namely, 250 or more and, in many cases, for sample sizes as low as 100. Second, these large sample variance estimates have been used to construct simultaneous confidence intervals (or interval estimates) for the normalized Pratt measures corresponding to the full set of independent variables. Again, these intervals can be easily calculated from standard regression software output. The results of the simulation have shown that these confidence intervals provide good coverage numerically close to the nominal 95% level for sample sizes of 250 or more. For smaller samples, the asymptotic confidence intervals tend to be liberal. However, for individual confidence intervals, appropriate when the importance of a specific variable is of prior interest unconnected to the results of the data analysis, this effect is not drastic even for samples as small as 100, in which case the actual confidence level exceeds 90% in most cases. For the simultaneous confidence intervals, however, the sample size requirement is not as flexible, and a minimum sample size of about 250 is recommended.

As discussed in the review of the literature, Pratt’s (1987) measure is not the only viable measure of importance. Budescu’s (1993) ‘‘dominance’’ approach recently extended by Azen and Budescu (2003) and the ‘‘criticality’’ approach of Azen et al. (2001) comprise intuitively reasonable approaches to the issue, and they also provide inferential techniques for comparing their assessments across variables. However, because these inferences are based on bootstrapping methods, both approaches become heavily computer intensive and require special software. The methods developed in this article, featuring both point and interval estimates of the normalized Pratt measure of variable importance, are far easier to implement and require no special software. Thus, it is hoped that the results of this article will encourage researchers to examine the importance of their variables. To do so would be consistent with the recommendations of Kirk (1996) and others in favor of using ‘‘effect sizes’’ to assist with the interpretation of statistical analyses. To know that a regression coefficient is nonzero should only be the beginning of the interpretation. A full statistical assessment of the relative importance of the variable compared to the other independent variables should be regarded as an essential aspect of the analysis.


    Appendix
 
For a sample of size N, the linear model of Equation 1 can be written as


$$y = \beta_0 1_N + X_1 \beta + u, \tag{A1}$$

where $1_N$ represents an $N \times 1$ vector of ones, y is an $N \times 1$ vector of observations of the dependent variable, $X_1$ is an $N \times p$ matrix of observed explanatory variables, that is, $X_1 = (x_{11}, x_{12}, \ldots, x_{1p})$, and u is an $N \times 1$ vector of disturbances.

Consistency
The population standardized Pratt index $\delta$ given in Equation 7 can be estimated as


Formula(A2)

where Q is a centering matrix defined as $Q = I_N - 1_N 1_N'/N$ and $I_N$ is an identity matrix of dimension N. Because $X_1'Qy/N$ is a consistent estimator of $\sigma_{xy}$, and $X_1'QX_1/N$ is a consistent estimator of $\Sigma_{xx}$, the consistency result of Equation 8 follows directly.
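
The role of the centering matrix and the consistency claim can be checked numerically. The sketch below is an illustration under assumptions: Equation A2 is not reproduced in this section, so the matrix form used here (least-squares slopes from the centered cross-products, multiplied elementwise by $X_1'Qy/N$ and normalized to sum to 1) is one algebraically natural way to write the estimator, not necessarily the authors' layout. The population values `beta`, `sigma_xx`, the intercept, and the sample size are hypothetical.

```python
import numpy as np

def pratt_via_centering(y, X1):
    """Normalized Pratt estimates from X1'Qy/N and X1'QX1/N; also returns Q."""
    N = len(y)
    Q = np.eye(N) - np.ones((N, N)) / N   # centering matrix Q = I_N - 1_N 1_N' / N
    s_xy = X1.T @ Q @ y / N               # estimates sigma_xy
    S_xx = X1.T @ Q @ X1 / N              # estimates Sigma_xx
    b = np.linalg.solve(S_xx, s_xy)       # least-squares slope estimates
    p = b * s_xy                          # unnormalized contributions
    return p / p.sum(), Q

# Hypothetical population: three correlated predictors and unit-variance noise.
rng = np.random.default_rng(7)
beta = np.array([1.0, 0.5, 0.25])
sigma_xx = np.full((3, 3), 0.2) + 0.8 * np.eye(3)
sigma_xy = sigma_xx @ beta
delta_pop = beta * sigma_xy / (beta @ sigma_xy)   # population normalized measures

N = 1000
X1 = rng.multivariate_normal(np.zeros(3), sigma_xx, size=N)
y = 2.0 + X1 @ beta + rng.normal(size=N)
delta_hat, Q = pratt_via_centering(y, X1)
```

With a sample of this size the estimates should lie close to `delta_pop`, and `Q` can be verified to be symmetric and idempotent.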

Asymptotic Normality
To prove asymptotic normality, it is convenient to start with the numerator of Equation A2, with y replaced by Equation A1. In addition, because $Q = Q' = Q^2$, and because $X_1$ and $x_{1j}$ in Equation A2 can be replaced by $X = QX_1$ and $x_j = Qx_{1j}$, it follows that the numerator of Equation A2, denoted $p_j$, can be written as


Formula(A3)

Then,


Formula(A4)

where E_ux denotes expectation with respect to the joint distribution of u and X, E_x denotes expectation with respect to the marginal distribution of X, and E_u|x (to be used later) denotes expectation with respect to the conditional distribution of u. Consider next the centered and scaled version of p_j/N, given by


Formula(A5)

It can be shown that as N → ∞, the variances of the first three terms on the right-hand side of Equation A5 are O(1), whereas the variance of the last term is O(N^(−1/2)); that is, it tends to zero as N → ∞. Thus, the last term is of order N^(−1/2) in probability (see Barndorff-Nielsen & Cox, 1989, p. 31), and to first order, Equation A5 can be written as follows:


Formula(A6)

The first two terms on the right-hand side of Equation A6 comprise a linear combination of inner products given by


Formula(A7)

which are asymptotically normal with zero mean and variance to be determined later, given that the N random variables x_jl u_l are independent and identically distributed, with zero mean, and provided that their second moments are finite. The latter condition is satisfied under normality, but the normality assumption is clearly not essential. A similar central limit theorem applies to the third term of Equation A6, which consists of the inner products


Formula(A8)

Provided the second moment of x_jl x_kl is finite, inner products of the form of Equation A8 will also be asymptotically normal. Therefore, p_j/N is asymptotically normal since the right-hand side of Equation A6 consists of a linear combination of asymptotically normal terms. To investigate the asymptotic normality of the normalized Pratt index, Formulaj, note that


Formula(A9)

Given that p_j/N is asymptotically normal, Formula is also asymptotically normal, and the correlation between these terms can be evaluated from Equation A6. Because the means of the numerator and denominator of Equation A9 are nonzero, it follows by linearization that Formulaj is also asymptotically normal (see Barndorff-Nielsen & Cox, 1989, p. 41).

Asymptotic Variance of Formulaj
To evaluate the asymptotic variance it is convenient to proceed via the variance decomposition


Formula(A10)

where the subscripts on the variances are defined equivalently to those for the expectations.
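Formula A10 is not reproduced in this text. As a hedged reminder, the standard decomposition (the law of total variance) that matches the description here, written for a generic statistic T (a symbol assumed for illustration) and using the subscript convention defined above, is:

```latex
V_{ux}(T) \;=\; E_{x}\!\left[\, V_{u|x}(T) \,\right] \;+\; V_{x}\!\left[\, E_{u|x}(T) \,\right]
```

The first term averages the conditional variance over the distribution of X, and the second term is the variance of the conditional mean; these correspond to the two subsections that follow.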

Conditional Asymptotic Variance
The asymptotic conditional variance of the ratio specified in Equation A9 can be obtained by linearization, as described by Barndorff-Nielsen and Cox (1989, p. 41). With p_j/N and Formula denoted by w and z, respectively, the linearized form of V_u|x(Formulaj) can be expressed in terms of conditional expectations and variances as


Formula(A11)
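Formula A11 is not reproduced in this text. The sketch below illustrates the generic first-order (delta-method) approximation to the variance of a ratio w/z, which is the kind of linearization the passage describes; it is not claimed to be the authors' exact expression. The function name `ratio_variance_linearized` and the numerical values are assumptions chosen for illustration, and the approximation is compared with a Monte Carlo estimate.

```python
import numpy as np

def ratio_variance_linearized(mu_w, mu_z, var_w, var_z, cov_wz):
    """First-order approximation to Var(w / z) when both means are nonzero."""
    ratio = mu_w / mu_z
    return ratio**2 * (var_w / mu_w**2
                       + var_z / mu_z**2
                       - 2.0 * cov_wz / (mu_w * mu_z))

# Assumed illustrative moments: small variances relative to the means,
# which is the regime where the linearization is expected to work well.
mean = np.array([2.0, 5.0])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])

approx = ratio_variance_linearized(mean[0], mean[1],
                                   cov[0, 0], cov[1, 1], cov[0, 1])

# Monte Carlo check using bivariate normal draws of (w, z).
rng = np.random.default_rng(1)
draws = rng.multivariate_normal(mean, cov, size=200_000)
mc_var = np.var(draws[:, 0] / draws[:, 1])

print("linearized variance:", approx)
print("Monte Carlo variance:", mc_var)
```

The linearization requires nonzero means for numerator and denominator, which is the condition invoked in the asymptotic normality argument above; its accuracy degrades when the denominator has substantial probability mass near zero.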

The conditional expectations in Equation A11 can be obtained from Equation A3, from which it follows that


Formula(A12)

Given that X'X/N consistently estimates Σ_xx, Equation A12 can be replaced by


Formula(A13)

The corresponding expression for Formula follows immediately. The conditional variances and the covariance term can be derived correct to O(N^(−1)) from Equation A6. Under the assumption of normal and independent disturbances u_l, the derivation is tedious but routine. The results are as follows:


Formula(A14)


Formula(A15)


Formula(A16)

Substitution of Equations A14 through A16 into Equation A11 leads to the terms in the curly brackets of Equation 10 for AV(Formulaj), the asymptotic variance of Formulaj.

Variance of the Conditional Mean
To evaluate the second term in Equation A10, namely, the variance of the conditional mean of Formulaj, the conditional mean of Formulaj must first be specified as a function of X. From the usual linearization argument, it follows from Equations A9 and A12 that


Formula(A17)

Thus, it remains to evaluate the variance of the expression of Equation A17 under the normality assumption described in Equation 6 and the accompanying text. The linearized expression for the variance of a ratio given in Equation A11 again applies, but in this case, the individual variances and covariances involve fourth moments of the x_j's. The advantage of the normality assumption is that fourth moments can be expressed as functions of variances and covariances, which greatly facilitates the derivation. Expressions for the required moments can be found in Kendall and Stuart (1977, p. 85). After some algebra, the required variance yields the final term in Equation 10, thus completing the derivation of the expression for the asymptotic variance of Formulaj. Details are omitted in the interests of space.
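As a hedged illustration of why normality simplifies the fourth moments, the standard identity for zero-mean, jointly normal variables (often attributed to Isserlis) is shown below. The index notation and the symbol σ_ij for the covariance of x_i and x_j are assumptions made for this sketch; the specific expressions used in the derivation are those cited from the reference above.

```latex
E[x_i x_j x_k x_l] \;=\; \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk},
\qquad\text{so that}\qquad
\operatorname{Cov}(x_i x_j,\; x_k x_l) \;=\; \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}.
```

Every covariance between products of centered predictors therefore reduces to sums of products of second moments, so no separate fourth-moment parameters need to be estimated.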


    Footnotes
 
D. ROLAND THOMAS is Professor, Sprott School of Business, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6; rthomas@sprott.carleton.ca. He is interested in a variety of modeling techniques of relevance to the business and social sciences and in the analysis of complex survey data.

PENGCHENG ZHU is a PhD student in the Sprott School of Business, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6; phil_zhu1979@yahoo.ca. He specializes in finance and related applications of statistics.

YVES J. DECADY, PhD, is an Analyst in the Labor Statistics Division, Statistics Canada, Jean Talon Building, Ottawa, Ontario, Canada, K1A 0T6; Yves.Decady@statcan.ca. His current area of research is labor dynamics.

The first author's work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada and by grants from two Canadian research networks, namely, MITACS (Mathematics of Information Technology and Complex Systems) and NPCDS (National Program on Complex Data Structures). The authors wish to thank Professors Linda Duxbury and Christopher Higgins for permission to use their data to illustrate the proposed methodology.

Received for publication August 27, 2004. Accepted for publication July 13, 2005.


    References

  • Achen, CH. (1982). Interpreting and using regression. Beverly Hills, CA: Sage.
  • Azen, R, & Budescu, DV. (2003). The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8, 129-148.
  • Azen, R, Budescu, DV, & Reiser, B. (2001). Criticality of predictors in multiple regression. British Journal of Mathematical and Statistical Psychology, 54, 201-225.
  • Bollen, KA. (1989). Structural equations with latent variables. New York: John Wiley.
  • Barndorff-Nielsen, OE, & Cox, DR. (1989). Asymptotic techniques for use in statistics. London: Chapman and Hall.
  • Bring, J. (1994). How to standardize regression coefficients. American Statistician, 48, 209-213.
  • Bring, J. (1996). A geometric approach to compare variables in a regression model. American Statistician, 50, 57-62.
  • Budescu, DV. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114, 542-551.
  • Chou, C-P, & Bentler, PM. (1995). Estimates and tests in structural equation modeling. In Hoyle, RH (Ed.), Structural equation modeling: Concepts, issues and applications. Thousand Oaks, CA: Sage.
  • Darlington, RB. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161-182.
  • Duxbury, L, & Higgins, C. (2003). Work–life conflict in Canada in the new millennium: A status report. Health Canada. Retrieved February 1, 2004, from http://www.hc-sc.gc.ca/pphb-dgspsp/publicat/work-travail/report2/
  • Greenland, S, Schlesselman, JJ, & Criqui, MH. (1986). The fallacy of employing standardized regression coefficients and correlations as measures of effect. American Journal of Epidemiology, 123, 203-208.
  • Healy, MJR. (1990). Measuring importance. Statistics in Medicine, 9, 633-637.
  • Hoerl, AE, & Kennard, RW. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55-67.
  • Kendall, M, & Stuart, A. (1977). The advanced theory of statistics: Vol. 1. Distribution theory (4th ed.). New York: Macmillan.
  • Kirk, RE. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 23, 655-675.
  • Krieg, EF. (1999). Biases induced by coarse measurement scales. Educational and Psychological Measurement, 59, 749-766.
  • Kruskal, W. (1987). Relative importance by averaging over orderings. American Statistician, 41, 6-10.
  • Kruskal, W, & Majors, R. (1989). Concepts of relative importance in recent scientific literature. American Statistician, 43, 2-6.
  • Mardia, KV. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530.
  • Miller, RG. (1981). Simultaneous statistical inference (2nd ed.). New York: Springer-Verlag.
  • Muthen, B, & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
  • Pratt, JW. (1987). Dividing the indivisible: Using simple symmetry to partition variance explained. In Pukkila, T, & Puntanen, S (Eds.), Proceedings of the Second International Conference in Statistics (pp. 245-260). Tampere, Finland: University of Tampere.
  • Stevens, J. (2002). Applied multivariate statistics for the social sciences (4th ed.). Hillsdale, NJ: Lawrence Erlbaum.
  • Thomas, DR. (1992). Interpreting discriminant functions: A data analytic approach. Multivariate Behavioral Research, 27, 335-362.
  • Thomas, DR, Hughes, E, & Zumbo, BD. (1998). On variable importance in linear regression. Social Indicators Research, 45, 253-275.
  • Thomas, DR, & Rao, JNK. (1987). Small-sample comparisons of level and power for simple goodness-of-fit statistics under cluster sampling. Journal of the American Statistical Association, 82, 630-636.
  • Thomas, DR, Singh, AC, & Roberts, GR. (1996). Tests of independence on two-way tables under cluster sampling: An evaluation. International Statistical Review, 64, 295-311.
  • Thomas, DR, & Zumbo, BD. (1996). Using a measure of variable importance to investigate the standardization of discriminant coefficients. Journal of Educational and Behavioral Statistics, 21, 110-130.
  • West, SG, Finch, JF, & Curran, PJ. (1995). Structural equation models with non-normal variables: Problems and remedies. In Hoyle, RH (Ed.), Structural equation modeling: Concepts, issues and applications. Thousand Oaks, CA: Sage.

Journal of Educational and Behavioral Statistics, Vol. 32, No. 1, 61-91 (2007)
DOI: 10.3102/1076998606298037

