Point Estimates and Confidence Intervals for Variable Importance in Multiple Linear Regression
Carleton University
Statistics Canada
The topic of variable importance in linear regression is reviewed, and a measure first justified theoretically by Pratt (1987) is examined in detail. Asymptotic variance estimates are used to construct individual and simultaneous confidence intervals for these importance measures. A simulation study of their coverage properties is reported, and an example is provided.
Keywords: relative importance; normalized Pratt measures; asymptotic variances; Bonferroni intervals; simultaneous confidence intervals
The purpose of this article is to review and to present some new results on a particular measure of relative importance for multiple linear regression for the general case of p correlated explanatory variables. This measure defines the importance of the j-th independent variable as the product of two terms, namely its simple correlation with the dependent variable and its standardized regression coefficient.
The main contribution reported in this article is the development of individual and simultaneous confidence intervals for the p Pratt measures, which should increase their practical usefulness by providing an inferential, rather than merely descriptive, method of deciding whether one explanatory variable is more important than another. The overall goal of the article is thus similar, in a particular sense, to a recent article by Azen and Budescu (2003), who extended Budescu's (1993) dominance approach by introducing new measures of dominance and by adding bootstrapped inference procedures to assess the stability of the dominance ranking. Azen, Budescu, and Reiser (2001) suggested another inferential approach to the assessment of variable importance in regression. These authors introduced the concept of the criticality of a variable, derived from the probability that an individual variable will be included in the best subset model fitted to a randomly drawn sample containing responses for one dependent and p explanatory variables. Probabilities were again assessed by means of bootstrap samples, with confidence intervals based on large sample multinomial theory. Although similar in that it provides inferential methods for assessing importance, the Pratt approach described in this article is very different conceptually from the dominance analysis and criticality approaches. Pratt's derivation of his importance measure is axiomatic and theoretical, whereas the dominance and criticality approaches are based on an intuitive definition of variable importance.
It should be noted that Pratt's measure also has an intuitive definition as discussed by Bring (1996) and by Thomas et al. (1998). This interpretation of Pratt's measures will be described later. The problem of assessing variable importance, particularly in multiple linear regression, has been extensively documented since the 1960s, and the articles cited above, in particular those by Bring (1996), Budescu (1993), and Azen and Budescu (2003), together provide an extensive review of the literature. In this article, therefore, only specific points that justify the need for importance measures such as Pratt's (or such as dominance analysis and criticality) will be reviewed.
Kruskal and Majors (1989) carried out a broad survey of the scientific literature to develop a taxonomy of the various measures that have been used to quantify variable importance. In the context of regression analysis, they found that numerous measures of variable importance were in use, including simple and partial correlation coefficients, regression coefficients (standardized and unstandardized), t statistics, and p values. They also noted Pratt's (1987) measure and recommended that it be studied further. To the above list can be added a variety of measures based on reduction in variance or on decompositions of variance explained, as documented by Azen and Budescu (2003). For the most part, the primary focus of the literature has been on measures of variable importance, rather than on basic concepts. Thus, the concept of relative importance has been termed ambiguous (Kruskal, 1987; Kruskal & Majors, 1989), not at all well defined (Healy, 1990), and vague and diffuse (Bring, 1994). In most applications and discussions of variable importance, the meaning of the concept is implied by the characteristics of the measure selected. Given this general confusion, Azen and Budescu (2003) argued in favor of an intuitive measure of importance that can be readily understood and interpreted by practitioners, and proposed their modified version of dominance analysis. Pratt's approach (1987), on the other hand, was to define the concept of variable importance in terms of a number of axioms that should be satisfied by any measure of variable importance and to then deduce a unique measure satisfying them.
Some of the problems associated with the traditional measures of variable importance can be illustrated by considering standardized regression coefficients, where the standardization is motivated by the difficulties inherent in interpreting unstandardized coefficients for explanatory variables that are measured on different scales (Achen, 1982; Greenland, Schlesselman, & Criqui, 1986; Healy, 1990). Although standardized coefficients are commonly used as measures of importance, Healy (1990) and Bring (1994) noted that because regression coefficients measure the conditional effect of a variable, standardization should be based on conditional standard errors, rather than the customary unconditional standard errors. Bring (1994) went on to show that using conditionally standardized regression coefficients as measures of variable importance is equivalent to comparing individual t statistics. Unfortunately, a comparison of tj and tk, corresponding to variables xj and xk, provides an invalid assessment of relative importance because, for the full p variable model, each t statistic measures that variable's contribution to the model over and above the effect of all (p – 1) other variables. Thus, the reference subsets for tj and tk are different, making a direct comparison invalid (Azen & Budescu, 2003), a problem that is also associated with other traditional partial measures such as partial correlations, semipartial correlations, and so forth. In designing his dominance approach, Budescu (1993) avoided this problem by comparing the contribution of variables xj and xk in turn to a common subset of variables that omits both of them.
Because t statistics for individual explanatory variables are related to increments in R2, Budescu's original definition of dominance can be stated as follows: One variable dominates another if it generates a greater increment in R2 than the other, when added to all the equations formed from all possible subsets of the remaining (p – 2) variables.
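The definition of dominance stated above can be sketched in a few lines of Python. This is an illustration only and is not code from Budescu (1993); the helper names `r_squared` and `dominates`, and the synthetic data, are assumptions introduced here. Because the R2 of the common subset cancels when two increments are compared, the check reduces to comparing the R2 of the two augmented models.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y, cols):
    """R2 of the least-squares fit of y on the chosen columns of X, with an intercept."""
    design = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def dominates(X, y, j, k):
    """True if x_j yields a larger R2 increment than x_k for every subset
    of the remaining (p - 2) variables, including the empty subset."""
    others = [c for c in range(X.shape[1]) if c not in (j, k)]
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            if r_squared(X, y, subset + (j,)) <= r_squared(X, y, subset + (k,)):
                return False
    return True


# Hypothetical data: four independent predictors with unequal effects.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(size=500)
print(dominates(X, y, 0, 1), dominates(X, y, 1, 0))
```

The number of subsets grows as 2 to the power (p – 2), so a brute-force check of this kind is practical only for a modest number of predictors.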
Bring's (1994) conclusion also leads directly to a consideration of importance measures based on partitions of R2, where a portion of the overall R2 is attributed to each predictor. If the independent variables are uncorrelated, there is a unique partition of R2 in which each variable's contribution is equal to the variance it explains when the dependent variable is regressed on it alone. When the independent variables are correlated, however, it is difficult to find partitions of R2 where each component of variance explained can be logically assigned to an individual variable. If a prior ordering of the variables is available, a sequential orthogonalization of the independent variables leads to an assignable partition. On the other hand, the data-dependent ordering provided by sequential variable selection algorithms should not be used, because different orders of selection leading to the same final model can lead to dramatically different conclusions regarding the importance of a given variable (see, e.g., Stevens, 2002, p. 119). Several authors have attempted to overcome this problem by averaging various partitions of R2 over all possible orderings (for details, see Azen & Budescu, 2003), but this process results in measures that are hard to interpret. The advantage of Pratt's (1987) approach is that it offers a very simple and theoretically justifiable way of partitioning R2 and assigning the components, namely, sample estimates of the
Pratt (1987) based his axiomatic development on a population model of the form
where the error u is uncorrelated with each of the xs and is distributed with mean zero and constant variance. His first two assumptions were as follows:
1. relative importance depends only on the means, variances, and correlations of y, x1, x2, . . . , xp (i.e., only on first- and second-order moments);
2. relative importance is not affected by linear transformations of any variable.
He also standardized the (random) xs to have variance one, so that for the two variable case (p = 2), it follows from assumptions 1 and 2 that relative importance depends only on
Pratt's (1987) basic approach was to use a symmetry argument to establish his importance measure for a two-variable regression and then to invoke additional axioms to generalize his measure to the p-variable case. He postulated that the two independent variables x1 and x2 could be constructed from a symmetric set of M variables, X1, . . . , XM, all having mean zero and variance s2, and all equally correlated with correlation coefficient r, such that the regression of y on these Xs was given by X1 + X2 + · · · + XM, with residual variance
3. the relative importance of x1 to x2 is as m to n.
Under the above setup, the standardized coefficients
where
To extend his rule given in Equation 3 to the case p > 2, Pratt first defined the importance of a subset of variables, (x1, . . . , xq), q < p, to be the sum of their individual importances. He then adopted two more axioms, namely, an extension of his second assumption and an axiom on noise variables:
4. the nonsingular linear transformation of a subset (x1, . . . , xq) into the subset (x1', . . . , xq') does not affect its importance relative to the other variables; and
5. the addition of a pure noise variable, independent of y and x1, . . . , xp, to a subset of variables does not affect the importance of the subset relative to the other variables.
With these, together with one additional minor assumption, or axiom, for which several alternatives exist, Pratt established the validity of the rule of Equation 3 for assigning relative importance in the general case of p > 2 independent variables.
The Geometric Interpretation of Pratt's Measure
where the
that is, as a sample estimate of Pratt's measure, divided by the R2 corresponding to the full p-variable regression equation. The
Two Criticisms of Pratt's Measure
The second criticism was advanced by Bring (1996), who geometrically illustrated the case of an independent variable, x1, that was orthogonal to y (i.e., uncorrelated with y) but which nevertheless increased R2 when it was added to a regression model originally containing only x2. Such variables are often referred to as suppressor variables (see, e.g., Stevens, 2002, p. 124): namely, variables that make no direct contribution to the regression sum of squares, contributing only by reducing (or suppressing) the overall error sum of squares (or equivalently, the error variance) through their relationship with the other independent variables. In the example, because
To this point, the relationship of Pratt's measure to other measures of variable importance has been described, and some recent criticisms have been put into context. The remainder of the article will address the construction and evaluation of confidence intervals.
The finite sample distributional properties of the normalized Pratt measures are technically complicated, even under the assumption of normal errors. The theoretical aspects of the current investigation are therefore based on an asymptotic analysis of the properties of the estimated measures in the limit N → ∞, where N denotes sample size. A simulation study has also been carried out to determine how well the asymptotic confidence intervals perform for the sample sizes that might be encountered in practice. This section will present a summary of the asymptotic results; the results of the simulation study will be reported in a later section.
It will be assumed that the dependent and explanatory variables in Pratt's model given by Equation 1 are independently drawn from a multivariate normal distribution; that is, (y, x1, . . . , xp)' ~N(µ,
and mean vector µ = (µy µ'x)' of dimension (p + 1) × 1. Normality is a stronger assumption than that adopted by Pratt (1987), who assumed only that the disturbance term u and the independent variables x are uncorrelated. The normality assumption has been made to simplify the derivation of the asymptotic variances of the
where [.]j denotes the jth element of the vector. It is clear from Equation 7 that the means µy and µx can be set to zero with no loss of generality. The estimate
Consistency
where
Asymptotic Normality
where
where B = (b'
Variance Estimates
where tj is the value of the t statistic corresponding to
Individual and Simultaneous Confidence Intervals
Here Z
that is, the interval will cover the true value of the jth normalized Pratt importance measure with approximate probability (or confidence) 1 –
The above distinction between individual and simultaneous confidence intervals is often a source of confusion to practitioners. The inspection of an individual confidence interval will be appropriate if the importance of a specific variable is of prior interest—that is, the variable is being studied for reasons unconnected to the results of the data analysis. On the other hand, if a variable is of interest because its Pratt index is the largest, say, then inspection of an individual confidence interval for that variable would not be appropriate. Selection of the largest index implies a comparison with all other variables so that appropriate inference must be based on a set of intervals that accounts for variation in all Pratt indices simultaneously. This distinction will be further illustrated later.
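The practical difference between the two kinds of interval can be seen in a short sketch. The code below assumes that the individual intervals use the usual normal critical value for a tail area of alpha/2 and that the simultaneous intervals use the Bonferroni critical value for a tail area of alpha/(2p); it is not a transcription of Equations 13 and 14. The function name `pratt_intervals`, the point estimates, and the standard errors are all hypothetical.

```python
from statistics import NormalDist


def pratt_intervals(estimates, std_errors, alpha=0.05):
    """Return (individual, simultaneous) normal-theory intervals.

    Individual intervals use z at 1 - alpha/2; simultaneous intervals use the
    Bonferroni adjustment z at 1 - alpha/(2p), where p is the number of
    importance measures (an assumed form, see the lead-in).
    """
    p = len(estimates)
    z_individual = NormalDist().inv_cdf(1 - alpha / 2)
    z_simultaneous = NormalDist().inv_cdf(1 - alpha / (2 * p))
    individual = [(d - z_individual * se, d + z_individual * se)
                  for d, se in zip(estimates, std_errors)]
    simultaneous = [(d - z_simultaneous * se, d + z_simultaneous * se)
                    for d, se in zip(estimates, std_errors)]
    return individual, simultaneous


# Hypothetical normalized Pratt estimates and standard errors for p = 3.
estimates = [0.55, 0.35, 0.10]
std_errors = [0.06, 0.05, 0.04]
individual, simultaneous = pratt_intervals(estimates, std_errors)
for d, ind, sim in zip(estimates, individual, simultaneous):
    print(f"{d:.2f}  individual=({ind[0]:.3f}, {ind[1]:.3f})  "
          f"simultaneous=({sim[0]:.3f}, {sim[1]:.3f})")
```

The simultaneous intervals are always wider than the individual ones, which is the price paid for being able to compare all p measures after looking at the data.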
The simulation experiment was primarily designed to explore the performance of the individual and simultaneous confidence interval procedures specified by Equations 13 and 14. Because the confidence intervals depend on point estimates of the normalized Pratt measures and on their variances, the biases in these estimates and in their estimated variances were also examined. The simulation explored confidence interval performance for three-variable and five-variable regressions under a range of configurations of the normalized Pratt measures, for sample sizes ranging from 50 to 1,000. The reported results correspond to regressions with population R2 equal to .5 only. Other simulation results not reported here indicate that the conclusions of the study are not sensitive to variations in R2. The elements of the covariance matrix of Equation 6 were manipulated to yield the desired configurations of Pratt measures; in all cases (and without loss of generality), the diagonals of the covariance matrix, namely, the variances of y and the independent xs, were set to 1. For three variables, six cases involving different configurations of normalized Pratt measures were studied, as follows:
Similarly structured configurations were examined for five explanatory variables, with the exception that cases E and F were combined for the five variable model. Values of the true normalized Pratt measures are shown in Table 1 for each case. In cases B, C, and D, for both three and five explanatory variables, all Pratt measures are positive and can therefore be meaningfully interpreted as measures of importance. In case E, there is one negative normalized measure of relatively small magnitude for both the three- and five-variable situations. As shown by Thomas et al. (1998), the normalized measures corresponding to the nonsuppressors still provide approximate measures of relative importance in such situations, although the recommended procedure is to partial out the contribution of the suppressor variable prior to assessing importance. This aspect of importance analysis has not been considered in the present study. Cases A (for three and five variables) and F (for three variables) all feature negative normalized Pratt measures of substantial magnitude. These measures cannot, therefore, be interpreted in terms of importance and represent cases in which confidence intervals could profitably be used to confirm the difficulty. The simulation experiment consisted of 500 independent replications of the following five-step process for the three- and five-variable cases in turn:
Based on the 500 replications, a number of summary measures were computed for the three- and five-variable models, namely:
Results are discussed in the following section.
Simulation Estimates of the Expected Values
The simulation estimates of the expected values of the estimated normalized Pratt measures are shown in Table 1 for all cases, for both three- and five-variable situations, and for sample sizes of 50 and 250. Asymptotic values are shown in bold type. It is clear from the table that the expected values of the normalized Pratt measures converge rapidly to their true (and asymptotic) values given by Equation 7 as sample size increases. Thus, the bias in the estimated normalized Pratt index of Equation 5 converges rapidly to zero with increasing sample size, and the estimator can be considered effectively unbiased for sample sizes of 250 or more. The correspondence between estimates and asymptotic values is close even for a sample size as low as 50.
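The convergence of the expected values can be checked informally with a small Monte Carlo sketch. The configuration below is hypothetical and is not one of the cases in Table 1: three predictors with unit variances, common intercorrelation .3, and correlations (.5, .4, .3) with y, giving a population R2 of about .33 rather than .5. The function names are assumptions, and the estimator is computed from sample correlations as the product of standardized coefficient and simple correlation divided by R2.

```python
import numpy as np


def pratt_from_moments(r_xx, r_xy):
    """Normalized Pratt measures from predictor correlations r_xx and
    predictor-response correlations r_xy: beta_j * r_j / R2."""
    beta = np.linalg.solve(r_xx, r_xy)   # standardized regression coefficients
    return beta * r_xy / (beta @ r_xy)   # beta @ r_xy equals R2


def simulate_mean_pratt(r_xx, r_xy, n, replications=500, seed=0):
    """Average the sample normalized Pratt measures over multivariate normal samples."""
    rng = np.random.default_rng(seed)
    p = len(r_xy)
    sigma = np.empty((p + 1, p + 1))     # joint correlation matrix of (y, x1..xp)
    sigma[0, 0] = 1.0
    sigma[0, 1:] = r_xy
    sigma[1:, 0] = r_xy
    sigma[1:, 1:] = r_xx
    total = np.zeros(p)
    for _ in range(replications):
        data = rng.multivariate_normal(np.zeros(p + 1), sigma, size=n)
        corr = np.corrcoef(data, rowvar=False)
        total += pratt_from_moments(corr[1:, 1:], corr[1:, 0])
    return total / replications


# Hypothetical population configuration (not taken from Table 1).
r_xx = np.full((3, 3), 0.3) + 0.7 * np.eye(3)
r_xy = np.array([0.5, 0.4, 0.3])
true_values = pratt_from_moments(r_xx, r_xy)
print("true   ", np.round(true_values, 3))
for n in (50, 250):
    print(f"N={n:<4}", np.round(simulate_mean_pratt(r_xx, r_xy, n), 3))
```

Under these assumptions the averages at N = 250 should sit close to the population values, in line with the pattern described for Table 1.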
Asymptotic Variances and True Variances
Simulation estimates of N times the true variances, denoted N x
Coverage Properties of the Individual Intervals
The coverage properties of the individual confidence intervals given in Equation 13 are shown in columns 3 to 5 and 8 to 10 of Table 3, and in columns 3 to 7 of Table 4, for three explanatory variables and five explanatory variables, respectively. All configurations of the normalized Pratt measures are shown for sample sizes 50, 250, and 1,000. From Tables 3 and 4, the effect of sample size can clearly be seen. For a sample size of 50, the coverage in some cases is as low as 86%, at a nominal level of 95%. This is equivalent to having a test of hypothesis exhibit a true test level of 14% at a nominal test level of 5%. Such a low coverage (equivalent to a liberal test level) is not acceptable in practice. However, from the tables, it can be seen that for a sample size of 250, the actual coverage rates are within ±2% of the nominal level with only one exception. This is consistent with the assertion that the true coverage of the individual asymptotic intervals (for a sample size of 250) is close to 95%, given that the binomial confidence interval on an individual coverage estimate is ±2% for 500 replications when the true coverage level is 95%. There is no material difference in confidence interval performance for the three- and five-variable models, nor does there appear to be any material difference between the individual cases exhibited in Tables 3 and 4. This is an important finding as it would not be wise in practice to decide for or against using the confidence interval procedure based only on a sample estimate of the configuration of the normalized Pratt measures.
Coverage Properties of the Simultaneous Intervals
The coverage properties of the simultaneous confidence intervals given in Equation 14 are shown in columns 6 and 11 of Table 3, and in column 8 of Table 4, for three explanatory variables and five explanatory variables, respectively. Results are again shown for all configurations of the normalized Pratt measures for sample sizes of 50, 250, and 1,000. The coverage results for the simultaneous intervals are generally similar to those for the individual intervals. It is clear from Tables 3 and 4 that for a sample size of 50, coverage rates can be far too low: less than 80% in some cases at a nominal level of 95%. However, at a sample size of 250, simultaneous coverage rates are again close to the 95% ±2% range, which suggests a true coverage close to the nominal value. At a sample size of 1,000, coverage rates are even closer to the nominal level, as expected. Again, there is no material difference between coverage levels across the configurations of the normalized Pratt measures or between the three-variable and five-variable models.
Overall Summaries of Individual and Simultaneous Coverage
Figure 2 displays 11 coverage level estimates for each sample-size setting, with a corresponding Bonferroni confidence band with half-width 2.8%. Thus, Figure 2 confirms the primary conclusion drawn from Tables 3 and 4 for the simultaneous confidence intervals—namely, that the estimated simultaneous coverage rates are consistent with a true coverage level close to the nominal 95% whenever the sample size is 250 or greater. However, the results of Figure 2 show that simultaneous coverage rates at a sample size of 100 are markedly lower than for the individual coverage rates at the same sample size.
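The ±2% binomial margin quoted for a single coverage estimate and the 2.8% Bonferroni half-width quoted for a band around 11 estimates can both be reproduced from the normal approximation to the binomial. The sketch below assumes 500 replications, a true coverage of 95%, and a 95% confidence level for the band; the function name `coverage_half_width` is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def coverage_half_width(true_coverage, replications, level=0.95, n_estimates=1):
    """Half-width of a normal-approximation interval for an estimated coverage rate.

    With n_estimates > 1 the critical value is Bonferroni-adjusted so that the
    band holds for all estimates jointly.
    """
    tail = (1 - level) / (2 * n_estimates)
    z = NormalDist().inv_cdf(1 - tail)
    return z * sqrt(true_coverage * (1 - true_coverage) / replications)


single = coverage_half_width(0.95, 500)
band = coverage_half_width(0.95, 500, n_estimates=11)
print(f"single estimate: {100 * single:.1f}%   band over 11 estimates: {100 * band:.1f}%")
```

Under these assumptions the two half-widths come out at roughly 1.9% and 2.8%, matching the figures used in the text.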
This is surprising at first sight because simultaneous Bonferroni confidence intervals based on exact statistics have a coverage level that is necessarily equal to or greater than 1 – α.
In view of the above results and recommendations, it is important to note that a decision on whether to use individual or simultaneous intervals should not be based on the differences in coverage performance to be expected at a given sample size. As discussed in the text following Equation 14, individual intervals should only be used when there is some a priori interest in the importance of a particular variable. Whenever variable importances are to be examined and compared a posteriori, simultaneous intervals are required, in which case the minimum sample size recommendation of 250 should be respected. Should a practitioner choose to use the individual 95% intervals in a situation requiring simultaneous inference, then the simultaneous coverage would be approximately 85% for a three-variable regression and 75% for a five-variable case, levels that are too low for useful inference.
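The 85% and 75% figures are consistent with the Bonferroni lower bound on joint coverage when p individual 95% intervals are used together. The sketch below computes that bound and, for comparison, the joint coverage that would result if the p intervals were statistically independent. The independence case is an illustrative assumption only, since the Pratt estimates are correlated; the function names are introduced here.

```python
def bonferroni_lower_bound(p, alpha=0.05):
    """Guaranteed minimum joint coverage of p intervals, each with level 1 - alpha."""
    return 1 - p * alpha


def coverage_if_independent(p, alpha=0.05):
    """Joint coverage of p intervals if they were independent (illustrative assumption)."""
    return (1 - alpha) ** p


for p in (3, 5):
    print(f"p={p}: bound={bonferroni_lower_bound(p):.3f}  "
          f"independent={coverage_if_independent(p):.3f}")
```

Either calculation gives joint coverage in the region of 85% for three variables and 75% to 77% for five, well below the nominal 95%.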
This example is based on a subset of data taken from a large study on the work-life conflict experienced by Canadian employees, documented in a report to Health Canada by Duxbury and Higgins (2003). To illustrate the use of the normalized Pratt measures and their corresponding confidence intervals, a regression analysis featuring one dependent variable and six explanatory variables was investigated. The dependent variable consisted of a work-family conflict measure based on a five-item scale: work-to-family interference (Duxbury & Higgins, 2003). The predictor variables consisted of a theoretically linked set of indicators of the culture of each respondent's organization with respect to work-life balance. These individual item predictors were measured on a 5-point scale, and represented degrees of agreement or disagreement with the following six statements:
ORG: Organization promotes environment supportive of work/personal balance
LEAVE: Have seriously thought about switching to another organization
COWSUP: Coworkers supportive of personal/family responsibilities
LONG: Inability to work long hours would limit career
ADVANCE: Family responsibilities hamper advancement
NONO: Not acceptable to say no to more work
The sample size was 3,320, large enough to guarantee the applicability of the proposed asymptotic confidence interval procedures, and the regression R2 was .382. Given that this example features a 5-point measurement scale for the predictor variables, application of the proposed confidence interval procedures is a technical violation of the assumption of multivariate normality on which their derivation is based. Thus, it is implicitly assumed in the analysis of this specific example that the violation does not materially affect the conclusions. This is an important issue in general, given the importance of discrete Likert-type scales in behavioral research, and it will be discussed further below. Table 5 displays values of (a) the standardized regression coefficients; (b) the simple correlation between the dependent variable and each of the six explanatory variables; (c) the t statistics; (d) the partial correlation between the dependent variable and each of the explanatory variables, adjusted for the remaining five, denoted
The Pratt indices of Table 5 are given by the products of the first two columns divided by R2 = .382, as defined by Equation 5. However, although the standardized regression coefficients (first column) give the same ordering as the Pratt coefficients for this specific example, the orderings for variables LONG and ORG given by the magnitudes of the simple correlations (second column) and the Pratt indices are different. Furthermore, the simple correlations for LONG and ADVANCE are virtually identical, but their Pratt indices differ by more than a factor of two. The magnitudes of the t statistics and the partial correlations (shown for comparison) give identical orderings and are, in fact, directly proportional by definition. However, the orderings given by the magnitudes of these two statistics differ from those given by the Pratt indices, the latter indicating that LEAVE is more important than LONG, whereas the former two statistics suggest the reverse.
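The calculation described above, standardized coefficient times simple correlation divided by R2, can be sketched as follows. The function name `normalized_pratt` and the synthetic data are assumptions introduced for illustration; the work-life data set itself is not reproduced here.

```python
import numpy as np


def normalized_pratt(X, y):
    """Return (d, r2), where d[j] = beta_j * r_j / R2.

    beta_j is the standardized regression coefficient of predictor j and
    r_j is its simple correlation with y. The d values sum to 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized predictors
    ys = (y - y.mean()) / y.std(ddof=1)                 # standardized response
    r = Xs.T @ ys / (n - 1)                             # simple correlations with y
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)      # standardized coefficients
    r2 = float(beta @ r)                                # R2 equals sum of beta_j * r_j
    return beta * r / r2, r2


# Hypothetical data with correlated predictors.
rng = np.random.default_rng(0)
z = rng.normal(size=(400, 1))
X = 0.6 * z + rng.normal(size=(400, 3))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=400)
d, r2 = normalized_pratt(X, y)
print(np.round(d, 3), round(r2, 3))
```

A negative entry in `d` would flag a coefficient and a simple correlation of opposite sign, the situation associated with suppressor variables.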
Quite apart from any similarities and differences in the orderings given by the above statistics, it should be recalled that the normalized Pratt measure is the unique measure of relative importance for the explanatory variables that satisfies Pratt's axioms and definitions. Thus, from the
It should also be noted that specific pairwise comparisons, between COWSUP and NONO, for example, could be sharpened by extending the analysis described in the appendix to the construction of simultaneous confidence intervals for all p(p – 1)/2 pairwise comparisons. This would require further simulation to determine the performance of these asymptotic intervals as a function of sample size, a project that has yet to be implemented.
Likely Effect of Discrete Predictors
The effect of discreteness and nonnormality on the distribution of the explanatory variables will be manifested through the contribution of the second term in Equation A.10 of the appendix, the variance of the conditional mean of the Pratt indices. The latter is given explicitly in Equation A.17, from which it can be seen that the effect will depend critically on the variances and covariances of the elements of X'X/N, the estimated variance matrix of the explanatory variables. So, in investigating the likely effects of discreteness and nonnormality on the variances of the Pratt indices, and hence on their confidence intervals, the closest parallel is the effect of nonnormality and discreteness of indicator variables in structural equation modeling (SEM). Classical versions of SEM are based on the assumption of multivariate normality of the indicators, and the question of discreteness and nonnormality of indicator variables has received considerable attention. Reviews are provided by Bollen (1989) and by West, Finch, and Curran (1995), and some general conclusions are that for confirmatory factor analysis and simple SEM models, (a) parameter estimates are relatively unaffected by discreteness provided the distributions are approximately normal (skewness and kurtosis values less than 1 in magnitude being cited); (b) standard errors are more sensitive to nonnormality and can be markedly underestimated when variables are highly (and differentially) skewed or when they exhibit large excess kurtosis; (c) when the number of categories has an effect, the effect is strongest when the number of categories is less than 5. The results of a study by Chou and Bentler (1995) are of particular relevance to the current example. They performed a simulation study of the performance of three different SEM estimators for a confirmatory factor analysis, using six different settings of nonnormality. 
Among other things, they examined the coverage properties of confidence intervals on the model parameters, and it is clear from their results that for their normal theory estimator, coverage rates equaled or exceeded their nominal level for the nonnormal cases featuring negative kurtosis. This is likely because of the fact that assuming multivariate normality for distributions featuring negative kurtosis leads to an overestimate of the variances and covariances of the elements of X'X/N. The results of Muthen and Kaplan (1992), based on discrete five-category variables, show a similar trend for the biases in parameter standard errors. It can be seen from Table 6 that the predictor variables in the work-life interference example exhibit negative kurtosis except for COWSUP, which exhibits positive but small kurtosis, and it can also be seen that the skewness values are all much less than 1 in magnitude. Thus, if we take these SEM results as a possible guide to the effects of discreteness and nonnormality on Pratt indices and their confidence intervals, the above inferences for the work-life conflict example should be reliable. More theoretical and empirical work on this issue is clearly required. The validity of the inferences for this specific example can also be investigated by regarding the discrete variables as observed versions of underlying multivariate normal variables that have been categorized with respect to a set of cutpoints (see, e.g., Bollen, 1989, p. 439). One effect of categorizing underlying variables is to reduce the magnitude of the Pearson correlation coefficients between them (see Krieg, 1999, and the references therein), an effect that will then generate bias in regression coefficients estimated using the categorized data. 
If the observed variables for the work-family life example are characterized in terms of underlying normal variables, an indication of the effect of their discreteness can be obtained by (a) estimating the underlying correlations between the dependent and predictor variables, and among the predictor variables, by means of polyserial and polychoric correlations, respectively; (b) evaluating the Pratt indices and corresponding confidence intervals based on these underlying correlations. Note that these Pratt indices and confidence intervals, displayed in Figure 4, are not intended to provide an alternative analysis because the variances of the polyserial and polychoric correlations will not be accounted for. They are intended only to provide an indication of the sensitivity of the work-life interference example to variable discreteness and nonnormality. An initial indication of this sensitivity is given by the effect on the regression R2, which increases by about 6% in relative terms when the polyserial and polychoric correlations are used, from 38.2% to 40.4%. From Figure 4, it can be seen that the corresponding point estimates and confidence intervals are very similar to those displayed in Figure 3. Thus, the indications are that the effect of variable discreteness on the Pratt confidence intervals in the work-life conflict example is slight, and that the original inferences can be trusted.
The issue of variable importance in linear regression has been reviewed, and the importance measure justified theoretically by Pratt (1987) has been examined in detail. In this article, the utility of Pratt's measure has been extended in two ways. First, approximate estimates of the variances of normalized Pratt measures corresponding to individual independent variables have been developed, based on an asymptotic analysis. Calculation of these estimates requires only information routinely printed in the output of standard regression programs. No new software is required. Simulation results have shown that the approximate variance estimate is suitably accurate for sample sizes frequently encountered in practice, namely 250 or more and, in many cases, for sample sizes as low as 100. Second, these large sample variance estimates have been used to construct simultaneous confidence intervals (or interval estimates) for the normalized Pratt measures corresponding to the full set of independent variables. Again, these intervals can be easily calculated from standard regression software output. The results of the simulation have shown that these confidence intervals provide good coverage numerically close to the nominal 95% level for sample sizes of 250 or more. For smaller samples, the asymptotic confidence intervals tend to be liberal. However, for individual confidence intervals, appropriate when the importance of a specific variable is of prior interest unconnected to the results of the data analysis, this effect is not drastic even for samples as small as 100, in which case the actual confidence level exceeds 90% in most cases. For the simultaneous confidence intervals, however, the sample size requirement is not as flexible, and a minimum sample size of about 250 is recommended. As discussed in the review of the literature, Pratt's (1987) measure is not the only viable measure of importance.
Budescu's (1993) dominance approach, recently extended by Azen and Budescu (2003), and the criticality approach of Azen et al. (2001) comprise intuitively reasonable approaches to the issue, and they also provide inferential techniques for comparing their assessments across variables. However, because these inferences are based on bootstrapping methods, both approaches are computationally intensive and require special software. The methods developed in this article, featuring both point and interval estimates of the normalized Pratt measure of variable importance, are far easier to implement and require no special software. Thus, it is hoped that the results of this article will encourage researchers to examine the importance of their variables. To do so would be consistent with the recommendations of Kirk (1996) and others in favor of using effect sizes to assist with the interpretation of statistical analyses. To know that a regression coefficient is nonzero should only be the beginning of the interpretation. A full statistical assessment of the relative importance of the variable compared to the other independent variables should be regarded as an essential aspect of the analysis.
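Once point estimates and standard errors are available, the individual and simultaneous intervals summarized above are simple to compute. The sketch below uses only the Python standard library and assumes that the simultaneous intervals are obtained by a Bonferroni adjustment, as the article's key words suggest. The standard errors are taken as given inputs, because the asymptotic variance expression (Equation 10) is not reproduced here. The function name `pratt_intervals` and all numbers are hypothetical.

```python
from statistics import NormalDist

def pratt_intervals(d_hat, se, alpha=0.05):
    """Individual and Bonferroni-adjusted normal-theory intervals.

    d_hat: estimated normalized Pratt measures (length p).
    se:    their estimated asymptotic standard errors (assumed given).
    """
    p = len(d_hat)
    z_ind = NormalDist().inv_cdf(1 - alpha / 2)        # one interval at a time
    z_sim = NormalDist().inv_cdf(1 - alpha / (2 * p))  # Bonferroni over p intervals
    individual = [(d - z_ind * s, d + z_ind * s) for d, s in zip(d_hat, se)]
    simultaneous = [(d - z_sim * s, d + z_sim * s) for d, s in zip(d_hat, se)]
    return individual, simultaneous

# Hypothetical estimates and standard errors, for illustration only.
d_hat = [0.55, 0.30, 0.15]
se = [0.05, 0.04, 0.03]

individual, simultaneous = pratt_intervals(d_hat, se)
for j, (ind, sim) in enumerate(zip(individual, simultaneous), start=1):
    print(f"x{j}: individual ({ind[0]:.3f}, {ind[1]:.3f}), "
          f"simultaneous ({sim[0]:.3f}, {sim[1]:.3f})")
```

The Bonferroni intervals are always wider than the individual ones, which is in keeping with the larger minimum sample size recommended above for simultaneous inference.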
For a sample of size N, the linear model of Equation 1 can be written as
where $1_N$ represents an N × 1 vector of ones, y is an N × 1 vector of observations of the dependent variable, $X_1$ is an N × p matrix of observed explanatory variables, that is, $X_1 = (x_{11}, x_{12}, \ldots, x_{1p})$, and u is an N × 1 vector of disturbances.
Consistency
where Q is a centering matrix defined as $Q = I_N - 1_N 1_N'/N$ and $I_N$ is an identity matrix of dimension N. Because $X_1'Qy/N$ is a consistent estimator of
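The role of the centering matrix can be checked numerically. The following Python sketch, assuming NumPy and simulated data with hypothetical coefficients, shows that Q is symmetric and idempotent and that $X_1'Qy/N$ equals the sample covariance (with divisor N) between each explanatory variable and y.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 3

# Simulated data with hypothetical coefficients, for illustration only.
X1 = rng.normal(size=(N, p))
y = 1.0 + X1 @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=N)

ones = np.ones((N, 1))
Q = np.eye(N) - ones @ ones.T / N  # Q = I_N - 1_N 1_N' / N

# Cross-product built with the centering matrix.
s_xy = X1.T @ Q @ y / N

# The same quantity from explicitly mean-centered data.
s_xy_direct = (X1 - X1.mean(axis=0)).T @ (y - y.mean()) / N

print(np.allclose(s_xy, s_xy_direct))  # the two computations agree
print(np.allclose(Q @ Q, Q))           # Q is idempotent
```

In practice the N × N matrix Q is never formed; subtracting column means gives the same result at far lower cost.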
Asymptotic Normality
Then,
where $E_{ux}$ denotes expectation with respect to the joint distribution of u and X, $E_x$ denotes expectation with respect to the marginal distribution of X, and $E_{u|x}$ (to be used later) denotes expectation with respect to the conditional distribution of u. Consider next the centered and scaled version of $p_j/N$, given by
It can be shown that as N
The first two terms on the right-hand side of Equation A6 comprise a linear combination of inner products given by
which are asymptotically normal with zero mean and variance to be determined later, given that the N random variables $x_{jl}u_l$ are independent and identically distributed, with zero mean, and provided that their second moments are finite. The latter condition is satisfied under normality, but the normality assumption is clearly not essential. A similar central limit theorem applies to the third term of Equation A6, which consists of the inner products
Provided the second moment of $x_{jl}x_{kl}$ is finite, inner products of the form of Equation A8 will also be asymptotically normal. Therefore, $p_j/N$ is asymptotically normal since the right-hand side of Equation A6 consists of a linear combination of asymptotically normal terms. To investigate the asymptotic normality of the normalized Pratt index,
Given that $p_j/N$ is asymptotically normal,
Asymptotic Variance of
where the subscripts on the variances are defined equivalently to those for the expectations.
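The two subsections that follow, on the conditional asymptotic variance and on the variance of the conditional mean, are consistent with the standard decomposition of a variance by conditioning. As a sketch, for a generic statistic T (a symbol introduced here for illustration only) and using the subscript conventions defined above, the law of total variance reads:

```latex
\operatorname{Var}_{ux}(T)
  = E_{x}\!\left[\operatorname{Var}_{u|x}(T)\right]
  + \operatorname{Var}_{x}\!\left[E_{u|x}(T)\right]
```

The first term averages the variance of T over the disturbances for fixed X, and the second captures the extra variability arising because the explanatory variables are themselves random.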
Conditional Asymptotic Variance
The conditional expectations in Equation A11 can be obtained from Equation A3, from which it follows that
Given that X'X/N consistently estimates
The corresponding expression for
Substitution of Equations A14 through A16 into Equation A11 leads to the terms in the curly brackets of Equation 10 for AV(
Variance of the Conditional Mean
Thus, it remains to evaluate the variance of the expression of Equation A17 under the normality assumption described in Equation 6 and the accompanying text. The linearized expression for the variance of a ratio given in Equation A11 again applies, but in this case, the individual variances and covariances involve fourth moments of the $x_j$s. The advantage of the normality assumption is that fourth moments can be expressed as functions of variances and covariances, which greatly facilitates the derivation. Expressions for the required moments can be found in Kendall and Stuart (1977, p. 85). After some algebra, the required variance yields the final term in Equation 10, thus completing the derivation of the expression for the asymptotic variance of
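Two standard results underlie this last step, and both can be written compactly. The Python sketch below, assuming NumPy, gives (a) the covariance between two products of zero-mean jointly normal variables in terms of their covariances, and (b) the usual first-order (delta-method) approximation to the variance of a ratio. Both are textbook forms; that they match the article's Equations A11 and 10 term by term is an assumption, and the function names and numerical values are hypothetical.

```python
import numpy as np

def cov_of_products(sigma, i, j, k, l):
    """Cov(x_i x_j, x_k x_l) for zero-mean jointly normal x with covariance sigma."""
    s = np.asarray(sigma, dtype=float)
    return s[i, k] * s[j, l] + s[i, l] * s[j, k]

def ratio_variance(mu_a, mu_b, var_a, var_b, cov_ab):
    """First-order approximation to Var(A / B), valid for nonzero means."""
    ratio = mu_a / mu_b
    return ratio**2 * (var_a / mu_a**2 + var_b / mu_b**2
                       - 2.0 * cov_ab / (mu_a * mu_b))

# Hypothetical covariance matrix of two zero-mean normal variables.
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

var_x1_sq = cov_of_products(sigma, 0, 0, 0, 0)      # Var(x_1^2) = 2 * sigma_11^2
cov_x1sq_x2sq = cov_of_products(sigma, 0, 0, 1, 1)  # Cov(x_1^2, x_2^2) = 2 * sigma_12^2

print(var_x1_sq, cov_x1sq_x2sq)
```

The first function is the reason normality simplifies the derivation: every fourth-moment term reduces to sums of products of second moments, so no quantities beyond the covariance matrix need to be estimated.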
D. ROLAND THOMAS is Professor, Sprott School of Business, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6; rthomas{at}sprott.carleton.ca. He is interested in a variety of modeling techniques of relevance to the business and social sciences and in the analysis of complex survey data.
PENGCHENG ZHU is a PhD student in the Sprott School of Business, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6; phil_zhu1979{at}yahoo.ca. He specializes in finance and related applications of statistics.
YVES J. DECADY, PhD, is an Analyst in the Labor Statistics Division, Statistics Canada, Jean Talon Building, Ottawa, Ontario, Canada K1A 0T6; Yves.Decady{at}statcan.ca. His current area of research is labor dynamics.
The first author's work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada and by grants from two Canadian research networks, namely, MITACS (Mathematics of Information Technology and Complex Systems) and NPCDS (National Program on Complex Data Structures). The authors wish to thank Professors Linda Duxbury and Christopher Higgins for permission to use their data to illustrate the proposed methodology. Manuscript received August 27, 2004. Accepted for publication July 13, 2005.
Achen, CH. (1982). Interpreting and using regression. Beverly Hills, CA: Sage.
Azen, R, & Budescu, DV. (2003). The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8, 129-148.
Azen, R, Budescu, DV, & Reiser, B. (2001). Criticality of predictors in multiple regression. British Journal of Mathematical and Statistical Psychology, 54, 201-225.
Barndorff-Nielsen, OE, & Cox, DR. (1989). Asymptotic techniques for use in statistics. London: Chapman and Hall.
Bollen, KA. (1989). Structural equations with latent variables. New York: John Wiley.
Bring, J. (1994). How to standardize regression coefficients. American Statistician, 48, 209-213.
Bring, J. (1996). A geometric approach to compare variables in a regression model. American Statistician, 50, 57-62.
Budescu, DV. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114, 542-551.
Chou, C-P, & Bentler, PM. (1995). Estimates and tests in structural equation modeling. In RH Hoyle (Ed.), Structural equation modeling: Concepts, issues and applications. Thousand Oaks, CA: Sage.
Darlington, RB. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161-182.
Duxbury, L, & Higgins, C. (2003). Work–life conflict in Canada in the new millennium: A status report. Health Canada. Retrieved February 1, 2004, from http://www.hc-sc.gc.ca/pphb-dgspsp/publicat/work-travail/report2/
Greenland, S, Schlesselman, JJ, & Criqui, MH. (1986). The fallacy of employing standardized regression coefficients and correlations as measures of effect. American Journal of Epidemiology, 123, 203-208.
Journal of Educational and Behavioral Statistics, Vol. 32, No. 1, 61-91 (2007)

j, and its standardized regression coefficient in the multiple regression,
j. The relative importance of two variables, xj and xk, is then defined as the ratio of their importance measures: namely, 
2. The first axioms that he adopted were 

, which in nongeometric terms is the vector of fitted values,
, is represented algebraically by the weighted vector sum 
js denote sample estimates of the standardized regression coefficients.
j of
onto
j) could be expressed as 
j, can be negative, which some practitioners interpret as negative importance, a meaningless concept. However, as noted in the text following Equation 2, Pratt's importance rule is valid only when the population quantities 
, where N denotes sample size. A simulation study has also been carried out to determine how well the asymptotic confidence intervals perform for the sample sizes that might be encountered in practice. This section will present a summary of the asymptotic results; the results of the simulation study will be reported in a later section.
), with positive definite covariance matrix 

, estimated from a sample of N independent and identically distributed observations of y and x1, . . . , xp. Results of the asymptotic analyses are given below. Proofs are in the appendix. 
denotes convergence in probability. That is, when the sample size is large, the difference between the estimate
j will be small, with probability approaching 1. 
denotes convergence in distribution, and N(0, AV(
(
in place of b, and the estimated covariance matrix of the $x_j$s, obtained as a submatrix of 
)% confidence interval (or interval estimate) for a specific normalized Pratt measure 
is also asymptotically normal, and the correlation between these terms can be evaluated from Equation A6. Because the means of the numerator and denominator of Equation A9 are nonzero, it follows by linearization that 



follows immediately. The conditional variances and the covariance term can be derived correct to $O(N^{-1})$ from Equation A6. Under the assumption of normal and independent disturbances $u_l$, the derivation is tedious but routine. The results are as follows: 