Advanced Search

Journal Navigation

Journal Home

Subscriptions

Archive

Contact Us

Table of Contents

Click here to sign up for SAGE Journal Email Alerts today!

Sign In to gain access to subscriptions and/or personal tools.
Journal of Educational and Behavioral Statistics
This Article
Right arrow Abstract Freely available
Right arrow Free Full Text (Free PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to Saved Citations
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Request Reprints
Right arrow Add to My Marked Citations
Citing Articles
Right arrow Citing Articles via Google Scholar
Right arrow Citing Articles via Scopus
Google Scholar
Right arrow Articles by Ferron, J. M.
Right arrow Articles by Hess, M. R.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Complore   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati   Add to Twitter  
What's this?

Teacher's Corner

Estimation in SEM: A Concrete Example

John M. Ferron
Melinda R. Hess

University of South Florida


    Abstract
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
A concrete example is used to illustrate maximum likelihood estimation of a structural equation model with two unknown parameters. The fitting function is found for the example, as are the vector of first-order partial derivatives, the matrix of second-order partial derivatives, and the estimates obtained from each iteration of the Newton-Raphson algorithm. The goal is to provide a concrete illustration to help those learning structural equation modeling bridge the gap between the verbal descriptions of estimation procedures and the mathematical definition of these procedures provided in the technical literature.

Key Words: structural equation modeling • teaching statistics • maximum likelihood • Newton-Raphson • parameter estimation

Concrete examples are often used in the teaching and learning of mathematics and are advocated at virtually all levels of mathematics education, from K-12 instruction, as evidenced by the standards of the National Council of Teacher’s of Mathematics (National Council of Teachers of Mathematics, 1989) to acquisition of higher level mathematics skills. One can easily extend the reasoning behind the need to provide students with concrete examples and representations beyond the purely mathematical realm into the world of statistics. For basic statistical concepts, like the standard deviation, worked-out examples using a definitional formula are provided in most introductory texts, and additional examples can be readily generated. For more advanced analyses, like multiple regression, canonical correlation, and multivariate analysis of variance, worked-out examples can be found in Pedhazur (1997), Tatsuoka and Lohnes (1988), and Stevens (2002), respectively. As statistical analyses get more complicated, the availability of concrete examples diminishes.

In the area of structural equation modeling, it is relatively difficult to find or generate concrete examples illustrating parameter estimation algorithms. Such examples, however, are valuable in helping learners bridge the gap between the verbal descriptions of estimation procedures and the mathematical definition of these procedures provided in the technical literature. We have found the example illustrating maximum likelihood estimation of a single parameter model given by Bollen (1989) to be useful to learners. Our purpose is to provide a more complex example for learners who benefit from concrete illustrations. We have chosen a two-parameter model for the example. Having multiple parameters enables us to maintain vectors and matrices in the computations, whereas limiting our model to two parameters keeps the computations relatively simple. Maximum likelihood estimation is illustrated because this is the method most commonly used to estimate structural equation models and the first method learners typically try to understand.


    Context for the Example
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
In this example, we consider a single latent exogenous variable, {xi}, being used to predict a single latent endogenous variable, {eta}, both of which have single-indicator variables (X and Y, respectively). The model is shown in a path diagram in Figure 1, where ovals indicate the latent variables of interest, rectangles indicate observed variables used as indicators of the latent variables, circles represent errors, single-headed arrows represent effects, and double-headed arrows originating and ending on the same variable indicate variances.


Figure 10320110
View larger version (9K):
[in this window]
[in a new window]

 
FIGURE 1 Path diagram for computational example.

 
For illustrative purposes, we have assumed known values for five of the model parameters and unknown values for the other two. The measurement error ({delta}) of indicator variable X is assumed to have a variance of 2, and the measurement error ({varepsilon}) of indicator variable Y is assumed to have a variance of 4. Other parameters that we have assumed a value for include the variance of latent variable {xi} to be 8 and the coefficients from both latent variables {xi} and {eta} to their single indicators variables, X and Y respectively, to be 1. The two parameters that we are estimating in this example are the coefficient from latent variable {xi} to latent variable {eta}, represented by gamma ({gamma}) and the variance of the disturbance of the endogenous latent variable {eta}, represented by psi ({psi}).

This model can be mathematically represented by a series of matrices. Using the eight-matrix notation that over the years has been made familiar by the LISREL (Linear Structural RELationships) software package (Jöreskog & Sörbom, 2004), the model parameters can be represented by the following series of matrices:


Formula

where {Lambda}x contains the coefficient from {xi} to its indicator variable X, {Lambda}y contains the coefficient from {eta} to its indicator variable Y, {theta}{delta} contains the variance in the measurement error of X, {theta}{varepsilon} contains the variance in the measurement error of Y, {Phi} contains the variance in {xi}, beta becomes zero because this model has no effects between latent endogenous variables, {Psi} contains the variance in the disturbance of {eta}, and {Gamma} contains the coefficient from {xi} to {eta}.


    Implied Covariance Matrix
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
If a structural equation model was properly specified, and if we knew all the parameter values, we could compute the model’s implication for the covariance between all pairs of variables. Put another way, the model and parameter values, {theta}, imply a covariance matrix, {sum}({theta}). The general form of the implied covariance matrix for structural equation modeling has been discussed (Hayduk, 1987) and derived (Bollen, 1989) elsewhere, and is given by


Formula(1)

where I is the identity matrix. By substituting the parameters from our example, this general equation can be simplified, yielding


Formula(2)


    Observed Covariance Matrix
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
Assume the observed covariance matrix is


Formula(3)

where 20 is the observed variance of X, 10 is the observed variance of Y, and 5 is the observed covariance between X and Y.


    Maximum Likelihood Fitting Function
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
When we estimate parameter values for a structural equation model, we are trying to pick values for the parameters that make the implied covariance matrix as close as possible to the observed covariance matrix, or in other words, we try to minimize a discrepancy function between the model implied covariance matrix and the observed covariance matrix. There are a variety of different ways to define the discrepancy function, and thus alternative ways to estimate the parameters. The most common estimation method, maximum likelihood, is based on a normality assumption that allows the likelihood of the observed covariance matrix to be considered for a given set of parameter values. Different sets of values are considered, and then the set that maximizes the likelihood of the observed covariance matrix is selected. It can be shown that the values that maximize the likelihood of the data are the values that minimize the maximum likelihood fitting function,


Formula(4)

where p + q is the number of variables, S is the observed covariance matrix, and {sum}({theta}) is the implied covariance matrix. Bollen (1989) provides detailed derivation of the likelihood function and shows its relationship to the fitting function. The fitting function can be simplified for our example by substituting our implied and observed covariance matrices,


Formula(5)

This equation can be simplified to Equation 6 and further simplified to Equation 7 through matrix algebra (see Hayduk, 1987, or Bollen, 1989, for details about inverses, determinants, traces, and multiplication of matrices in the context of structural equation modeling). The minimization process is unaffected by the constants in the function [log(175) and 2], thus these could be dropped from the equation.


Formula(6)


Formula(7)


    Minimizing the Fitting Function
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
Numerical methods are used to minimize the fitting function. The algorithm begins with an initial guess for each of the parameters to be estimated. These guesses become the start values, which are then adjusted by considering how the fitting function is changing at these particular values. The adjusted values are then considered and also adjusted. The iterative process of making adjustments continues until it appears that the minimum of the fitting function has been reached. There are a variety of algorithms for minimizing the fitting function (e.g., Newton-Raphson, Fletcher-Powell, Gauss-Newton, Levenberg-Marquardt). The algorithms differ from each other in the details of how the adjustments, or steps, in the process are defined. We will illustrate the Newton-Raphson algorithm because it is relatively straightforward, but complex enough to facilitate thinking about the first- and second-order partial derivatives of the fitting function. A step in the Newton-Raphson algorithm is generally defined by


Formula(8)

where the vector of parameter estimates at iteration i + 1, Formula(i+1), involves an adjustment of the parameter estimates from the previous iteration, Formula(i). The size of the adjustment depends on the gradient, Formula, and the Hessian matrix, Formula.

The gradient contains the partial slopes for the surface of the fitting function. As we get closer to the minimum, these slopes approach zero. The Hessian matrix contains information about the rate the partial slopes are changing. Using the gradient and inverse Hessian to adjust the estimates allow relatively large adjustments to be made when we are far from the minimum, and finer and finer adjustments to be made as we get closer to the minimum.

To illustrate the use of the Newton-Raphson algorithm in this example, we must first find the first-order and second-order partial derivatives of the fitting function. In the next section, we will find the first-order partial derivatives of the fitting function for our example, and in the following section, we will find the second-order partial derivatives. Once the partial derivatives have been found, we will return to the Newton-Raphson algorithm and illustrate its application to our example.


    Vector of First-Order Partial Derivatives of FML
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
An element of the vector of first-order partial derivatives is found by finding the derivative of the fitting function with respect to a particular parameter. For our example, the gradient, or vector of first-order partial derivatives, is


Formula(9)

The process of finding the first-order partial derivative of the fitting function with respect to {gamma} for our example is illustrated in Equations 10 through 12. Equation 10 shows that we intend to take the derivative of the fitting function. Equation 11 shows the result of taking the derivative with respect to{gamma}, and Equation 12 shows the result after simplifying the expression.


Formula(10)


Formula(11)


Formula(12)

We then find the first-order partial derivative of the fitting function with respect to {psi} for our example, which is illustrated in Equations 13, 14, and 15.


Formula(13)


Formula(14)


Formula(15)


    Matrix of Second-Order Partial Derivatives of FML
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
The matrix of second-order partial derivatives of FML, the Hessian matrix, for our example is given by


Formula(16)

Note that each element of the Hessian matrix is found by taking the derivative of one of the first-order partial derivatives with respect to one of the parameters.

The first element of the first row of this matrix is obtained by taking the derivative of Formula with respect to {gamma}. This process is illustrated starting with Equation 17, which indicates that we will be taking the derivative with respect to {gamma} of the result from Equation 12. The result of taking this partial derivative is shown in Equation 18 and then simplified to obtain Equation 19.


Formula(17)


Formula(18)


Formula(19)

The second element of the second row is obtained by taking the derivative of Formula with respect to {psi}. This process is shown starting with Equation 20, which relies on the result from Equation 15, then proceeds by finding the partial derivative (Equation 21), and concludes by simplifying (Equation 22).


Formula(20)


Formula(21)


Formula(22)

The off-diagonal elements can be found by either taking the derivative of Formula with respect to {gamma} or the derivative of Formula with respect to {psi}. The process of taking the derivative of Formula with respect to {gamma} is shown in Equations 23 through Equation 25.


Formula(23)


Formula(24)


Formula(25)


    Results of Each Iteration
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
Now that we have solved for the first-order and second-order partial derivatives, we can illustrate the use of the Newton-Raphson algorithm for our example, where each step is defined by


Formula(26)

To start the estimation process, we need some initial estimates for {gamma} and {psi}. There are alternative ways of doing this, ranging from subjective estimates based on the researcher’s understanding of the model to noniterative estimation approaches like the instrumental variables method (e.g., Bollen, 1989). For simplicity, we chose to provide subjective estimates for the start values, Formula(1) = .5 and Formula(1) = 5. By substituting these values into Formula(i), Formula(i), Formula, Formula, Formula, Formula, and Formula and solving, Equation 26 becomes


Formula(27)

which leads to values of 0.6101465 for Formula(2) and 7.7471531 for Formula(2). These estimates are then substituted into Formula(i), Formula(i), Formula, Formula, Formula, Formula, and Formula in Equation 26, which leads to


Formula(28)

The estimates are now Formula(3) = 0.6229589 and Formula(3) = 10.53068.

To conserve space, the results of the iterations are summarized in Table 1. In this table, we list the estimates of {gamma} and {psi}, the elements of the gradient, and the value of the maximum likelihood fitting function for each iteration. Note that the changes in {gamma} and {psi} are relatively large during the first few iterations and get smaller as the iterations continue. Also notice the elements of the gradient and the maximum likelihood fitting function are getting smaller as the iterations increase. For this example, the elements of the gradient become zero on the seventh iteration. In addition, the value for the fitting function becomes zero, which implies the estimates of 0.625 for {gamma} and 12.875 for {psi} lead to an implied covariance matrix, {sum}({theta}), that does not differ from the sample covariance matrix, S.


View this table:
[in this window]
[in a new window]

 
TABLE 1 Summary of the Results of Each Iteration of the Maximum Likelihood Estimation

 

    Alternative Estimation Methods
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
For this small example, we could have arrived at estimates for {gamma} and {psi}, by equating the implied covariance matrix to the sample covariance matrix,


Formula(29)

Separate equations can be written for each element, leading to two equations containing two unknowns,


Formula(30)


Formula(31)

Solving Equation 30 for {gamma} leads to 0.625, and substituting 0.625 for {gamma} in Equation 31 and solving for {psi} gives 12.875.

In more complex situations, there are typically more equations than unknowns. Attempts to solve the equations for the unknowns will often lead to multiple estimates for the same unknown because sampling error keeps the elements of S from being identical to the elements of {sum}({theta}). It is these situations that motivate more complex estimation methods. Different methods (e.g., maximum likelihood, weighted least squares) are based on different assumptions (e.g., maximum likelihood assumes multivariate normality) and will often lead to different estimates, so care should be taken to choose an estimation method appropriate for the application. Finally, it should be noted that iterative methods will often not reach a fitting function value of zero. We say that we have met the convergence criterion when all elements of the gradient become less than some set value (e.g., .000001), which implies the estimates are changing trivially from one iteration to the next.


    Closing Comments
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
For the purpose of illustrating maximum likelihood estimation in the context of structural equation modeling it was useful to focus on a relatively simple model. As models and data become more complex, researchers should be aware that estimation does not always proceed as smoothly as it did with this example. If estimation problems are encountered, it may be useful to consider the following strategies: (a) changing the start values, (b) changing the estimation algorithm, (c) increasing the maximum number of iterations, (d) screening the data for outliers and other anomalies, (e) using boundary constraints, (f) rescaling any variable with a variance many orders of magnitude larger than the variances of the other variables, (g) checking the identification status of the model, (h) checking the consistency between the model specified in the structural equation modeling software and the path diagram, and (i) reexamining the consistency between the path diagram and theoretical expectations. For details on the problems that can be encountered, potential solutions, and limitations to these potential solutions, see Chen, Bollen, Paxton, Curran, and Kirby (2001); Wothke (1993); and Bentler and Chou (1987).


    Footnotes
 
JOHN M. FERRON is Professor of Educational Measurement and Research at the University of South Florida, 4202 East Fowler Ave. EDU 162, Tampa, FL 33620;ferron{at}tempest.coedu.usf.edu. His areas of interest include structural equation modeling, multilevel modeling, and the analysis of interrupted time-series data. Back

MELINDA R. HESS is Director of the Center for Research, Evaluation, Assessment, and Measurement at the University of South Florida, 4202 East Fowler Ave. EDU 162, Tampa, FL 33620;mhess{at}tempest.coedu.usf.edu. Her areas of interest include multilevel modeling, structural equation modeling, research reporting protocols and practices, and program assessment. Back

Manuscript received December 1, 2004. Accepted for publication November 21, 2005.


    References
 Top
 Abstract
 Context for the Example
 Implied Covariance Matrix
 Observed Covariance Matrix
 Maximum Likelihood Fitting...
 Minimizing the Fitting Function
 Vector of First-Order Partial...
 Matrix of Second-Order Partial...
 Results of Each Iteration
 Alternative Estimation Methods
 Closing Comments
 References
 
Bentler, PM, & Chou, CP. (1987). Practical issues in structural equation modeling. Sociological Methods and Research, 16, 78-117.[CrossRef]

Bollen, KA. (1989). Structural equation modeling with latent variables. New York: John Wiley.

Chen, F, Bollen, KA, Paxton, P, Curran, PJ, & Kirby, JB. (2001). Improper solutions in structural equation models—Causes, consequences, and strategies. Sociological Methods and Research, 29, 468-508.[CrossRef]

Hayduk, L. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore: John Hopkins University Press.

Jöreskog, KG, & Sörbom, D. (2004). LISREL (Version 8.7) [Computer Software]. Lincolnwood, IL: Scientific Software International.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Pedhazur, EJ. (1997). Multiple regression in behavioral research: Explanation and prediction (3.). Fort Worth, TX: Harcourt Brace.

Stevens, JP. (2002). Applied multivariate statistics for the social sciences (4.). Mahwah, NJ: Lawrence Erlbaum.

Tatsuoka, MM, & Lohnes, PR. (1988). Multivariate analysis: Techniques for educational and psychological research (2.). New York: Macmillan.

Wothke, W. In Bollen, KA, & Long, JS (Eds.). (1993). Nonpositive definite matrices in structural equation modeling. Testing structural equation models (p. 256-293). Newbury Park, CA: Sage. (chap. 11).

Journal of Educational and Behavioral Statistics, Vol. 32, No. 1, 110-120 (2007)
DOI: 10.3102/1076998606298025


Add to CiteULike CiteULike   Add to Complore Complore   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati   Add to Twitter Twitter    What's this?



This Article
Right arrow Abstract Freely available
Right arrow Free Full Text (Free PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Add to Saved Citations
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Request Reprints
Right arrow Add to My Marked Citations
Citing Articles
Right arrow Citing Articles via Google Scholar
Right arrow Citing Articles via Scopus
Google Scholar
Right arrow Articles by Ferron, J. M.
Right arrow Articles by Hess, M. R.
Right arrow Search for Related Content
Social Bookmarking
 Add to CiteULike   Add to Complore   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati   Add to Twitter  
What's this?

AER home page RER home page JEB home page EPA home page RRE home page