Journal of Educational and Behavioral Statistics

Testing for Local Dependence in Rasch’s Multiplicative Gamma Model for Speed Tests

Margo G. H. Jansen

University of Groningen, the Netherlands


    Abstract
 
The author considers a latent trait model for the response time on a (set of) pure speed test(s), the multiplicative gamma model (MGM), which is based on the assumption that the test response times are approximately gamma distributed, with known index parameters and scale parameters depending on subject ability and test difficulty parameters. Like any other parametric latent trait model, the MGM is based on strong assumptions. One of these assumptions is local independence. Two statistical tests for checking if the local independence assumption holds are compared, using generated and empirical data.

Key Words: speed tests • Rasch models • local independence • Lagrange multiplier tests

By an (itemized) test of pure speed, we mean a set of easy items that any member of the subject population could solve given sufficient time, administered under strict time-limit conditions. The test items are supposed to be of the same type and depend on a single latent trait only. This definition of a pure speed test goes back to Gulliksen (1950; Lord & Novick, 1968). Speed tests can also be administered without imposing a time limit, instructing the subjects to complete the entire test as rapidly as possible. Response processes on speed tests can be conceived as a stationary process with constant success probability, to be modeled as a series of observations of the values of identically distributed response variables (Lord & Novick, 1968, chap. 5). For such processes, the numbers of items solved in a fixed time interval or the average response time per item are variables that contain the same information with regard to the subject abilities. If one considers a test as a pure speed test, test scoring under time-limit constraints involves counting the number of items completed and, under unlimited time conditions, recording the time it takes to complete the test. Timing of individual-item latencies is usually not feasible, except when the tests are administered by computer.

Obviously, pure speed and pure power tests are only idealizations. In a test where speed and power aspects are present, item responses and item latencies are no longer equivalent and should therefore both be collected. Latent trait models for speed tests are rather rare, and models combining speed and power are even rarer. Verhelst, Verstralen, and Jansen (1997) formulated a latent trait model for time-limit tests that combined speed and power aspects. The model has a logistic form, derived from the assumptions of a gamma distribution for an item response time and a generalized extreme-value distribution for a latent response given the response time. Other examples can be found in work of White (1983), Thissen (1983), and Roskam (1997).

In educational measurement, the focus is usually on how well a respondent can perform on a task, and not on the time it takes to complete the task. One area, however, where speed is considered important is reading. Children learning to read are supposed to learn to read both correctly and fluently, and tests developed to measure reading performance, for example, single-word reading tests, tend to have scoring rules in which both aspects are scored separately, or in some combination. Reading tests are given in different formats. Oral reading tests are usually given in a continuous format, where the respondents are instructed to read a list of single words or a reading passage as rapidly as possible without making errors. If the responses cannot be registered electronically, the timing of individual “items” is usually not feasible.

Here, I will restrict myself to modeling response speed in the unlimited time condition. The Multiplicative Gamma Model for response times on (pure) speed tests (MGM) was originally designed to model reading speed (Rasch, 1960/1980). The MGM is a latent trait model for tests rather than items. Therefore, to be applicable, sets of at least two tests measuring the same trait are needed. The MGM can be used for solving some of the practical problems we encounter in achievement testing, such as test scoring and test calibration, in particular in the context of incomplete designs (Jansen, 1997a, 1997b; Jansen & Glas, 2001). The MGM is based on strong assumptions, and if these assumptions are not fulfilled, the validity of these uses is open to doubt. The availability of suitable model tests is therefore of prime importance. The aim of this article is to present procedures for the evaluation of model fit. In particular, I will describe tests for checking if the local independence assumption holds.


    The Model
 
Adopting the assumptions of a homogeneous Poisson process, Rasch (1960/1980) derived a latent trait model for simple time-limit tests. If the Poisson distribution holds for the number of items completed, then the interresponse times are exponentially distributed with scale (rate) parameter λ, and the time t it takes to complete m items is gamma distributed with a known index parameter m and scale (rate) parameter λ.


$$f(t \mid m, \lambda) = \frac{\lambda^{m}\, t^{m-1}\, e^{-\lambda t}}{\Gamma(m)}, \qquad t > 0. \tag{1}$$

The mean and the variance of the distribution in Equation 1 are m/λ and m/λ², respectively.
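
The link between exponential interresponse times and the gamma-distributed completion time can be checked numerically. The following is a minimal sketch, not part of the original article: the function name `completion_time` and the values m = 15 and λ = 0.8 are illustrative assumptions.

```python
import random
import statistics

def completion_time(m, lam, rng):
    """Time to complete m items when interresponse times are Exp(rate=lam)."""
    return sum(rng.expovariate(lam) for _ in range(m))

rng = random.Random(1)          # fixed seed so the sketch is reproducible
m, lam = 15, 0.8                # hypothetical test length and rate parameter
times = [completion_time(m, lam, rng) for _ in range(20000)]

mean_t = statistics.fmean(times)
var_t = statistics.variance(times)

# Compare the simulated moments with the gamma moments m/lam and m/lam**2.
print("mean:", mean_t, "theory:", m / lam)
print("variance:", var_t, "theory:", m / lam ** 2)
```

The simulated mean and variance should lie close to m/λ and m/λ², up to sampling error.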

Now, suppose that we have a set of K (speed) tests, measuring the same trait, possibly varying in length and rate parameters. For the rate parameters λ, we will use Rasch’s multiplicative decomposition:


$$\lambda_{nj} = \theta_n \varepsilon_j, \tag{2}$$

where θ_n refers to the ability and ε_j to the easiness of Test j. A higher value for θ corresponds to a shorter expected response time. The higher the value for ε, the easier the test and the shorter the response times will be. The abilities are treated as a gamma-distributed random variable. The same approach can also be used for modeling a set of K items, instead of tests. In that case, we substitute a value of 1 for m in Equation 1, resulting in exponentially distributed response times.
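
As an illustration of the multiplicative decomposition, response times can be generated by first drawing an ability and then drawing one gamma-distributed time per test. This is a sketch under stated assumptions: the ability distribution is parameterized by a mean `mu` and an index `sigma` (so its variance is mu²/sigma), and the function name `simulate_mgm` and all numerical values are hypothetical.

```python
import random

def simulate_mgm(n_persons, easiness, lengths, mu, sigma, seed=0):
    """Draw response times t[n][j] ~ Gamma(index m_j, rate theta_n * eps_j)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        # random.gammavariate takes (shape, scale); shape sigma and scale
        # mu / sigma give mean mu and variance mu**2 / sigma.
        theta = rng.gammavariate(sigma, mu / sigma)
        # A rate of theta * eps corresponds to a scale of 1 / (theta * eps).
        row = [rng.gammavariate(m, 1.0 / (theta * eps))
               for eps, m in zip(easiness, lengths)]
        data.append(row)
    return data

# Hypothetical design: three tests of 15 units each.
sample = simulate_mgm(5, [0.8, 1.0, 1.25], [15, 15, 15], mu=1.0, sigma=5.0)
for row in sample:
    print([round(t, 2) for t in row])
```

Larger easiness values yield shorter simulated times on average, in line with the interpretation of ε given above.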

The model presented here assumes that the number of items or units in the test is known. The index parameter m is supposed to be (exactly) equal to the number of items and interpreted as the test length. A slight modification consists of assuming that the index parameters are known, up to a common scaling factor ρ, which has to be estimated. In some cases, the basic units of the test may be more or less arbitrary. In the examples discussed by Rasch (1960/1980) and Jansen (1997a), the tests were texts, and the basic units could be words but also sentences. The final inferences will depend on the choice of unit.


    Estimation
 
As has been mentioned in the previous section, θ is assumed to be gamma distributed with mean µ and index parameter σ, implying a variance of µ²/σ. For increasing σ, the shape of the distribution becomes similar to the normal distribution. For smaller values of σ, the distribution is skewed to the right. Now, if ξ′ = (µ, σ, ε) is the vector of population and test parameters, the log likelihood can be written as


Formula(3)

where t_n stands for the vector of response times of Respondent n and T for the data matrix of all the respondents. To obtain the marginal maximum likelihood (MML) estimation equations, the first-order derivatives of Equation 3 with respect to ξ have to be derived. By adopting an identity due to Louis (Glas, 1998), these can be written as


Formula(4)

with


Formula

It can be easily verified that


Formula(5)


Formula(6)


Formula(7)

where ψ(·) refers to the psi or digamma function. So, the likelihood equations are given by


Formula(8)


Formula(9)


Formula(10)

If we consider only one subject population, we will need one suitable restriction on either the test parameters or the parameters of the subject distribution for identifiability.
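
Because a gamma ability distribution is conjugate to gamma response times, the integral over θ in the marginal likelihood has a closed form. The sketch below is a derivation under the stated distributional assumptions, not a transcription of Equation 3: it assumes θ has index σ and rate σ/µ, writes t* for the weighted total Σ ε_j t_j, and checks the closed form against brute-force integration. The function names and the example values are hypothetical.

```python
import math

def log_marginal_likelihood(times, easiness, lengths, mu, sigma):
    """Closed-form log marginal density of one respondent's time vector."""
    rate0 = sigma / mu                        # rate of the ability distribution
    t_star = sum(e * t for e, t in zip(easiness, times))
    m_tot = sum(lengths)
    ll = sum(m * math.log(e) + (m - 1) * math.log(t) - math.lgamma(m)
             for e, t, m in zip(easiness, times, lengths))
    ll += sigma * math.log(rate0) - math.lgamma(sigma)
    ll += math.lgamma(m_tot + sigma) - (m_tot + sigma) * math.log(t_star + rate0)
    return ll

def log_marginal_numeric(times, easiness, lengths, mu, sigma,
                         grid=20000, upper=20.0):
    """Midpoint-rule integration over theta; used only as a check."""
    rate0 = sigma / mu
    step = upper / grid
    total = 0.0
    for i in range(grid):
        theta = (i + 0.5) * step
        logf = (sigma * math.log(rate0) + (sigma - 1) * math.log(theta)
                - rate0 * theta - math.lgamma(sigma))
        for e, t, m in zip(easiness, times, lengths):
            lam = theta * e
            logf += m * math.log(lam) + (m - 1) * math.log(t) - lam * t - math.lgamma(m)
        total += math.exp(logf) * step
    return math.log(total)

example = ([20.0, 14.0], [0.8, 1.25], [15, 15], 1.0, 5.0)
print(log_marginal_likelihood(*example), log_marginal_numeric(*example))
```

Differentiating the closed form with respect to σ produces terms in the digamma function, which is consistent with the appearance of ψ(·) in the likelihood equations.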


    Assessing Model Fit
 
The model is based on strong assumptions, and if these assumptions are not fulfilled, misleading results may be the consequence. Jansen (1997b) described a number of statistical tests for assessing model fit. The distributional assumptions imply that the sum scores obtained by summing the weighted test response times (where the weights are the easiness parameters) follow a Pearson Type VI distribution. This can be checked by comparing the observed with the expected distribution. The power of this test is unknown, but possibly low. A second test will be described more fully in the next paragraph.

A potential source of misfit can be found in subgroup differences in large heterogeneous target populations. Differences in test parameters between groups of respondents, a phenomenon known as differential test functioning or DTF, form a serious threat to the generalizability of the test. In the case of DTF, we consider subpopulations defined on external observable variables. The same sort of reasoning, but now with an unobservable internal variable, can be followed to arrive at a second source of misfit: the model supposes that in different segments of the ability scale the test response behavior is properly described by the same test characteristic function. A third basic model assumption is local independence. The assumption of local independence (LI) in item response models is equivalent to the assumption that the latent trait under consideration spans the complete latent space (Lord & Novick, 1968, pp. 360–362). The assumption of a unidimensional latent trait implies, in our situation, that the response on a certain test is, given the trait value, independent of the responses on the other tests. Multidimensionality, therefore, is one of the most obvious sources of LI violation. Another situation leading to the violation of LI is when the response on a certain item/test is dependent on the response on preceding items/tests. In applications, we may encounter tests where some form of (item) clustering is present. If this is the case, LI is not likely to hold (Bradlow, Wainer, & Wang, 1999; Ip, 2001).

In the context of item response theory (IRT) models, several goodness-of-fit tests have been developed both in the framework of parametric and nonparametric item response modeling, and their behavior has been studied. In general, compared with model violations such as differential item or test functioning, the violation of LI has attracted less attention. In IRT modeling practice, the requirement of LI is often replaced by the less stringent requirement that the conditional covariance (between items) is zero (Junker, 1993). Especially in the context of nonparametric IRT, we find methods for local dependence assessment using statistics based on summing conditional covariances between the items (Douglas, Kim, Habing, & Gao, 1998; Stout, Froelich, & Gao, 2001). Chen and Thissen (1997) have evaluated several indexes for the detection of local dependence, among these the Q3 statistic, basically a correlation between residual item scores. Another useful approach, based on the principle of Lagrange multiplier tests, has been proposed by Glas (1998) and Glas and Verhelst (1995; see also Glas & Suárez Falcón, 2003). This approach has been used for developing a test for DTF for the MGM (Jansen & Glas, 2001). In the following paragraphs, I will describe the derivation of a test for assessing local dependence, using the Lagrange multiplier (LM) tests approach. The results of the LM test will be compared with a model test, using correlational methods, proposed by Jansen (1997b), but first, I will explain the principle of LM tests.

LM Tests
The principle of Lagrange Multiplier Tests has proved to be extremely useful in the context of deriving tests against model violations of IRT models such as the two-parameter logistic model, the nominal response model, and their generalizations (Glas, 1998). The basic idea is to test a special model parameterized as Φ_1, against a generalization in which a basic assumption has been relaxed by adding parameters Φ_2. The special model is derived from the general model, by setting Φ_2 equal to zero. The LM statistic is defined as follows:


Formula(11)

where


Formula

and


Formula

where h(Φ_p) stands for the first-order derivatives with respect to the parameters Φ_p. The statistic is approximately χ² distributed with degrees of freedom equal to the number of parameters fixed. To calculate the LM test statistic, only the estimates of the parameters Φ_1 of the restricted model are required. This is an obvious advantage because the estimation of the parameters under some alternative model can become quite complicated. More details can be found in Glas and Verhelst (1995).

The test outcome depends on the magnitude of the first-order derivatives of the log likelihood of the general model, evaluated at the point of the maximum likelihood estimates of Φ_1 and the postulated values of Φ_2. If the absolute values of the first-order derivatives are small, the fixed parameters will probably show little change if we set them free, so in other words, the values at which they are fixed are adequate. If the value is large, the test is significant, and the parameter values are likely to change if they are set free.
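
The computation can be sketched for the case of a single fixed parameter. The form used below is an assumption, namely the common score-test quadratic form h_2² / W with W = Σ_22 − Σ_21 Σ_11⁻¹ Σ_12 and with each Σ_pq approximated by a sum of products of per-respondent derivatives; it is not a transcription of Equation 11. The helper names and the toy numbers are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

def lm_statistic_1df(scores_free, scores_fixed):
    """LM statistic for one fixed parameter.

    scores_free[n]  : derivatives for respondent n w.r.t. the free parameters,
                      evaluated at the restricted-model estimates.
    scores_fixed[n] : derivative for respondent n w.r.t. the fixed parameter.
    """
    p = len(scores_free[0])
    h2 = sum(scores_fixed)
    s11 = [[sum(g[a] * g[b] for g in scores_free) for b in range(p)]
           for a in range(p)]
    s12 = [sum(g[a] * d for g, d in zip(scores_free, scores_fixed))
           for a in range(p)]
    s22 = sum(d * d for d in scores_fixed)
    w = s22 - sum(u * v for u, v in zip(s12, solve(s11, s12)))
    return h2 * h2 / w

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Toy input: one free parameter, four respondents.
lm = lm_statistic_1df([[1.0], [-1.0], [1.0], [-1.0]], [1.0, 1.0, 1.0, -1.0])
print(lm, chi2_1df_pvalue(lm))
```

Only derivatives evaluated at the restricted-model estimates enter the computation, which reflects the practical advantage noted above: the general model never has to be fitted.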

The framework of LM tests is also well suited for developing a statistical test for model violations in case of the speed test model (Jansen & Glas, 2001).

Tests for LI
To derive an LM test, I have to formulate a generalization of the model in Equations 1 through 3 where the basic assumption of LI has been relaxed by adding parameters to model dependencies between the tests.

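Now, let us assume that for Test k the probability of observing score t_nk is given by:


$$f(t_{nk} \mid \theta_n, t_{nj}) = \frac{\left(\theta_n \varepsilon_k + \delta_{jk}\, m_j / t_{nj}\right)^{m_k}\, t_{nk}^{m_k - 1}\, \exp\!\left[-\left(\theta_n \varepsilon_k + \delta_{jk}\, m_j / t_{nj}\right) t_{nk}\right]}{\Gamma(m_k)} \tag{12}$$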

The intensity parameter of the score distribution of Test k is augmented by a factor depending on the inverse of the score on Test j, modeling a positive association between the two scores if δ_jk is positive. Modeling a negative association in this way is possible but problematic. Because the rate parameter of a gamma distribution has to be positive, negative values of δ_jk are subject to arbitrary constraints (another possibility for a “negative” association can be sought in adding a factor depending on the score of Test j instead of the inverse). If LI applies, δ_jk is equal to zero and Equation 1 returns. To derive the test statistic, the first-order derivative of the log likelihood with respect to δ_jk is required. Substituting δ_jk and taking expectations results in


Formula(13)

The interpretation of h_n(δ_jk) is straightforward, namely, the difference between the observed score t_nk and its expected value:


Formula

times a factor depending on the observed score t_nj. Substituting Equation 13 in Equation 11 gives the desired test statistic. The test statistic, in the following referred to as LM test, is supposed to be approximately χ² distributed with degrees of freedom equal to 1.
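
The structure of this derivative can be made explicit in a worked step. What follows is a sketch under two assumptions that are not copied from Equations 12 and 13: the relaxed density for Test k is taken to be a gamma density with index m_k and rate θ_n ε_k + δ_jk m_j/t_nj, and the posterior of θ_n under the restricted model is taken to be gamma with index Σ_i m_i + σ and rate t_n* + σ/µ, where t_n* = Σ_i ε_i t_ni denotes the weighted total response time.

```latex
\begin{align*}
\log f(t_{nk} \mid \theta_n, t_{nj})
  &= m_k \log\!\Big(\theta_n\varepsilon_k + \delta_{jk}\frac{m_j}{t_{nj}}\Big)
     - \Big(\theta_n\varepsilon_k + \delta_{jk}\frac{m_j}{t_{nj}}\Big) t_{nk}
     + (m_k - 1)\log t_{nk} - \log\Gamma(m_k), \\
\frac{\partial \log f}{\partial \delta_{jk}}\bigg|_{\delta_{jk}=0}
  &= \frac{m_j}{t_{nj}} \Big(\frac{m_k}{\theta_n\varepsilon_k} - t_{nk}\Big), \\
E\big(\theta_n^{-1} \mid \mathbf{t}_n\big)
  &= \frac{t_n^{*} + \sigma/\mu}{\sum_i m_i + \sigma - 1}, \\
h_n(\delta_{jk})
  &= \frac{m_j}{t_{nj}} \Big(\frac{m_k}{\varepsilon_k}\,
     \frac{t_n^{*} + \sigma/\mu}{\sum_i m_i + \sigma - 1} - t_{nk}\Big).
\end{align*}
```

Under these assumptions, the term m_k/(θ_n ε_k) is the conditional expectation of t_nk, so the contribution of Respondent n is, up to sign, the difference between the observed and expected score on Test k multiplied by m_j/t_nj.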

Another statistical test, similar to the residual correlation indices used in IRT, that might be used to assess the presence of local dependence was suggested by Jansen (1997b). It can be shown that the marginal likelihood can be written as a product of two separate parts, L_c(ε) and L_p(ε, σ, µ), respectively, using the following transformations:


Formula

and


Formula

The first part is the distribution of (u_n1, ..., u_nk), conditional on t_n*, the weighted total response time. This part of the likelihood, which is independent of the parameters of the ability distribution and involves the test parameters only, is defined as follows:


Formula(14)

The second part involves the distribution of t_n*.

It can be shown that h(u_1, ..., u_k | t_n*) is a Dirichlet distribution with (known) parameters m_1, ..., m_k. The model predicts a very specific pattern of correlations between the weighted relative response rates U, which suggests a model check in the form of comparing the observed correlations between pairs of Us with the predicted correlations. For each u_j and u_l the expected correlation is:


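$$\rho(u_j, u_l) = -\sqrt{\frac{m_j\, m_l}{(m_{+} - m_j)(m_{+} - m_l)}}, \qquad m_{+} = \sum_{i} m_i. \tag{15}$$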

In case of a moderate or small number of tests, this correlation will be substantial. A complication arises from the fact that both t* and the u values are functions of the unknown test parameters ε, which have to be estimated to calculate the correlations between pairs of u values.
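
The predicted correlation pattern can be illustrated by simulation. The sketch below assumes that the weighted relative response time is u_j = ε_j t_j / Σ_i ε_i t_i and uses the standard correlation between two components of a Dirichlet distribution with parameters m_1, ..., m_k. The true easiness values are used in place of estimates, which sidesteps the complication just mentioned. All function names and numerical values are illustrative assumptions.

```python
import math
import random

def predicted_corr(lengths, j, l):
    """Correlation between components j and l of a Dirichlet(m_1, ..., m_k)."""
    total = sum(lengths)
    return -math.sqrt(lengths[j] * lengths[l]
                      / ((total - lengths[j]) * (total - lengths[l])))

def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_relative_times(n_persons, easiness, lengths, mu, sigma, seed=0):
    """Simulate locally independent data and return the u vectors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_persons):
        theta = rng.gammavariate(sigma, mu / sigma)
        weighted = [eps * rng.gammavariate(m, 1.0 / (theta * eps))
                    for eps, m in zip(easiness, lengths)]
        t_star = sum(weighted)
        rows.append([w / t_star for w in weighted])
    return rows

easiness = [0.640, 0.800, 1.000, 1.250, 1.563]   # treated here as easiness values
lengths = [15] * 5
u = simulate_relative_times(10000, easiness, lengths, mu=1.0, sigma=5.0)
r_obs = pearson([row[0] for row in u], [row[1] for row in u])
r_pred = predicted_corr(lengths, 0, 1)
print(r_pred, r_obs)
```

For equally long tests the prediction reduces to −1/(k − 1), that is, −.25 for five tests and about −.11 for ten.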


    Simulation Studies
 
Method
To study the properties of the LM and correlation statistics, I carried out a number of simulation studies.

In a first series of simulations, I used a set of five tests with test difficulties equal to .640, .800, 1.000, 1.250, and 1.563. A two-by-three design was used for test length and sample size. The test length was specified to be 15 or 25 items for each test. The respondents were randomly sampled from a gamma distribution with µ = 1 and index parameter σ = 5 (implying a variance of .20 for the θ values).

In a second series of simulations, I generated scores on 10 short tests of fairly homogeneous difficulties (varying between .10 and .08). A two-by-three design was used for test length and sample size. The test length was specified to be 3 or 10 items for each test. The respondents were randomly sampled from a gamma distribution with µ = 1 and index parameter σ = 9 (implying a variance of .11 for the θ values).

In a following series of simulations, I investigated the sensitivity of both statistics to violations of LI, using the same specifications for the test parameters and the subject parameter distribution as before.

The Type I Error Rates
The error rates are the proportions of significant outcomes of the test statistics averaged over tests (5 or 10) and replications (500). The ZCOR statistic was calculated by determining product-moment correlations between the estimated relative response times of pairs of tests and performing a Fisher z transformation. The predicted correlation for the set of five tests is –.25 for all pairs. For the set of 10 tests, the predicted value is –.11. If the pairs of variables are bivariate normally distributed, the sampling distribution of the Fisher z-transformed correlations is asymptotically normal with a variance of 1/(N – 3). If the value of ZCOR falls outside the 95% confidence interval of the predicted (transformed) correlation, the null hypothesis of δ = 0 is rejected.
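
The decision rule for the correlational test can be sketched as follows. This is a minimal illustration of the Fisher z procedure described above; the function name `zcor_test`, the returned dictionary, and the example values are assumptions made for illustration.

```python
import math

def zcor_test(r_obs, r_pred, n, z_crit=1.959964):
    """Compare an observed correlation with its model-predicted value.

    Both correlations are Fisher z-transformed (atanh). The null hypothesis
    is rejected when the observed value falls outside the 95% interval
    around the predicted value, using a standard error of 1 / sqrt(n - 3).
    """
    z_obs = math.atanh(r_obs)
    z_pred = math.atanh(r_pred)
    half_width = z_crit / math.sqrt(n - 3)
    lower, upper = z_pred - half_width, z_pred + half_width
    return {"z_obs": z_obs,
            "interval": (lower, upper),
            "reject": not (lower <= z_obs <= upper)}

# Hypothetical example: five equally long tests (predicted r = -.25), N = 200.
print(zcor_test(-0.21, -0.25, 200))
print(zcor_test(0.10, -0.25, 200))
```

The normal approximation rests on the bivariate normality condition stated above, which the relative response times can satisfy only approximately.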

In the first half of Table 1, we find the error rates of the LM and ZCOR tests for the small set of tests and in the second half for the larger set. Each cell in the table is based on 500 replications. For both tests, the error rates are close to the nominal level. In case of the LM test, the rates are slightly higher than 5%, whereas the ZCOR test tends to be more conservative.



 
TABLE 1 Type I Errors of the LM and the ZCOR Test for Detecting Violations of Local Independence

 
Power Studies
In the following, I compare the power of the two tests, LM and ZCOR, for the detection of local dependence. Local dependence was introduced by specifying a nonzero value for an additional parameter δ modeling the dependence of the rate parameter of Test k on the response on Test j. In the simulations, the model violation was imposed by augmenting the subject speed parameters for Test k with a factor depending on the standardized response rate of a preceding Test j. For this, I rewrote the term (θ_n ε_k + δ_jk m_j/t_nj) in Equation 12 as follows:


Formula

where s_nj = m_j/(t_nj ε_k). The response rates s_nj were standardized by equating the mean and standard deviation to the theoretical mean and standard deviation of θ. The δ values were specified as equal to 0.25 times the standard deviation of θ for a small effect and 0.5 times the standard deviation of θ for a large one.

I will first present the results for the small set of long tests, where the model violation is imposed on the fourth test. The results of applying the LM and ZCOR test can be found in Tables 2 and 3. The hit rates show clear main effects of sample size and effect size. The hit rate is also larger for longer tests. In all cases, the power of the LM test is higher than the power of the ZCOR test. The false-alarm rates, averaged over the four other tests in the set of five, become larger for increasing sample sizes and larger effect sizes.



 
TABLE 2 A Comparison of the Rejection Rate of Local Independence With the LM and the ZCOR Test in the Set of Five Tests With One Model Violation With Effect Size 0.25

 


 
TABLE 3 A Comparison of the Rejection Rate of Local Independence With the LM and the ZCOR Test in the Set of Five Tests With One Model Violation With Effect Size 0.5

 
For the set of 10 tests, I have imposed a model violation on Test 4 and Test 6 (20% misfit) by imposing dependencies on the responses on the directly preceding tests. The results are shown in Tables 4 and 5. The hits are the rates of rejects averaged over Tests 4 and 6.



 
TABLE 4 A Comparison of the Rejection Rate of Local Independence With the LM and the ZCOR Test in the Set of 10 Tests With Two Model Violations With Effect Size 0.25

 


 
TABLE 5 A Comparison of the Rejection Rate of Local Independence With the LM and the ZCOR Test in the Set of 10 Tests With Two Model Violations With Effect Size 0.5

 
From the tables, it can be inferred that the effects of sample size, test length, and effect size show the same pattern as was found for the small set of five tests. Again, the hit rates increase with the sample size and the effect size. The LM test performs better than the ZCOR test. The relatively small false-alarm rates are consistent with results by Glas and Suárez Falcón (2003), who studied, together with other fit statistics, an LM test for violation of LI in the framework of the Three-Parameter Logistic Model.


    An Empirical Example
 
The data in this example are taken from a study on the development of early reading and the factors relating to reading problems (van den Bos & Lutje Spelberg, 1997). The participants are children attending schools of special education. Among several other measurements, data were collected for four single-word reading tests. The tests were individually administered, and timing per item was not feasible. The first two tests, the EMT and the KLEPEL, were administered in their usual time-limit format. In addition, the time needed to finish the first 50 items was registered. The EMT and the KLEPEL are highly similar. Both consist of a list of stimulus words, ordered to increasing difficulty from one-syllable consonant vowel consonant (CVC) words to complicated three-syllable words. The difference between the EMT and KLEPEL is that whereas the stimulus words of the EMT are real words, those of the KLEPEL are pseudo-words. The third test, the AARON, consisted of short real words in no specific order. The fourth test consisted of blocks of five different color names (CLN) in random order. Both AARON and CLN were 50-item tests. The test stimuli differ in type as well as presentation, and it is possible that the four tests tap different abilities. Under these circumstances, local dependencies may arise when I analyze the data using a unidimensional model.

Results
The data were analyzed using the Rasch model with test lengths assumed to be known. The parameter estimates are presented in Table 6. The test parameter estimates show that the tests differ considerably in easiness. The abilities have an estimated mean of 1 and a variance of .18. After fitting the model, the correlations between the relative weighted response times were calculated (see Tables 6, 7, and 8).



 
TABLE 6 Parameter Estimates for the Four Single-Word Reading Tests Example

 


 
TABLE 7 Testing for the Local Independence of Four Single-Word Reading Tests Using the LM Statistic

 


 
TABLE 8 Observed and Predicted Correlations Between the Relative Response Times of Four Single-Word Reading Tests

 
The observed correlations between the EMT and the KLEPEL, and the AARON and the CLN, were found to be positive, whereas the other correlations were (strongly) negative and therefore also not in accordance with the model predictions of r = –.333 for all six intercorrelations. The results of the LM test statistic, which is approximately χ² distributed with one degree of freedom, also show strong indications for the violation of LI. Especially the CLN test shows large values for the LM test statistic. The pattern of correlations between the relative weighted responses suggests that the set of four falls apart into two pairs. However, removing only the CLN test resulted in a matrix of correlations for the remaining three tests, more in line with the predicted value, which is now –.50. All three are covered by the asymptotic 95% confidence interval. The LM tests still indicate a lack of fit.


    Discussion
 
Until recently, the availability of suitable statistical tests to assess the fit of the Rasch model for speed tests was rather limited. The principle of LM tests, which has been introduced by Glas and Verhelst (1995) as a guiding principle for deriving statistical tests in an IRT context, can also be applied to the MGM.

In this article, I focused on the performance of the LM test aimed at detecting local dependence. In a simulation study, the LM test statistic was found to be performing adequately. The model, if valid, predicts a specific pattern of correlations between the weighted relative test response rates. The observed correlations were also sensitive to violations of LI. However, in comparison, the LM statistics were more powerful than the correlational indices. The relatively small false-alarm rates are consistent with results reported by Glas and Suárez Falcón (2003) for a similar test in an IRT context.

Nonetheless, we must keep in mind that the simulation design that was used here, although inspired by empirical applications, covers a rather limited number of a large set of possible situations we may encounter in practice. Compared to other IRT models that model binary or polytomous item responses, the range of possible model specifications for continuous test response variables is much wider. For instance, tests may vary in length, difficulty, and number. More application studies, as well as simulation studies, are necessary.

More research is also needed to assess if the LM test aimed at detecting LI is also sensitive to other sources of lack of model fit, such as DTF, and violations of the equality of the test characteristic function assumption.


    Footnotes
 
MARGO G. H. JANSEN is an associate professor in the Department of Educational Sciences at the University of Groningen, Grote Rozenstraat 38, 9712 TJ, Groningen-NL; g.g.h.jansen@rug.nl. Her areas of specialization are test theory, in particular IRT, and applied statistics.

Manuscript received July 11, 2002. Revision received February 5, 2005. Accepted for publication August 24, 2005.


    References
 
Bradlow, E, Wainer, H, & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.

Chen, WH, & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22, 265-289.

Douglas, J, Kim, HR, Habing, B, & Gao, F. (1998). Investigating local dependence with conditional covariance functions. Journal of Educational and Behavioral Statistics, 23, 129-151.

Glas, CAW. (1998). Detection of differential item functioning using Lagrange Multiplier tests. Statistica Sinica, 3, 647-667.

Glas, CAW, & Suárez Falcón, JC. (2003). A comparison of item-fit statistics for the three-parameter logistic model. Applied Psychological Measurement, 27, 87-106.

Glas, CAW, & Verhelst, ND. (1995). Testing the Rasch model. In Fischer, GH, & Molenaar, IW (Eds.), Rasch models: Foundations, recent developments and applications (pp. 69-96). New York: Springer-Verlag.

Gulliksen, H. (1950). The reliability of speeded tests. Psychometrika, 15, 259-269.

Ip, EH. (2001). Testing for local dependency in dichotomous and polytomous item response models. Psychometrika, 66, 109-132.

Jansen, MGH. (1997a). The Rasch model for speed tests and some simple extensions with applications to incomplete designs. Journal of Educational and Behavioral Statistics, 22, 125-140.

Jansen, MGH. (1997b). Rasch’s model for reading speed with manifest explanatory variables. Psychometrika, 62, 393-409.

Jansen, MGH, & Glas, CAW. (2001). Statistical tests for differential test functioning in Rasch’s model for speed tests. In Boomsma, A, van Duijn, MAJ, & Snijders, TAB (Eds.), Essays on item response theory (pp. 149-162, chap. 8). New York: Springer-Verlag.

Junker, BW. (1993). Conditional association, essential independence and monotone unidimensional item response models. Annals of Statistics, 3, 1359-1378.

Lord, FM, & Novick, MR. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press. (Original work published 1960)

Roskam, EE. (1997). Models for speed and time-limit tests. In van der Linden, WJ, & Hambleton, RK (Eds.), Handbook of modern item response theory. New York: Springer-Verlag.

Stout, W, Froelich, AG, & Gao, F. (2001). Using resampling methods to produce an improved DIMTEST procedure. In Boomsma, A, van Duijn, MAJ, & Snijders, TAB (Eds.), Essays on item response theory (pp. 357-375, chap. 19). New York: Springer-Verlag.

Thissen, D. (1983). Timed testing: An approach using item response theory. In Weiss, DJ (Ed.), New horizons in testing: Latent trait theory and computerized adaptive testing (pp. 179-203). New York: Academic Press.

van den Bos, KP, & Lutje Spelberg, HCL. (1997). Measuring word identification skills and related variables. In Leong, CK, & Joshi, LM (Eds.), Cross-language studies of learning to read and spell (pp. 271-281). Dordrecht (the Netherlands), Boston, and London: Kluwer.

Verhelst, N, Verstralen, H, & Jansen, MGH. (1997). Models for time-limit tests. In van der Linden, WJ, & Hambleton, RK (Eds.), Handbook of modern item response theory (pp. 169-185). New York: Springer-Verlag.

White, PO. (1983). Some major components in general intelligence. In Eysenck, HJ (Ed.), A model for intelligence (pp. 44-90). Berlin and New York: Springer-Verlag.

Journal of Educational and Behavioral Statistics, Vol. 32, No. 1, 24-38 (2007)
DOI: 10.3102/1076998606298032

