Multidimensional modeling and inference of dichotomous item response data

  • Mehrdimensionale Modellierung und Inferenz von dichotomen Item Response Daten

Kornely, Mia Johanna Katharina; Kateri, Maria (Thesis advisor); Moustaki, Irini (Thesis advisor)

Aachen : RWTH Aachen University (2021)
Dissertation / PhD Thesis

Dissertation, RWTH Aachen University, 2021


Questionnaire- and test-based surveys are important tools for analyzing the fairness of a country's educational system and for supporting the development of pedagogical concepts. An essential challenge in conducting such surveys is the measurement of traits that are not directly observable, such as the ability of students in different subjects. These traits are modeled by latent variables. This thesis is restricted to dichotomous items, where the possible responses to each item fall into one of two categories (e.g., "correct" and "incorrect"), and to continuous latent variables. Item response theory (IRT) models the probability of a correct response to an item as a function of the latent variable. Multidimensional models assume several latent variables, which are collected in a latent vector. Chapter 1 provides an overview of IRT models and of methods for estimating model parameters and latent vectors. Particular emphasis is placed on generalized linear latent variable models (GLLVM) and on models whose marginal distribution of the response vector has a closed-form expression. Chapter 2 introduces an extension of GLLVM in which the link functions and the distributions of the latent vector depend on shape parameters. It is shown how this class unifies several models from the literature. The consistency and asymptotic efficiency of the marginal maximum likelihood estimator (MMLE) of the model parameters are proved. These asymptotic properties therefore also hold for many classic models, which contributes to the estimation theory for IRT models in general. The asymptotic chi-square distribution of the Wald, score and likelihood-ratio test statistics is derived from the asymptotic efficiency of the MMLE. Model fitting, estimation of latent traits, nested model tests and model selection are examined in simulation studies.
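
As an illustration of the kind of model described above, the following minimal sketch evaluates the response probability of a dichotomous item under a logit link and simulates one response vector, assuming the items are conditionally independent given the latent vector and the latent vector is standard normal. The function names, the parameter values and the choice of a plain logit link are assumptions made for this example; they are not taken from the thesis, whose model class allows more general link functions and latent distributions.

```python
import math
import random


def response_prob(intercept, loadings, z):
    """P(Y_j = 1 | z) under a logit link: sigmoid(intercept + loadings . z)."""
    eta = intercept + sum(a * t for a, t in zip(loadings, z))
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_responses(intercepts, loading_rows, z, rng):
    """Draw one dichotomous response vector; items are independent given z."""
    return [
        1 if rng.random() < response_prob(b, a, z) else 0
        for b, a in zip(intercepts, loading_rows)
    ]


# Hypothetical parameters: three items, two-dimensional latent vector.
intercepts = [0.0, -0.5, 1.0]
loading_rows = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.2]]

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # assumed N(0, I) latent vector
y = simulate_responses(intercepts, loading_rows, z, rng)
```

Here `y` is a list of zeros and ones, one entry per item. Each row of `loading_rows` determines how strongly the corresponding item depends on each latent dimension.
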
Chapter 3 discusses the asymptotic theory of estimating latent vectors. The estimation of latent vectors can be interpreted as (empirical) Bayesian point estimation preceded by estimation of the (multidimensional) IRT model parameters. A primary aim of this chapter is to investigate variants of a Bernstein-von Mises theorem for latent vectors, i.e., the asymptotic posterior normality (APN) of latent vectors. The chapter provides a comprehensive analysis of questions related to Bernstein-von Mises theorems and the asymptotics of latent vector estimation for binary IRT. Existing results in the IRT literature on the asymptotics of the posterior of a single latent variable are extended to the multivariate case, and also with respect to the type of convergence, the estimators considered and their asymptotic efficiency. In Chapter 4, a linear approximation of the expected a-posteriori estimator (aEAP) for latent vectors is obtained using the component statistics and the APN theory of Chapter 3. Properties of the aEAP are examined in a simulation study. A new EM algorithm for the MMLE of high-dimensional logit models is derived by applying the APN theory once more and combining it with the aEAP. This EM algorithm is easy to implement for any dimension of the latent vector because it simplifies steps of similar adaptive algorithms for high-dimensional settings. Chapter 5 focuses on parameter estimation in large, high-dimensional IRT settings in which classic methods are infeasible. Based on a pseudo-likelihood procedure for a class of generalized IRT models that cannot always be interpreted as latent variable models, a method is obtained whose fitted models are guaranteed to be equivalent to latent variable models. The implemented procedure is fast, but the parameter estimates are biased. The bias and efficiency of the estimator are studied via simulations.
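
To make the notion of an expected a-posteriori (EAP) estimate concrete, the sketch below computes the posterior mean of a single latent variable under a logit model with a standard normal prior, approximating the integrals on an equally spaced grid. This is the ordinary EAP for the one-dimensional case, not the linear approximation (aEAP) developed in the thesis. The function names, the grid settings and the item parameters are assumptions made for this example, and the item parameters are treated as known.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eap_estimate(responses, intercepts, slopes, lo=-6.0, hi=6.0, n_points=241):
    """EAP of a scalar latent variable with an assumed N(0, 1) prior.

    The posterior mean is approximated by sums over an equally spaced grid.
    """
    step = (hi - lo) / (n_points - 1)
    num = 0.0
    den = 0.0
    for k in range(n_points):
        z = lo + k * step
        weight = math.exp(-0.5 * z * z)  # unnormalised prior density at z
        for y, b, a in zip(responses, intercepts, slopes):
            p = sigmoid(b + a * z)
            weight *= p if y == 1 else 1.0 - p  # likelihood contribution
        num += z * weight
        den += weight
    return num / den  # the grid step cancels in the ratio


# Hypothetical item parameters for four items.
intercepts = [0.0, 0.0, 0.5, -0.5]
slopes = [1.0, 1.0, 1.5, 0.8]
estimate = eap_estimate([1, 1, 0, 1], intercepts, slopes)
```

With few items the prior pulls the estimate toward zero. Extending this grid approach to a latent vector requires a grid whose size grows exponentially with the dimension, which is one reason approximations are of interest in high-dimensional settings.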