Association in contingency tables : an information-theoretic approach
- Assoziation in Kontingenztabellen : ein informationstheoretischer Ansatz
Espendiller, Michael; Kateri, Maria (Thesis advisor); Kamps, Udo (Thesis advisor)
Dissertation / PhD Thesis
Dissertation, RWTH Aachen University, 2017
This Ph.D. thesis deals with one of the fundamental problems of categorical data analysis: measuring the association between categorical variables that are cross-classified in a two-way table. Such tables occur in many scientific fields, such as economics and the social and biomedical sciences. Adequate models are a basic and flexible tool that provides a more sensitive and informative analysis, but fitting and interpreting them often requires advanced model fitting procedures and statistical software skills that can be too complex for practitioners. Association measures are a convenient alternative: they identify and quantify the underlying association compactly and are easy to understand and interpret. This thesis develops new association measures for 2 x 2 tables based on the phi-divergence by generalising the most fundamental measure of association, the odds ratio. The adopted approach is motivated by an extensive study of continuity corrections and confidence interval construction techniques, which are ways of dealing with the problems caused by sampling zeros, i.e. cells with an observed frequency of zero. Sampling zeros may lead to infinite estimates of the log-odds ratio and, because the estimated asymptotic variance is then infinite as well, rule out asymptotic inferential methods. The newly introduced measure, the phi-scaled odds ratio, aims to overcome these deficiencies by means of a scale change induced by the phi-divergence. A scale change can improve the compatibility with sampling zeros and can, in some set-ups, lead to better Wald confidence intervals for the phi-scaled odds ratios with respect to coverage probability and average relative length. In I x J tables, a scalar measure can often be misleading when the association structure is more complex and cannot be described by a single parameter. The classical generalised odds ratios are naturally linked to the parameters of association models.
This close connection is used to construct new non-scalar measures of association. These measures are more informative, since they inherit the greater sensitivity of models and offer more options for capturing association structures without losing easy interpretability. Closed-form estimators for these model-based measures are introduced; they are close to the maximum likelihood estimators, which have to be computed iteratively. Since a scale change can lead to more adequate measures, this model-based approach is extended using the phi-divergence by providing and studying new generalised phi-scaled odds ratios for I x J tables. They are linked to a new phi-scaled association model, the generalised phi-linear model, and thus provide a phi-scaled extension of the model-based measures, for which closed-form estimators are also developed. I x I square tables with commensurable classification variables are of special interest, e.g. in social mobility studies that assess the permeability of economic systems. Such tables can be analysed with symmetry models. The existing phi-scaled symmetry models form the basis for developing a phi-scaled asymmetry measure. Thus, a new family of directed asymmetry measures is introduced, along with new phi-scaled versions of the standard symmetry tests of McNemar and Bowker. The main contribution of this work is to explore and demonstrate the great flexibility of phi-divergence based measures for categorical data, thus paving the way for further research, among others on small-sized multi-way tables, which are naturally confronted with sampling zeros.
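As an illustration of the sampling-zero problem mentioned in the abstract, the following minimal Python sketch computes the classical log-odds ratio of a 2 x 2 table with its Wald confidence interval. It does not implement the phi-scaled odds ratio developed in the thesis. The function name `log_odds_ratio_wald`, its parameters, and the example counts are hypothetical, and the optional constant added to every cell is the common textbook continuity correction (0.5 is a frequent choice), assumed here purely for illustration.

```python
import math


def log_odds_ratio_wald(n11, n12, n21, n22, correction=0.0, z=1.959964):
    """Classical log-odds ratio of a 2 x 2 table with a Wald interval.

    Hypothetical helper for illustration only; it is not the phi-scaled
    measure of the thesis. `correction` is added to every cell and `z` is
    the standard normal quantile (default: roughly 95% coverage).
    Returns (estimate, lower, upper), or None if any cell is still zero.
    """
    cells = [n + correction for n in (n11, n12, n21, n22)]
    if min(cells) <= 0:
        # A sampling zero makes the estimate infinite or undefined and the
        # estimated asymptotic variance infinite, so no Wald interval exists.
        return None
    a, b, c, d = cells
    estimate = math.log((a * d) / (b * c))
    # Estimated asymptotic standard error: sqrt of the sum of reciprocal cells.
    std_err = math.sqrt(sum(1.0 / x for x in cells))
    return (estimate, estimate - z * std_err, estimate + z * std_err)


# Made-up counts with one sampling zero in cell (1, 2).
print(log_odds_ratio_wald(7, 0, 3, 5))                  # no finite Wald interval
print(log_odds_ratio_wald(7, 0, 3, 5, correction=0.5))  # finite after correction
```

The corrected interval depends on the arbitrary constant that is added, which is one way to see why a scale change that is compatible with zero cells can be attractive.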