Statistical modeling of non-metallic inclusions in steels and extreme value analysis

  • Statistische Modellierung von nichtmetallischen Einschlüssen in Stählen und Extremwertanalyse

Schmiedt, Anja Bettina; Kamps, Udo (Thesis advisor)

Aachen : Publikationsserver der RWTH Aachen University (2012, 2013)
Dissertation / PhD Thesis

Aachen, Techn. Hochsch., Diss., 2012


This doctoral thesis is motivated by the metallurgical problem of non-metallic inclusions, which arise unavoidably during steel-making processes. These inclusions are known to be a main cause of material defects, with inclusion size being the most crucial geometrical parameter. There is therefore strong interest in fitting an appropriate statistical model to non-metallic inclusion sizes and, as a key issue of quality engineering, in predicting large inclusion sizes. To collect data for statistical analyses, several control areas of the same size on a polished plane surface are successively scanned by optical microscopy to detect those inclusions that intersect the surface. The sizes of the respective two-dimensional cross-sections are measured and stored, typically in terms of the square root of the projected area. A corresponding real data set has been provided by the Department of Ferrous Metallurgy of RWTH Aachen University. In metallography, the basic assumption so far has been that the ordered inclusion sizes within each control area are realizations of ordinary order statistics (oOS), which leads to the application of classical extreme value analysis to predict large inclusion sizes. More precisely, extreme value theory has been applied in terms of the control area maxima method: a Gumbel distribution, assumed to be the appropriate extreme value distribution, is fitted to the observed control area maxima, and quantile estimates are calculated that serve (by way of prediction, or rather extrapolation) as inclusion size estimates. From the statistical point of view, it seems reasonable to incorporate a wider range of extreme data into the statistical analysis than just control area maxima. Therefore, this thesis deals with multivariate extreme value (MEV) theory for oOS.
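The control area maxima method described above can be illustrated with a minimal sketch. It assumes synthetic maxima rather than the thesis data, and it uses method-of-moments estimators as a simple stand-in, since the abstract does not state which estimator was applied. All function names and parameter values are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments estimates (location mu, scale beta) of a Gumbel law."""
    n = len(maxima)
    mean = sum(maxima) / n
    var = sum((x - mean) ** 2 for x in maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi   # Var = (pi^2 / 6) * beta^2
    mu = mean - EULER_GAMMA * beta          # E[X] = mu + gamma * beta
    return mu, beta


def gumbel_quantile(mu, beta, p):
    """p-quantile of the Gumbel distribution, 0 < p < 1."""
    return mu - beta * math.log(-math.log(p))


def simulate_maxima(mu, beta, n, seed=0):
    """Synthetic control area maxima drawn by inverse transform sampling."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # skip the endpoint to keep the logarithms finite
            out.append(gumbel_quantile(mu, beta, u))
    return out


# Hypothetical example: 2000 control areas, true mu = 10, beta = 2.
maxima = simulate_maxima(10.0, 2.0, 2000)
mu_hat, beta_hat = fit_gumbel_moments(maxima)

# Size exceeded by the maximum of a control area with probability 1 %.
x_99 = gumbel_quantile(mu_hat, beta_hat, 0.99)
```

The quantile `x_99` plays the role of the extrapolated inclusion size estimate. In practice, maximum likelihood estimation is a common alternative to the moment estimators used here.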
The MEV method permits estimation of the parameters of the generalized extreme value (GEV) distribution on the basis of the r > 1 largest observations of each control area, rather than the observed maxima alone. A simulation study shows that if estimation of the GEV parameters is based only on the observed maxima, the true extreme value family is frequently misspecified; in particular, the statistical analysis improves considerably when it is based on the r > 1 largest observations of each control area, for sufficiently large values of r. These simulation results are illustrated by the corresponding MEV analysis of the real data set of inclusion sizes. Both the MEV approach and the control area maxima approach suppose that the ordered inclusion sizes within each control area are realizations of oOS. However, in the real data set large inclusions appear with a significantly lower incidence than smaller ones, i.e., with the inclusion sizes of one control area arranged in ascending order, the difference between two adjacent sizes generally grows with the size of the inclusions. Hence, oOS may not always be appropriate for modeling ordered inclusion sizes. For this reason, this thesis discusses a more flexible model of ordered random variables, called the generalized model of ordered inclusion sizes. It coincides with the model of generalized order statistics (gOS) in the distribution-theoretical sense, and certain model parameters allow the ordered inclusion sizes within one control area to arise from parametrically adjusted hazard rates. Supposing that these model parameters are unknown, methods of statistical inference are discussed and applied to the real data set. Among other things, statistical tests on the model parameters are carried out, and they indicate that oOS are not always appropriate for modeling inclusion sizes.
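To make the estimation from the r largest observations more concrete, the following sketch evaluates the standard log-likelihood of the limiting joint distribution of the r largest order statistics under a GEV model with location `mu`, scale `sigma` and shape `xi`. It is a generic textbook form, not necessarily the exact implementation used in the thesis, and the function name and the data layout (one list of the r largest sizes per control area) are assumptions.

```python
import math


def r_largest_loglik(blocks, mu, sigma, xi):
    """Log-likelihood of the r largest observations per control area.

    blocks: list of lists, each holding the r largest sizes of one area.
    Returns -inf when the parameters are outside the admissible region.
    """
    if sigma <= 0.0:
        return -math.inf
    total = 0.0
    for block in blocks:
        xs = sorted(block, reverse=True)        # x_1 >= ... >= x_r
        zs = [(x - mu) / sigma for x in xs]
        r = len(zs)
        if abs(xi) < 1e-9:
            # Gumbel limit (xi -> 0)
            total += -math.exp(-zs[-1]) - sum(zs) - r * math.log(sigma)
        else:
            ts = [1.0 + xi * z for z in zs]
            if min(ts) <= 0.0:                  # observation outside support
                return -math.inf
            total += (
                -ts[-1] ** (-1.0 / xi)          # term for the r-th largest
                - (1.0 / xi + 1.0) * sum(math.log(t) for t in ts)
                - r * math.log(sigma)
            )
    return total
```

For r = 1 the expression reduces to the ordinary GEV log-density of the maximum, so the control area maxima method is the special case in which each list contains a single value. Maximizing this function numerically over `(mu, sigma, xi)` gives the parameter estimates.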
Further, a link function approach is discussed to reduce the number of unknown parameters; in the analysis of the real data, a log-linear link function turns out to be appropriate. In addition, by combining the link function approach with extreme value theory for models of gOS, the prediction of large inclusion sizes by means of extrapolation is addressed. Finally, research on extreme value theory for models of gOS is continued. In a habilitation dissertation (Cramer 2003, University of Oldenburg), possible non-degenerate limit distributions for extreme gOS have been derived. In this doctoral thesis, domains of attraction of those non-degenerate limit distributions are investigated, i.e., conditions on the underlying distribution function are established that are necessary and/or sufficient for extreme gOS to converge weakly to a non-degenerate limit distribution. It turns out that many underlying distribution functions are attracted to a non-degenerate limit distribution function that is of the same type as the standard normal distribution function.
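As a rough illustration of how a gOS model with a log-linear link differs from oOS, the sketch below simulates ordered sizes from gOS with parameters γ_1, ..., γ_n. The abstract does not specify the link, so the form γ_j = (n - j + 1) · exp(a + b·j) and the standard exponential baseline distribution are assumptions made only for this example; `a = b = 0` recovers oOS. The simulation uses the known representation of exponential gOS as cumulative sums of independent exponential variables scaled by 1/γ_j.

```python
import math
import random


def loglinear_gammas(n, a, b):
    """Hypothetical log-linear link: gamma_j = (n - j + 1) * exp(a + b * j)."""
    return [(n - j + 1) * math.exp(a + b * j) for j in range(1, n + 1)]


def simulate_gos(gammas, rng):
    """One sample of gOS with a standard exponential baseline (assumed).

    The r-th value is sum_{j <= r} E_j / gamma_j with E_j iid Exp(1).
    """
    log_survival = 0.0
    sizes = []
    for g in gammas:
        v = 1.0 - rng.random()             # uniform on (0, 1]
        log_survival += math.log(v) / g    # adds -E_j / gamma_j
        sizes.append(-log_survival)        # exponential quantile transform
    return sizes


rng = random.Random(1)
oos_sample = simulate_gos(loglinear_gammas(5, 0.0, 0.0), rng)   # ordinary case
gos_sample = simulate_gos(loglinear_gammas(5, 0.0, -0.3), rng)  # adjusted rates
```

In this parametrization a negative `b` shrinks γ_j for the higher ranks, which stretches the gaps between the largest sizes relative to oOS. Only two numbers, `a` and `b`, then have to be estimated instead of n separate parameters.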