Workshop: Current Trends in Computational Statistics
Computational statistics is a rapidly developing research area concerned with the design and study of computer-aided statistical methods, in particular for large data sets. This workshop looks at current trends and recent developments in computational statistics. The workshop takes place in hybrid form; some speakers will join via Zoom.
Location:
Raum 008/SeMath, Pontdriesch 14-16, 52062 Aachen
Time: Thursday and Friday, 12 and 13 January 2023
To attend in person or virtually, please register briefly with the coordinator of the Mathematics Section (Fachgruppe Mathematik), Ms. Breuer, at breuer@mathematik.rwth-aachen.de.
A Zoom link will be sent to registered participants.
Program
Thursday, 12 January 2023
-
14:30
Arrival
-
14:50
Start
-
15:00 - 16:00 • Zoom talk
Max-linear Graphical Models for Extreme Risk Modelling
Claudia Klüppelberg (TU München)
Graphical models can represent multivariate distributions in an intuitive way and, hence, facilitate statistical analysis of high-dimensional data. Such models are usually modular so that high-dimensional distributions can be described and handled by careful combination of lower-dimensional factors. Furthermore, graphs are natural data structures for algorithmic treatment. Moreover, graphical models can allow for causal interpretation, often provided through a recursive system on a directed acyclic graph (DAG); the max-linear Bayesian network we introduced in [1] is a specific example. This talk contributes to the recently emerged topic of graphical models for extremes, in particular to max-linear Bayesian networks, which are max-linear graphical models on DAGs.
In this context, the Latent River Problem has emerged as a flagship problem for causal discovery in extreme value statistics. In [2] we provide a simple and efficient algorithm, QTree, to solve the Latent River Problem. QTree returns a directed graph and achieves almost perfect recovery on the Upper Danube, the existing benchmark dataset, as well as on new data from the Lower Colorado River in Texas. It uses pairwise dependence, handles missing data, and has an automated parameter tuning procedure. In our paper, we also show that, under a max-linear Bayesian network model for extreme values with propagating noise, the QTree algorithm asymptotically returns the correct tree almost surely. Here we use the fact that the non-noisy model has a left-sided atom for every bivariate marginal distribution when there is a directed edge between the nodes.
[1] N. Gissibl and C. Klüppelberg. Max-linear models on directed acyclic graphs. Bernoulli 24 (2018), no. 4A, 2693-2720.
[2] N. M. Tran, J. Buck, and C. Klüppelberg. Estimating a latent tree for extremes. Submitted (2021).
Link to the paper: https://arxiv.org/abs/2102.06197
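The recursive system on a DAG mentioned in the abstract can be illustrated with a minimal sketch. It assumes a common form of the max-linear recursion, X_i = max(max over parents k of c_ki X_k, Z_i), with innovation coefficient 1; the function name, the weight matrix C, and the example values are illustrative assumptions and are not taken from [1] or [2].

```python
import numpy as np

def simulate_max_linear(C, Z):
    """Evaluate a recursive max-linear system on a DAG (illustrative sketch).

    C[k, i] > 0 is the weight of edge k -> i, and 0 means "no edge".
    Nodes are assumed to be listed in topological order, so C is strictly
    upper triangular. Z holds one positive innovation per node.
    """
    d = len(Z)
    X = np.zeros(d)
    for i in range(d):
        # Largest weighted parent value; non-edges contribute 0 and never win
        # because the innovations are positive.
        parent_part = np.max(C[:i, i] * X[:i]) if i > 0 else 0.0
        X[i] = max(parent_part, Z[i])
    return X

# Hypothetical 3-node DAG: 0 -> 1, 1 -> 2, 0 -> 2
C = np.zeros((3, 3))
C[0, 1], C[1, 2], C[0, 2] = 0.5, 2.0, 0.8
X = simulate_max_linear(C, np.array([4.0, 1.0, 1.0]))
print(X)  # [4. 2. 4.]
```

In this toy example a single large innovation at node 0 determines all three values, which is the kind of propagation of extremes along directed paths that the model is meant to capture.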
-
16:00 - 17:00 • Zoom talk
Optimal and computationally efficient ranking in crowd-sourcing
Alexandra Carpentier (University of Potsdam)
Consider a crowd-sourcing problem where we have n experts and d tasks. The average ability of each expert for each task is stored in an unknown matrix M, which is observed only incompletely and with noise. We make no (semi-)parametric assumptions, but assume that both experts and tasks can be perfectly ranked, so that if an expert is better than another, she performs on average better on all tasks than the other - and that the same holds for the tasks. This implies that if the matrix M is permuted so that the experts and tasks are perfectly ranked, then the permuted matrix M is bi-isotonic.
We focus on the problem of recovering the optimal ranking of the experts in l_2 norm, when the questions are perfectly ranked. We provide a minimax-optimal and computationally feasible method for this problem, based on hierarchical clustering, PCA, and exchange of information among the clusters. We prove in particular - in the case where d > n - that the problem of estimating the expert ranking is significantly easier than the problem of estimating the matrix M.
This talk is based on joint work with Emmanuel Pilliat and Nicolas Verzelen.
-
17:00 - 18:00
Discussion (in person and via Zoom)
Current Trends and Recent Developments in Computational Statistics
Dinner
Friday, 13 January 2023
-
9:00 - 10:00
Anomaly detection for a large number of streams: a permutation/rank-based higher criticism approach
Rui Pires da Silva Castro (TU Eindhoven)
Anomaly detection when observing a large number of data streams is essential in a variety of applications, ranging from epidemiological studies to monitoring of complex systems. High-dimensional scenarios are usually tackled with scan statistics and related methods, requiring stringent modeling assumptions for proper test calibration. In this talk we take a non-parametric stance, and introduce two variants of the higher criticism test that do not require knowledge of the null distribution for proper calibration. In the first variant we calibrate the test by permutation, while in the second variant we use a rank-based approach. Both methodologies result in exact tests in finite samples. Our permutation methodology is applicable when observations within null streams are independent and identically distributed, and we show this methodology is asymptotically optimal in the wide class of exponential models. Our rank-based methodology is more flexible, and only requires observations within null streams to be independent. We provide an asymptotic characterization of the power of the test in terms of the probability of mis-ranking null observations, showing that the asymptotic power loss (relative to an oracle test) is minimal for many common models. As the proposed statistics do not rely on asymptotic approximations, they typically perform better than popular variants of higher criticism relying on such approximations. Finally, we demonstrate the use of these methodologies when monitoring the content uniformity of an active ingredient for a batch-produced drug product, and monitoring the daily number of COVID-19 cases in the Netherlands. (Based on joint work with Ivo Stoepker, Ery Arias-Castro and Edwin van den Heuvel.)
Coffee/tea break
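The idea of calibrating a higher criticism statistic by permutation can be sketched as follows. This is a simplified illustration and not the speakers' statistic: it assumes the classical higher criticism formula applied to normal-approximation p-values of the stream means, and it relies on all observations being exchangeable under the null. All function names and parameter values are hypothetical.

```python
import math
import numpy as np

def higher_criticism(pvals):
    """Classical higher criticism statistic over the smaller half of the p-values."""
    n = len(pvals)
    p = np.clip(np.sort(pvals), 1e-12, 1 - 1e-12)  # avoid division by zero
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    return hc[: max(1, n // 2)].max()

def stream_pvalues(data):
    """One-sided p-values for each stream mean (rows are streams).

    The normal approximation is only used to build the statistic; the test
    itself is calibrated by permutation below.
    """
    scale = data.std(ddof=1) / math.sqrt(data.shape[1])
    z = (data.mean(axis=1) - data.mean()) / scale
    return np.array([0.5 * math.erfc(v / math.sqrt(2)) for v in z])

def permutation_hc_test(data, n_perm=199, seed=0):
    """Monte Carlo permutation p-value for the higher criticism statistic."""
    rng = np.random.default_rng(seed)
    observed = higher_criticism(stream_pvalues(data))
    exceed = 0
    for _ in range(n_perm):
        # Shuffle all observations across streams and time points.
        shuffled = rng.permutation(data.ravel()).reshape(data.shape)
        if higher_criticism(stream_pvalues(shuffled)) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)

# Hypothetical data: 50 streams of length 20, the first 5 shifted upwards.
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 20))
data[:5] += 3.0
print(permutation_hc_test(data))
```

Because the reference distribution is generated from the data themselves, the p-value does not depend on the normal approximation being accurate; only the power of the test does.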
-
10:30 - 11:30
Fast and fair simultaneous confidence bands for functional parameters
Dominik Liebl (University of Bonn)
Quantifying uncertainty using confidence regions is a central goal of statistical inference. Despite this, methodologies for confidence bands in Functional Data Analysis are still underdeveloped compared to estimation and hypothesis testing. In this work, we present a new methodology for constructing simultaneous confidence bands for functional parameter estimates. Our bands possess a number of positive qualities: (1) they are not based on resampling and thus are fast to compute, (2) they are constructed under the fairness constraint of balanced false positive rates across partitions of the bands' domain which facilitates the typical global, but also novel local interpretations, and (3) they do not require an estimate of the full covariance function and thus can be used in the case of fragmentary functional data. Simulations show the excellent finite-sample behavior of our bands in comparison to existing alternatives. The practical use of our bands is demonstrated in two case studies on sports biomechanics and fragmentary growth curves. (Joint work with Matthew Reimherr, Penn State University.)
Link to the paper: https://arxiv.org/abs/1910.00131
-
11:30 - 12:30
From High-Dimensional Statistics To Deep Learning
Johannes Lederer (University of Bochum)
Sparsity is popular in statistics and machine learning because it can avoid overfitting, speed up computations, and facilitate interpretations. In deep learning, however, the full potential of sparsity still needs to be explored. This presentation first recaps sparsity in the framework of high-dimensional statistics and then introduces corresponding methods and theories for modern deep-learning pipelines. More generally, this presentation gives rare mathematical insights into an otherwise extremely active field of research.
Lunch
-
14:00 - 15:00 • Zoom talk
Bayesian non-linear inverse problems: Statistical and computational guarantees
Richard Nickl (University of Cambridge)
-
15:00 - 16:00 • Zoom talk
Stein variational gradient descent: gradient flows, large deviations and optimal transport
Nikolas Nüsken (King's College London)
Sampling or approximating high-dimensional probability distributions is a key challenge in computational statistics and machine learning. This talk will present connections to gradient flow PDEs, optimal transport and interacting particle systems, focusing on the recently introduced Stein variational gradient descent methodology and some variations. The construction induces a novel geometrical structure on the set of probability distributions related to a positive definite kernel function. We discuss the corresponding geodesic equations, infinitesimal optimal transport maps, as well as large deviation functionals. This is joint work with A. Duncan (Imperial College London), L. Szpruch (University of Edinburgh) and M. Renger (Weierstrass Institute Berlin).
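For readers unfamiliar with Stein variational gradient descent, the basic particle update can be sketched in a few lines. This is a minimal illustration of the standard update with a Gaussian kernel of fixed bandwidth; the target (a standard normal), step size, bandwidth, and function names are assumptions chosen for the example and are not taken from the talk.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1, bandwidth=1.0):
    """One SVGD update for particles x of shape (n, d)."""
    diff = x[:, None, :] - x[None, :, :]                 # diff[j, i] = x_j - x_i
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * bandwidth**2))
    # Gradient of k(x_j, x_i) with respect to x_j; acts as a repulsive force.
    grad_k = -diff / bandwidth**2 * k[:, :, None]
    # Kernel-weighted score (attraction) plus kernel gradient (repulsion).
    phi = (k.T @ grad_log_p(x) + grad_k.sum(axis=0)) / len(x)
    return x + step * phi

def run_svgd(x0, grad_log_p, n_steps=500, **kwargs):
    x = x0.copy()
    for _ in range(n_steps):
        x = svgd_step(x, grad_log_p, **kwargs)
    return x

# Hypothetical example: move particles from around 5 towards a standard normal.
rng = np.random.default_rng(0)
x0 = 5.0 + 0.5 * rng.normal(size=(50, 1))
x = run_svgd(x0, grad_log_p=lambda x: -x)
print(x.mean(), x.std())
```

The first term moves particles towards regions of high target density, while the second keeps them from collapsing onto a single point; this interacting particle system is what the gradient flow interpretation describes in the many-particle limit.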
Coffee/tea break
-
16:30 - 17:30 • Zoom talk
Forecast science, learn hidden networks and settle economics conjectures with combinatorics, geometry and statistics
Ngoc Mai Tran (UT Austin)
In many problems, one observes noisy data coming from a hidden or complex combinatorial structure. My research aims to understand and exploit such structures to arrive at an efficient and optimal solution. I will showcase a few successes, achieved with different tools, from different fields: network forecasting, hydrology, and auction theory. Then I will outline some open questions in each field.
-
17:30
End