Parallel algorithms for tensor arithmetic in low-rank formats and applications
- Parallele Algorithmen für Tensorarithmetik in Niedrigrangformaten und Anwendungen
Löbbert, Christian; Grasedyck, Lars (Thesis advisor); Börm, Steffen (Thesis advisor)
Dissertation / PhD Thesis
Dissertation, RWTH Aachen University, 2019
The aim of this thesis is the development of parallel algorithms for tensor arithmetic (as, e.g., dot products, tensor truncations or Hadamard products) in the Hierarchical Tucker format (HT-format), which is a low-rank format for tensors based on a hierarchy of the tensor directions according to a binary tree. We start with an introduction to tensors and different tensor formats and demonstrate why the HT-format is particularly suitable for our purpose. In sequential implementation the tensor arithmetic of this work has a complexity which grows linearly with the tensor order d. Choosing the underlying binary tree in the right way, and assuming that enough parallel processes are available, this complexity can be reduced to O(log(d)) by parallelization, which is verified by runtime tests performed for the implemented algorithms.The algorithms have been developed in C++ using the MPI standard for parallelization and the blaze-lib for the involved matrix arithmetic. In particular, the algorithms developed within the framework of this work support the distributed storage of a tensor in HT-format on several compute nodes. The publication of the algorithms is planned.At the end of this work an application of parallel tensor arithmetic is presented: together with Lukas Juschka, algorithms have been developed, which approximate the maximum norm of a low-rank tensor and are able to find a corresponding tensor index, which maximizes the absolute value. The approximation of the maximum norm is based on a power iteration, which requires the computation of dot products and Hadamard products, and thus inherits the O(d)-complexity of the tensor arithmetic in the HT-format. The parallel implementation developed in this work leads to a reduction of the complexity to O(log(d)). Finding a corresponding maximizing tensor index is carriedout via a binary search, which itself needs O(d) applications of the maximum norm approximation. In sequential implementation this leads to an O(d²)-complexity, which can be reduced to O(d log(d)), i.e. quasi-linear complexity, by using parallel tensor arithmetic.