Information, Logic, and Inference in the Analysis of Complex Networks
Dissertation
Date of oral examination: 2023-12-04
Published: 2023-12-22
Supervisor: Prof. Dr. Michael Wibral
Referee: Prof. Dr. Michael Wibral
Referee: Prof. Dr. Fred Wolf
Files
Name: PhD_Thesis___Information__Logic__and_Inferen...pdf
Size: 4.88 MB
Format: PDF
Description: dissertation
Abstract
English
This thesis deals with a range of current topics in information theory and statistics. It consists of five distinct contributions: Chapter 2 focuses on the statistics of single-regression Granger causality estimators. Chapters 3-5 address the theory of Partial Information Decomposition (PID), an extension of classical Shannon information theory. Chapter 6 is about Significant Subgraph Mining, a statistical method for finding differences between graph-generating processes while correcting for multiple comparisons. In the following, a brief summary of each contribution is provided:

Chapter 2, "Sampling Distribution of Single Regression Granger Causality Estimators", deals with the statistics of single-regression Granger causality estimators, for which only the full autoregressive model has to be estimated, while the parameters of the reduced model (regressing the target process only on its own past) are derived analytically or numerically from the full model parameters. This is in contrast to standard dual-regression estimators, for which both the full and the reduced model have to be estimated. The paper shows that the asymptotic distribution of single-regression Granger causality estimators under the null hypothesis of vanishing Granger causality is a generalized χ²-distribution, which is in many cases well approximated by a Γ-distribution. This holds for time-domain Granger causality as well as for band-limited Granger causality, which is particularly useful in neuroscientific applications where a specific frequency band is of interest. The paper also derives asymptotically valid significance tests based on these sampling distributions.

Chapter 3, "Introducing a differentiable measure of pointwise shared information", proposes a measure of the information shared between particular realizations of a set of source variables about a particular realization of a target variable. In this sense it is a pointwise measure.
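For context on the quantity Chapter 2 analyzes, the standard dual-regression estimate of time-domain Granger causality can be sketched as follows. This is a minimal illustration on a simulated bivariate AR(1) system; all variable names and parameter values are invented for the example and not taken from the thesis:

```python
import numpy as np

def granger_causality(x, y, p=1):
    """Dual-regression Granger causality y -> x with AR order p:
    GC = ln(residual variance of reduced model / residual variance of full model)."""
    n = len(x)
    target = x[p:]

    def lags(z):
        # Column k holds z[t - 1 - k] for t = p .. n-1.
        return np.column_stack([z[p - 1 - k : n - 1 - k] for k in range(p)])

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    # Reduced model: past of x only. Full model: past of x and of y.
    return np.log(rss(lags(x)) / rss(np.hstack([lags(x), lags(y)])))

# Simulated system with a y -> x coupling but no x -> y coupling.
rng = np.random.default_rng(0)
n = 5000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()
    x[t] = 0.5 * x[t - 1] + 0.5 * y[t - 1] + rng.standard_normal()

gc_yx = granger_causality(x, y)  # clearly positive: y Granger-causes x
gc_xy = granger_causality(y, x)  # near zero: no coupling in this direction
```

The single-regression estimators studied in the chapter avoid fitting the reduced model explicitly; estimates such as `gc_xy`, computed under vanishing true Granger causality, are the ones whose null distribution is the generalized χ² mentioned above.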
It is constructed in close analogy to classical pointwise mutual information. This can be achieved in two ways: first, based on the insight that pointwise mutual information can be defined in terms of probability mass exclusions; analogously, pointwise shared information may be introduced in terms of shared probability mass exclusions. Second, pointwise mutual information can be seen as the information about the value of a target variable provided by the truth of a certain logical statement about the source variables; similarly, there is a logical statement about the source realizations that reasonably carries their shared information about the target realization. The resulting measure of pointwise shared information, i_sx, exhibits desirable properties for applications, in particular differentiability with respect to the underlying probability distribution. Further, any general measure of shared information implies an entire Partial Information Decomposition, which in the case of i_sx will also be differentiable. This makes it possible to define goal functions in terms of PID quantities (e.g. "maximize redundancy") with which neural networks can be trained.

Chapter 4, "Bits and Pieces: understanding information decomposition from part-whole relations and formal logic", shows that the entire theory of PID can be derived, firstly, from considerations of part-whole relationships between information atoms and mutual information terms and, secondly, from a hierarchy of logical constraints describing how a given information atom can be accessed. In this way, the idea of a PID is developed on the basis of two of the most elementary relationships in nature: the part-whole relationship and the relation of logical implication. This unifying perspective provides insights into pressing questions in the field, such as the possibility of constructing a PID based on concepts other than redundant information in the general n-sources case.
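To make the part-whole bookkeeping concrete: in the two-source case, fixing any redundancy base-concept determines all four information atoms via inclusion-exclusion over the mutual information terms. Below is a minimal sketch for an AND gate with uniform sources, using the simple minimum-mutual-information redundancy as a stand-in base-concept (deliberately not the thesis's i_sx measure):

```python
import math

# Joint distribution of (s1, s2, t) for an AND gate: t = s1 AND s2, uniform sources.
p = {(0, 0, 0): 0.25, (0, 1, 0): 0.25, (1, 0, 0): 0.25, (1, 1, 1): 0.25}

def mi(src_idx):
    """Mutual information I(S; T) in bits between the source collection
    indexed by src_idx and the target T (third component)."""
    marg_s, marg_t, joint = {}, {}, {}
    for e, q in p.items():
        s, t = tuple(e[i] for i in src_idx), e[2]
        marg_s[s] = marg_s.get(s, 0.0) + q
        marg_t[t] = marg_t.get(t, 0.0) + q
        joint[(s, t)] = joint.get((s, t), 0.0) + q
    return sum(q * math.log2(q / (marg_s[s] * marg_t[t]))
               for (s, t), q in joint.items())

i1, i2, i12 = mi((0,)), mi((1,)), mi((0, 1))

red = min(i1, i2)            # stand-in redundancy base-concept
unq1, unq2 = i1 - red, i2 - red      # I(S1;T) = red + unq1, I(S2;T) = red + unq2
syn = i12 - i1 - i2 + red    # synergy via inclusion-exclusion: I(S1,S2;T) = red + unq1 + unq2 + syn
```

Swapping in a different base-concept (such as i_sx) changes only the `red` line; the remaining atoms still follow from the same part-whole relations between atoms and mutual information terms.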
The paper also presents a re-derivation of the shared-exclusions measure of redundant information introduced in Chapter 3 based on principles of logic and mereology (the study of part-whole relationships).

Chapter 5, "From Babel to Boole: The Logical Structure of Information Decompositions", expands upon the ideas presented in "Bits and Pieces". The central theme of this chapter is the notion of PID "base-concepts": information functionals which, once defined, induce a complete PID. Within the parthood approach, these base-concepts are expressed as conditions, phrased in formal logic, on the specific parthood relations between the PID components and the different mutual information terms. The work identifies a general pattern for these logical conditions; every PID base-concept in the existing literature fits within this pattern as a special case. Moreover, the pattern leads to a novel base-concept called "vulnerable information", which quantifies information that may be lost if access to one of the sources is lost. Furthermore, all PID base-concepts are shown to fall into equivalence classes of measures that describe the same information components, viewed from the perspective of different source collections.

Chapter 6, "Significant Subgraph Mining for Neural Network Analysis with Multiple Comparison Correction", addresses a problem of graph statistics that often comes up in the step following an information-theoretic analysis. Suppose, for instance, that we have performed a pairwise Granger causality analysis of MEG data in two experimental groups. For each group we obtain a set of graphs (one per subject), and we would like to know whether there are any differences between the groups. Perhaps a particular connection is more likely to occur in one group than in the other. And even if there are no such differences on a per-link basis, there may be differences in the dependencies between links.
For instance, while two connections may always appear together in one group, they may occur completely independently in the other. In principle, any possible stochastic difference between the two graph-generating processes can be expressed in terms of the probabilities of occurrence of specific subgraphs. Significant Subgraph Mining systematically tests all such differences while correcting for the formidable multiple comparisons problem that arises because the total number of possible subgraphs scales super-exponentially in the number of graph nodes. The paper extends the method to within-subject experimental designs, allowing for dependencies between the graph-generating processes. It also provides a systematic analysis of the method's error-statistical properties in simulation and in empirical data in order to derive practical recommendations for the application of subgraph mining in neuroscience. In particular, it presents an empirical power analysis for transfer entropy networks inferred from resting-state MEG data comparing autism spectrum patients with neurotypical controls. Finally, a Python implementation as part of the openly available IDTxl toolbox is provided.
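The flavor of such a between-groups comparison can be conveyed by its simplest special case: testing a single link between two independent groups with a one-sided Fisher exact test and a Bonferroni correction over the tested links. This is a toy sketch with invented data; the actual method enumerates whole subgraphs and uses tighter corrections, as implemented in IDTxl:

```python
from math import comb

def fisher_exact_greater(k, n1, m, n2):
    """One-sided Fisher exact test: probability of >= k occurrences in group 1,
    given k + m occurrences in total across groups of size n1 and n2
    (hypergeometric upper tail)."""
    hits = k + m
    denom = comb(n1 + n2, hits)
    return sum(comb(n1, i) * comb(n2, hits - i)
               for i in range(k, min(n1, hits) + 1)) / denom

# Toy data: each graph is the set of directed links present in one subject.
group_a = [{("A", "B")}] * 9 + [set()]       # link A->B present in 9 of 10 subjects
group_b = [{("A", "B")}] * 1 + [set()] * 9   # ... and in only 1 of 10

links = set().union(*group_a, *group_b)
n_tests = len(links)                         # Bonferroni over all tested links
for link in links:
    k = sum(link in g for g in group_a)
    m = sum(link in g for g in group_b)
    p_val = fisher_exact_greater(k, len(group_a), m, len(group_b))
    print(link, p_val, "significant" if p_val * n_tests < 0.05 else "n.s.")
```

Moving from single links to arbitrary subgraphs makes `n_tests` grow super-exponentially, which is why a naive Bonferroni correction becomes hopeless and the mining approach with its sharper corrections is needed.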
Keywords: Information Theory; Stochastic Processes; Granger Causality; Partial Information Decomposition; Synergy; Redundancy; Emergence; Graph Statistics; Complex Systems; Network Analysis; Theoretical Neuroscience; Multiple Comparisons; Mereology; Formal Logic