Tai-Danae Bradley Biblio
arXiv:2501.06662 [pdf, ps, other]
The Magnitude of Categories of Texts Enriched by Language Models
Abstract: The purpose of this article is twofold. Firstly, we use the next-token probabilities given by a language model to explicitly define a $[0,1]$-enrichment of a category of texts in natural language, in the sense of Bradley, Terilla, and Vlassopoulos. We consider explicitly the terminating conditions for text generation and determine when the enrichment itself can be interpreted as a probability over texts. Secondly, we compute the Möbius function and the magnitude of an associated generalized metric space of texts using a combinatorial version of these quantities recently introduced by Vigneaux. The magnitude function of this space is a sum over texts (prompts) of the Tsallis $t$-entropies of the next-token probability distributions, plus the cardinality of the model's possible outputs. The derivative of the magnitude function at $t=1$ recovers a sum of Shannon entropies, which justifies seeing magnitude as a partition function. Following Leinster and Shulman, we also express the magnitude function as an Euler characteristic of magnitude homology and provide an explicit description of the zeroth and first magnitude homology groups.
Submitted 11 January, 2025; originally announced January 2025.
MSC Class: 18D20; 68T50 ACM Class: I.2.7; G.3
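As background for the last claims of this abstract (standard facts about Tsallis entropy, not the paper's precise statement or notation), recall that the Tsallis $t$-entropy of a distribution $p$ is
\[
  S_t(p) \;=\; \frac{1}{t-1}\Bigl(1 - \sum_i p_i^{\,t}\Bigr),
  \qquad
  \lim_{t \to 1} S_t(p) \;=\; -\sum_i p_i \log p_i \;=\; H(p),
\]
and, term by term,
\[
  \frac{d}{dt}\Bigl(-\sum_i p_i^{\,t}\Bigr)\Big|_{t=1} \;=\; -\sum_i p_i \log p_i \;=\; H(p).
\]
Summing such terms over prompts $x$, with $p = p(\cdot \mid x)$ the next-token distribution, is what produces the sum of Shannon entropies referred to in the abstract.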
Towards structure-preserving quantum encodings
Abstract: Harnessing the potential computational advantage of quantum computers for machine learning tasks relies on the uploading of classical data onto quantum computers through what are commonly referred to as quantum encodings. The choice of such encodings may vary substantially from one task to another, and there exist only a few cases where structure has provided insight into their design and implementation, such as symmetry in geometric quantum learning. Here, we propose the perspective that category theory offers a natural mathematical framework for analyzing encodings that respect structure inherent in datasets and learning tasks. We illustrate this with pedagogical examples, which include geometric quantum machine learning, quantum metric learning, topological data analysis, and more. Moreover, our perspective provides a language in which to ask meaningful and mathematically precise questions for the design of quantum encodings and circuits for quantum machine learning tasks.
Submitted 23 December, 2024; originally announced December 2024.
Comments: 17 pages body, 10 pages back matter; Comments welcome!
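For readers new to the term, a quantum encoding in its simplest form just loads a classical data vector into the amplitudes of a quantum state. The following numpy sketch of plain amplitude encoding is a generic illustration of that idea, not a construction from the paper; the function name and padding convention are my own.

import numpy as np

def amplitude_encode(x):
    """Encode a classical vector as the amplitudes of a pure quantum state.

    The vector is padded to the next power of two (so it fits on qubits)
    and normalized to unit length, as amplitude encoding requires.
    """
    x = np.asarray(x, dtype=float)
    dim = 1 << int(np.ceil(np.log2(len(x))))   # next power of two
    padded = np.zeros(dim)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return padded / norm                        # state vector |psi>

# Example: a 3-dimensional data point becomes a 2-qubit state.
psi = amplitude_encode([3.0, 4.0, 12.0])
print(psi, np.isclose(np.linalg.norm(psi), 1.0))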
arXiv:2107.09581 [pdf, ps, other]
Entropy as a Topological Operad Derivation
Abstract: We share a small connection between information theory, algebra, and topology - namely, a correspondence between Shannon entropy and derivations of the operad of topological simplices. We begin with a brief review of operads and their representations with topological simplices and the real line as the main example. We then give a general definition for a derivation of an operad in any category with values in an abelian bimodule over the operad. The main result is that Shannon entropy defines a derivation of the operad of topological simplices, and that for every derivation of this operad there exists a point at which it is given by a constant multiple of Shannon entropy. We show this is compatible with, and relies heavily on, a well-known characterization of entropy given by Faddeev in 1956 and a recent variation given by Leinster.
Submitted 9 September, 2021; v1 submitted 20 July, 2021; originally announced July 2021.
Comments: 13 pages; v2: version appearing in Entropy (minor changes, typos fixed)
Journal ref: Entropy 2021, 23(9), 1195
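The "derivation" property at the heart of this paper is, informally, the familiar chain rule for Shannon entropy under operadic composition of probability simplices. Schematically (the paper states this precisely in terms of an abelian bimodule over the operad, which is not reproduced here):
\[
  \bigl(p \circ (q_1,\dots,q_n)\bigr)_{ij} \;=\; p_i\, q_{i,j},
  \qquad
  H\bigl(p \circ (q_1,\dots,q_n)\bigr) \;=\; H(p) + \sum_{i=1}^{n} p_i\, H(q_i),
\]
where $p$ is a point of the $(n-1)$-simplex and each $q_i$ is a point of some simplex. The identity on the right is Faddeev-style recursivity, and it is the Leibniz-like behaviour that the derivation formalism captures.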
Probabilistic Graphical Models and Tensor Networks: A Hybrid Framework
Abstract: We investigate a correspondence between two formalisms for discrete probabilistic modeling: probabilistic graphical models (PGMs) and tensor networks (TNs), a powerful modeling framework for simulating complex quantum systems. The graphical calculus of PGMs and TNs exhibits many similarities, with discrete undirected graphical models (UGMs) being a special case of TNs. However, more general probabilistic TN models such as Born machines (BMs) employ complex-valued hidden states to produce novel forms of correlation among the probabilities. While representing a new modeling resource for capturing structure in discrete probability distributions, this behavior also renders the direct application of standard PGM tools impossible. We aim to bridge this gap by introducing a hybrid PGM-TN formalism that integrates quantum-like correlations into PGM models in a principled manner, using the physically-motivated concept of decoherence. We first prove that applying decoherence to the entirety of a BM model converts it into a discrete UGM, and conversely, that any subgraph of a discrete UGM can be represented as a decohered BM. This method allows a broad family of probabilistic TN models to be encoded as partially decohered BMs, a fact we leverage to combine the representational strengths of both model families. We experimentally verify the performance of such hybrid models in a sequential modeling task, and identify promising uses of our method within the context of existing applications of graphical models. △ Less
Submitted 29 June, 2021; originally announced June 2021.
Comments: 18 pages, 11 figures
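A toy numpy sketch of the decoherence step described in this abstract, on a two-site Born machine over binary strings (the two-site setup and the variable names are illustrative choices of mine, not the paper's models):

import numpy as np

rng = np.random.default_rng(0)

# Two complex "cores" T1[x1, h] and T2[h, x2] sharing a hidden (bond) index h.
T1 = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
T2 = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))

# Born machine: probability is the squared modulus of the coherent sum over h.
amp = np.einsum("ah,hb->ab", T1, T2)        # amplitude A(x1, x2)
p_bm = np.abs(amp) ** 2
p_bm /= p_bm.sum()

# Fully decohered model: the sum over h is made incoherent, term by term.
# The result factors through the nonnegative potentials |T1|^2 and |T2|^2,
# i.e. it is a discrete undirected graphical model on the same graph.
p_ugm = np.einsum("ah,hb->ab", np.abs(T1) ** 2, np.abs(T2) ** 2)
p_ugm /= p_ugm.sum()

print(p_bm)
print(p_ugm)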
arXiv:2106.07890 [pdf, ps, other]
An enriched category theory of language: from syntax to semantics
Abstract: State of the art language models return a natural language text continuation from any piece of input text. This ability to generate coherent text extensions implies significant sophistication, including a knowledge of grammar and semantics. In this paper, we propose a mathematical framework for passing from probability distributions on extensions of given texts, such as the ones learned by today's large language models, to an enriched category containing semantic information. Roughly speaking, we model probability distributions on texts as a category enriched over the unit interval. Objects of this category are expressions in language, and hom objects are conditional probabilities that one expression is an extension of another. This category is syntactical -- it describes what goes with what. Then, via the Yoneda embedding, we pass to the enriched category of unit interval-valued copresheaves on this syntactical category. This category of enriched copresheaves is semantic -- it is where we find meaning, logical operations such as entailment, and the building blocks for more elaborate semantic concepts.
Submitted 17 November, 2021; v1 submitted 15 June, 2021; originally announced June 2021.
Comments: 29 pages; v2 major revision with new proofs and computations
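A back-of-the-envelope sketch of the kind of unit-interval enrichment this abstract describes, using substring counts in a toy corpus as a stand-in for a language model's probabilities (the corpus, function names, and estimator are illustrative, not the paper's):

# Hom object C(x, y): probability that y extends x, estimated from a toy corpus.
corpus = ["red apple", "red rose", "red rose garden", "blue rose"]

def count(expr):
    """Number of corpus texts in which the expression occurs."""
    return sum(expr in text for text in corpus)

def hom(x, y):
    """Enriched hom C(x, y) in [0, 1]: how probable y is as an extension of x."""
    if x not in y or count(x) == 0:
        return 0.0
    return count(y) / count(x)

print(hom("red", "red rose"))               # 2/3: most 'red' texts continue to 'red rose'
print(hom("red rose", "red rose garden"))   # 1/2
print(hom("rose", "red rose"))              # 2/3 of 'rose' texts contain 'red rose'

With this estimator the numbers satisfy the composition inequality hom(x, y) * hom(y, z) <= hom(x, z), which is what makes them candidate hom objects of a category enriched over the unit interval.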
Language Modeling with Reduced Densities
Abstract: This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance, but also - and quite crucially - because they are built entirely from correlations in unstructured text data. The latter observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.
Submitted 27 November, 2021; v1 submitted 7 July, 2020; originally announced July 2020.
Comments: 21 pages; v2: added reference; v3: revised abstract and introduction for clarity; v4: Compositionality version
Journal ref: Compositionality, Volume 3 (2021) (November 30, 2021) compositionality:13514
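A minimal numpy sketch of the Loewner order mentioned at the end of this abstract, with two hand-picked positive semidefinite matrices standing in for reduced densities; the functor that produces such operators from a corpus is the content of the paper and is not reproduced here.

import numpy as np

def loewner_leq(A, B, tol=1e-12):
    """True if A <= B in the Loewner order, i.e. B - A is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(B - A) >= -tol))

# Toy 'reduced densities' on a two-dimensional space of continuations.
# Think of rho_narrow as attached to a more specific expression and
# rho_broad as attached to a more general one.
rho_narrow = np.array([[0.2, 0.0],
                       [0.0, 0.0]])
rho_broad  = np.array([[0.5, 0.1],
                       [0.1, 0.3]])

print(loewner_leq(rho_narrow, rho_broad))   # True: the specific operator sits below the general one
print(loewner_leq(rho_broad, rho_narrow))   # False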
At the Interface of Algebra and Statistics
Abstract: This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals recover classical marginal probabilities. In general, these reduced densities will have rank higher than one, and their eigenvalues and eigenvectors will contain extra information that encodes subsystem interactions governed by statistics. We decode this information, and show it is akin to conditional probability, and then investigate the extent to which the eigenvectors capture "concepts" inherent in the original joint distribution. The theory is then illustrated with an experiment that exploits these ideas. Turning to a more theoretical application, we also discuss a preliminary framework for modeling entailment and concept hierarchy in natural language, namely, by representing expressions in the language as densities. Finally, initial inspiration for this thesis comes from formal concept analysis, which finds many striking parallels with the linear algebra. The parallels are not coincidental, and a common blueprint is found in category theory. We close with an exposition on free (co)completions and how the free-forgetful adjunctions in which they arise strongly suggest that in certain categorical contexts, the "fixed points" of a morphism with its adjoint encode interesting information.
Submitted 12 April, 2020; originally announced April 2020.
Comments: 135 pages, PhD thesis
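The first half of this abstract translates directly into a few lines of numpy. The following sketch (my own toy joint distribution, not an example from the thesis) builds the rank-one density of a joint distribution, applies the partial trace, and checks that the diagonal of the reduced density recovers the classical marginal.

import numpy as np

# A joint distribution p(a, b) on a 2 x 3 product of finite sets.
p = np.array([[0.10, 0.05, 0.15],
              [0.30, 0.25, 0.15]])
assert np.isclose(p.sum(), 1.0)

# Rank-one density operator built from the entrywise square root of p.
psi = np.sqrt(p).reshape(-1)                # unit vector of dimension 6
rho = np.outer(psi, psi)                    # |psi><psi|, a 6 x 6 density matrix

# Partial trace over the second subsystem (the 3-element set).
rho4 = rho.reshape(2, 3, 2, 3)
rho_A = np.einsum("ajbj->ab", rho4)         # 2 x 2 reduced density operator

# Its diagonal recovers the classical marginal p(a); its off-diagonal entries
# carry the extra "subsystem interaction" information the thesis decodes.
print(np.diag(rho_A), p.sum(axis=1))        # both are [0.30, 0.70]
print(rho_A)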
Modeling Sequences with Quantum States: A Look Under the Hood
Abstract: Classical probability distributions on sets of sequences can be modeled using quantum states. Here, we do so with a quantum state that is pure and entangled. Because it is entangled, the reduced densities that describe subsystems also carry information about the complementary subsystem. This is in contrast to the classical marginal distributions on a subsystem in which information about the complementary system has been integrated out and lost. A training algorithm based on the density matrix renormalization group (DMRG) procedure uses the extra information contained in the reduced densities and organizes it into a tensor network model. An understanding of the extra information contained in the reduced densities allows us to examine the mechanics of this DMRG algorithm and study the generalization error of the resulting model. As an illustration, we work with the even-parity dataset and produce an estimate for the generalization error as a function of the fraction of the dataset used in training.
Submitted 16 October, 2019; originally announced October 2019.
Comments: 27 pages
Journal ref: 2020 Mach. Learn.: Sci. Technol. 1 035008
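A small numpy sketch of the setup described here, using the even-parity dataset the abstract mentions; the encoding and the subsystem split are my own toy choices, and the DMRG-based training itself is not reproduced.

import numpy as np
from itertools import product

n = 4
# The even-parity dataset: all length-n bit strings with an even number of 1s.
data = [bits for bits in product([0, 1], repeat=n) if sum(bits) % 2 == 0]

# Encode the uniform distribution on the dataset as a pure state:
# amplitudes are square roots of probabilities.
psi = np.zeros((2,) * n)
for bits in data:
    psi[bits] = 1.0
psi /= np.linalg.norm(psi)

# Reduced density of the first two bits: trace out the last two.
k = 2
psi_mat = psi.reshape(2 ** k, 2 ** (n - k))
rho_A = psi_mat @ psi_mat.T                 # partial trace over the complement

# The diagonal is the classical marginal on the first two bits (uniform here),
# while the rank and off-diagonal structure retain parity information about
# the traced-out bits -- the "extra information" the abstract refers to.
print(np.diag(rho_A))                       # [0.25, 0.25, 0.25, 0.25]
print(np.linalg.matrix_rank(rho_A))         # 2, not 1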
arXiv:1811.11041 [pdf, ps, other]
Translating and Evolving: Towards a Model of Language Change in DisCoCat
Abstract: The categorical compositional distributional (DisCoCat) model of meaning developed by Coecke et al. (2010) has been successful in modeling various aspects of meaning. However, it fails to model the fact that language can change. We give an approach to DisCoCat that allows us to represent language models and translations between them, enabling us to describe translations from one language to another, or changes within the same language. We unify the product space representation given in (Coecke et al., 2010) and the functorial description in (Kartsaklis et al., 2013), in a way that allows us to view a language as a catalogue of meanings. We formalize the notion of a lexicon in DisCoCat, and define a dictionary of meanings between two lexicons. All this is done within the framework of monoidal categories. We give examples of how to apply our methods, and give a concrete suggestion for compositional translation in corpora.
Submitted 8 November, 2018; originally announced November 2018.
Comments: In Proceedings CAPNS 2018, arXiv:1811.02701
Journal ref: EPTCS 283, 2018, pp. 50-61
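A deliberately tiny sketch of the kind of structure a DisCoCat lexicon carries: words assigned vectors and matrices, phrase meaning by composition, and a word-level dictionary between two lexicons that preserves meanings. The example and names are mine; the paper formulates all of this functorially in monoidal categories.

import numpy as np

# A toy lexicon: nouns are vectors, adjectives are matrices,
# and adjective-noun meanings compose by matrix-vector multiplication.
nouns_en = {"rose": np.array([1.0, 0.0]), "violet": np.array([0.0, 1.0])}
adjs_en  = {"red":  np.array([[1.0, 0.0], [0.0, 0.2]])}

def meaning(adj, noun, adjs, nouns):
    """Meaning of an adjective-noun phrase as a composed vector."""
    return adjs[adj] @ nouns[noun]

# A toy 'dictionary' between two lexicons: a map on words together with a
# (here trivial, identity) map on the shared meaning space.
word_map = {"rose": "rosa", "red": "roja"}
nouns_es = {"rosa": nouns_en["rose"]}
adjs_es  = {"roja": adjs_en["red"]}

print(meaning("red", "rose", adjs_en, nouns_en))
print(meaning(word_map["red"], word_map["rose"], adjs_es, nouns_es))  # same meaning vector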
What is Applied Category Theory?
Abstract: This is a collection of introductory, expository notes on applied category theory, inspired by the 2018 Applied Category Theory Workshop. In these notes we take a leisurely stroll through two themes (functorial semantics and compositionality), two constructions (monoidal categories and decorated cospans), and two examples (chemical reaction networks and natural language processing) within the field.
Submitted 3 October, 2018; v1 submitted 16 September, 2018; originally announced September 2018.
Comments: 50 pages, 49 figures; in v2: corrected typos & figure p. 38
On the Distribution of the Greatest Common Divisor of Gaussian Integers
Abstract: For a pair of random Gaussian integers chosen uniformly and independently from the set of Gaussian integers of norm $N$ or less, we find asymptotics, as $N$ goes to infinity, for the average norm of their greatest common divisor, with explicit error terms. We also present results for higher moments, along with computational data which support the results for the second, third, fourth, and fifth moments. The analogous question for integers was studied by Diaconis and Erdős.
Submitted 1 March, 2015; v1 submitted 7 February, 2015; originally announced February 2015.
Comments: 13 pages, 4 figures
MSC Class: 11N37; 11A05; 11K65; 60E05
Journal ref: Involve, vol. 9 (2016), no. 1, pp. 27-40
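The computational data mentioned in this abstract is easy to reproduce in spirit. The following Monte Carlo sketch (my own, not the authors' code) samples pairs of nonzero Gaussian integers of norm at most N and averages the norm of their greatest common divisor, computed with the Euclidean algorithm in Z[i]; the norm of the gcd is independent of the choice of unit.

import random

def g_norm(z):
    x, y = z
    return x * x + y * y

def g_sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def g_mul(a, b):
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def g_divround(a, b):
    """Nearest-Gaussian-integer quotient of a / b."""
    n = g_norm(b)
    re = a[0] * b[0] + a[1] * b[1]      # numerator of Re(a * conj(b))
    im = a[1] * b[0] - a[0] * b[1]      # numerator of Im(a * conj(b))
    return (round(re / n), round(im / n))

def g_gcd(a, b):
    """Euclidean algorithm in Z[i]: the remainder's norm strictly decreases."""
    while b != (0, 0):
        q = g_divround(a, b)
        a, b = b, g_sub(a, g_mul(q, b))
    return a

def sample_gaussian_int(N):
    """Uniform nonzero Gaussian integer of norm at most N (rejection sampling)."""
    r = int(N ** 0.5)
    while True:
        x, y = random.randint(-r, r), random.randint(-r, r)
        if 0 < x * x + y * y <= N:
            return (x, y)

random.seed(0)
N, trials = 10_000, 20_000
avg = sum(g_norm(g_gcd(sample_gaussian_int(N), sample_gaussian_int(N)))
          for _ in range(trials)) / trials
print(avg)   # empirical average norm of the gcd at this value of N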