Provable benefits of score matching

Authors: Chirag Pabbaraju, Dhruv Rohatgi, Anish Prasad Sevekari, Holden Lee, Ankur Moitra, Andrej Risteski

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | While score matching and variants thereof are popular in practice, a precise theoretical understanding of their benefits and tradeoffs relative to maximum likelihood, both computational and statistical, is lacking. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize and has comparable statistical efficiency to maximum likelihood, while the maximum likelihood loss is intractable to optimize using a gradient-based method. The family consists of exponentials of polynomials of fixed degree, and our result can be viewed as a continuous analogue of recent developments in the discrete setting. Precisely, we show: (1) designing a zeroth-order or first-order oracle for optimizing the maximum likelihood loss is NP-hard; (2) maximum likelihood has statistical efficiency polynomial in the ambient dimension and the radius of the parameters of the family; (3) minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension. (A minimal illustrative sketch of point (3) follows this table.)
Researcher Affiliation | Academia | Chirag Pabbaraju, Stanford University (cpabbara@cs.stanford.edu); Dhruv Rohatgi, MIT (drohatgi@mit.edu); Anish Sevekari, Carnegie Mellon University (asevekar@andrew.cmu.edu); Holden Lee, Johns Hopkins University (hlee283@jhu.edu); Ankur Moitra, MIT (moitra@mit.edu); Andrej Risteski, Carnegie Mellon University (aristesk@andrew.cmu.edu)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper is theoretical and does not mention any accompanying open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and discusses 'samples drawn from p_θ' in a statistical sense for theoretical analysis, but does not use or describe a publicly available dataset for training.
Dataset Splits | No | The paper is theoretical and does not perform experiments with data splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require specific hardware.
Software Dependencies | No | The paper is theoretical and does not describe any software implementations or dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, hyperparameters, or training settings.
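
To make point (3) of the abstract concrete, below is a minimal sketch (not code from the paper) of Hyvärinen score matching for the simplest exponential family of polynomials: the one-dimensional family p_θ(x) ∝ exp(θ_1 x + θ_2 x²), whose members are Gaussians. Because the score ∇_x log p_θ(x) is linear in θ for exponentials of polynomials, the score matching loss is quadratic in θ and is minimized by a single linear solve, with no need for the normalizing constant that makes gradient-based maximum likelihood intractable. The monomial basis T(x) = (x, x²), the sample size, and the closed-form solve are illustrative choices, not the paper's exact estimator.

```python
# Minimal sketch (illustrative, not the paper's estimator): Hyvarinen score matching
# for the 1-D exponential family p_theta(x) ∝ exp(theta_1 * x + theta_2 * x^2).
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: N(mu, sigma^2) is the family member with
# theta = (mu / sigma^2, -1 / (2 * sigma^2)) for sufficient statistics T(x) = (x, x^2).
mu, sigma = 1.5, 0.7
x = rng.normal(mu, sigma, size=200_000)

# First and second derivatives of the sufficient statistics T(x) = (x, x^2).
dT = np.stack([np.ones_like(x), 2.0 * x])                  # shape (2, n): dT_k/dx
d2T = np.stack([np.zeros_like(x), 2.0 * np.ones_like(x)])  # shape (2, n): d^2 T_k / dx^2

# Hyvarinen's objective E[0.5 * (d/dx log p_theta)^2 + d^2/dx^2 log p_theta] equals
# 0.5 * theta^T A theta + b^T theta (up to a theta-independent constant),
# so the minimizer is obtained by one linear solve.
A = dT @ dT.T / x.size   # A = E[dT dT^T]
b = d2T.mean(axis=1)     # b = E[d2T]
theta_hat = -np.linalg.solve(A, b)

print("score-matching estimate:", theta_hat)
print("true parameters:        ", np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)]))
```

Running this recovers θ ≈ (μ/σ², −1/(2σ²)) from samples alone. The same quadratic-in-θ structure persists for higher-degree polynomials in higher dimensions, which is the structure the paper's computational and statistical efficiency results for score matching exploit.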