Provable benefits of score matching
Authors: Chirag Pabbaraju, Dhruv Rohatgi, Anish Prasad Sevekari, Holden Lee, Ankur Moitra, Andrej Risteski
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While score matching and variants thereof are popular in practice, the computational and statistical tradeoffs relative to maximum likelihood are not well understood theoretically. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, with statistical efficiency comparable to maximum likelihood, while the maximum likelihood loss is intractable to optimize using a gradient-based method. The family consists of exponentials of polynomials of fixed degree, and our result can be viewed as a continuous analogue of recent developments in the discrete setting. Precisely, we show: (1) designing a zeroth-order or first-order oracle for optimizing the maximum likelihood loss is NP-hard; (2) maximum likelihood has statistical efficiency polynomial in the ambient dimension and the radius of the parameters of the family; (3) minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension. |
| Researcher Affiliation | Academia | Chirag Pabbaraju (Stanford University, cpabbara@cs.stanford.edu); Dhruv Rohatgi (MIT, drohatgi@mit.edu); Anish Sevekari (Carnegie Mellon University, asevekar@andrew.cmu.edu); Holden Lee (Johns Hopkins University, hlee283@jhu.edu); Ankur Moitra (MIT, moitra@mit.edu); Andrej Risteski (Carnegie Mellon University, aristesk@andrew.cmu.edu) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper is theoretical and does not mention any accompanying open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and discusses 'samples drawn from pθ' in a statistical sense for theoretical analysis, but does not use or describe a publicly available dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not perform experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require specific hardware. |
| Software Dependencies | No | The paper is theoretical and does not describe any software implementations or dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, hyperparameters, or training settings. |
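The paper ships no code, but the computational claim it makes (that the score matching loss for exponentials of polynomials is efficient to optimize, unlike maximum likelihood) can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's construction: we use a one-dimensional family p_θ(x) ∝ exp(Σ_k θ_k x^k) and the standard Hyvärinen score matching objective E[½·s_θ(x)² + s_θ′(x)], which is quadratic in θ, so the empirical minimizer is the solution of a linear system with no normalizing constant ever computed.

```python
import numpy as np

def score_matching_fit(samples, degree):
    """Fit theta for p_theta(x) ∝ exp(sum_k theta_k x^k), k = 1..degree,
    by minimizing the empirical Hyvarinen score matching loss.

    The score is s_theta(x) = sum_k theta_k * k * x^(k-1), so the loss
    E[0.5*s_theta(x)^2 + s_theta'(x)] is quadratic in theta and its
    minimizer solves a (degree x degree) linear system.
    """
    x = np.asarray(samples, dtype=float)
    ks = np.arange(1, degree + 1)
    # Derivatives of each monomial x^k: first and second order.
    phi = ks * x[:, None] ** (ks - 1)
    # For k = 1 the coefficient k*(k-1) is zero, so clamp the exponent
    # to avoid evaluating x**(-1).
    psi = ks * (ks - 1) * x[:, None] ** np.maximum(ks - 2, 0)
    A = phi.T @ phi / len(x)      # empirical E[phi phi^T]
    b = psi.mean(axis=0)          # empirical E[psi]
    # Minimize 0.5*theta^T A theta + b^T theta  =>  theta = -A^{-1} b.
    return np.linalg.solve(A, -b)
```

As a sanity check, for standard Gaussian samples the true density is exp(-x²/2) up to normalization, i.e. degree 2 with θ = (0, -1/2), and the fitted coefficients recover this without touching the partition function.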