On the Uncomputability of Partition Functions in Energy-Based Sequence Models
Authors: Chu-Cheng Lin, Arya D. McCarthy
Venue: ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we argue that energy-based sequence models backed by expressive parametric families can result in uncomputable and inapproximable partition functions. Among other things, this makes model selection and therefore learning model parameters not only difficult, but generally undecidable. The reason is that there are no good deterministic or randomized estimators of partition functions. Specifically, we exhibit a pathological example where under common assumptions, no useful importance sampling estimators of the partition function can guarantee to have variance bounded below a rational number. As alternatives, we consider sequence model families whose partition functions are computable (if they exist), but at the cost of reduced expressiveness. Our theoretical results suggest that statistical procedures with asymptotic guarantees and sheer (but finite) amounts of compute are not the only things that make sequence modeling work; computability concerns must not be neglected as we consider more expressive model parametrizations. |
| Researcher Affiliation | Academia | Chu-Cheng Lin & Arya D. McCarthy, Center for Language and Speech Processing, Johns Hopkins University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available. |
| Open Datasets | No | This is a theoretical paper and does not involve training models on datasets for empirical evaluation. |
| Dataset Splits | No | This is a theoretical paper and does not involve dataset splits for validation. |
| Hardware Specification | No | This is a theoretical paper, and therefore, no hardware specifications for running experiments are mentioned. |
| Software Dependencies | No | This is a theoretical paper, and no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | This is a theoretical paper, and no experimental setup details such as hyperparameters or training settings are provided. |