Pitfalls of Gaussians as a noise distribution in NCE

Authors: Holden Lee, Chirag Pabbaraju, Anish Prasad Sevekari, Andrej Risteski

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also verify our results with simulations. Precisely, we study the MSE for the empirical NCE loss as a function of the ambient dimension, and recover the dependence from Theorem 4. For dimension d ∈ {70, 72, ..., 120}, we generate n = 500 samples from the distribution P we construct in the theorem. We generate an equal number of samples from the noise distribution Q = N(0, I_d), and run gradient descent to minimize the empirical NCE loss to obtain θ̂_n. Since we explicitly know what θ∗ is, we can compute the squared error ∥θ̂_n − θ∗∥². We run 100 trials of this, obtaining fresh samples from P and Q each time, and average the squared errors over the trials to obtain an estimate of the MSE. Figure 1 shows the plot of log MSE versus dimension; the graph is nearly linear. This corroborates the bound in Theorem 4, which tells us that as n → ∞, the MSE scales exponentially with d. (A hedged simulation sketch is given after this table.)
Researcher Affiliation | Academia | Holden Lee (Johns Hopkins University, hlee283@jhu.edu); Chirag Pabbaraju (Stanford University, cpabbara@cs.stanford.edu); Anish Sevekari (Carnegie Mellon University, asevekar@andrew.cmu.edu); Andrej Risteski (Carnegie Mellon University, aristesk@andrew.cmu.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a statement or link indicating that source code for the methodology is openly available.
Open Datasets | No | The paper states: "For dimension d ∈ {70, 72, ..., 120}, we generate n = 500 samples from the distribution P we construct in the theorem. We generate an equal number of samples from the noise distribution Q = N(0, I_d)..." This indicates the data was generated internally from a theoretical construction, not drawn from a publicly accessible dataset with a link or citation.
Dataset Splits | No | The paper mentions generating samples for simulations but does not specify any training/validation/test splits, nor does it refer to predefined splits with citations.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or the computing environment used for the simulations.
Software Dependencies | No | The paper mentions running "gradient descent" but does not specify any software libraries or version numbers used for the implementation (e.g., PyTorch, TensorFlow, scikit-learn).
Experiment Setup | No | The paper states: "For dimension d ∈ {70, 72, ..., 120}, we generate n = 500 samples from the distribution P we construct in the theorem. We generate an equal number of samples from the noise distribution Q = N(0, I_d), and run gradient descent to minimize the empirical NCE loss to obtain θ̂_n." While this specifies the number of samples and the dimension range, it lacks specific hyperparameters for gradient descent (e.g., learning rate, number of steps, initialization) and other detailed training configurations necessary for full reproducibility.
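
Because the quoted setup omits the construction of P and θ∗ as well as the gradient-descent hyperparameters, the following is only a minimal sketch of the simulation pipeline described in the Research Type row above, not the authors' implementation. It substitutes a normalized Gaussian location family p_θ = N(θ, I_d) as a hypothetical stand-in for P (this toy family will not reproduce the exponential blow-up, which hinges on the specific distribution from Theorem 4), and the learning rate, step count, initialization, and θ∗ are arbitrary placeholder choices.

```python
# Hedged sketch of the simulation loop: minimize the empirical NCE loss by
# gradient descent, then average the squared parameter error over trials.
# ASSUMPTIONS: p_theta = N(theta, I_d) is a placeholder model (the paper's P
# and theta* are different); lr, steps, and theta_star are arbitrary.
import numpy as np

def nce_loss_and_grad(theta, x_data, x_noise):
    """Empirical NCE loss (equal numbers of data and noise samples) and its
    gradient for the toy model p_theta = N(theta, I_d), noise q = N(0, I_d)."""
    def logit(x):
        # log p_theta(x) - log q(x) = theta.x - ||theta||^2 / 2 (normalizers cancel)
        return x @ theta - 0.5 * (theta @ theta)

    sig = lambda t: 1.0 / (1.0 + np.exp(-t))
    s_data, s_noise = logit(x_data), logit(x_noise)
    n = len(x_data)
    loss = -(np.log(sig(s_data) + 1e-12).sum()
             + np.log(1.0 - sig(s_noise) + 1e-12).sum()) / (2 * n)
    # d logit / d theta = x - theta
    g_data = -((1.0 - sig(s_data))[:, None] * (x_data - theta)).sum(axis=0)
    g_noise = (sig(s_noise)[:, None] * (x_noise - theta)).sum(axis=0)
    return loss, (g_data + g_noise) / (2 * n)

def estimate_mse(d, theta_star, n=500, trials=100, lr=0.1, steps=2000, seed=0):
    """Average ||theta_hat_n - theta_star||^2 over independent trials."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        x_data = rng.normal(theta_star, 1.0, size=(n, d))   # samples from the stand-in P
        x_noise = rng.normal(0.0, 1.0, size=(n, d))          # samples from Q = N(0, I_d)
        theta = np.zeros(d)
        for _ in range(steps):                                # plain gradient descent
            _, grad = nce_loss_and_grad(theta, x_data, x_noise)
            theta -= lr * grad
        errs.append(np.sum((theta - theta_star) ** 2))
    return np.mean(errs)
```

As a usage sketch, calling estimate_mse(d, theta_star=np.ones(d)/np.sqrt(d)) for d in range(70, 121, 2) and plotting np.log of the results against d yields a log-MSE-versus-dimension curve in the same format as Figure 1, though the growth with d for this toy family will not match the paper's construction. The 1/(2n) normalization corresponds to the standard NCE classification loss with equal numbers of data and noise samples.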