Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation
Authors: Bingbin Liu, Elan Rosenfeld, Pradeep Kumar Ravikumar, Andrej Risteski
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then provide empirical evidence on synthetic and MNIST datasets that eNCE with NGD performs comparably to NGD on the original NCE loss, and both outperform gradient descent. |
| Researcher Affiliation | Academia | Machine Learning Department, Carnegie Mellon University |
| Pseudocode | No | The paper describes algorithms and methods but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for their methodology. |
| Open Datasets | Yes | To corroborate our theory, we verify the effectiveness of NGD and eNCE on Gaussian mean estimation and the MNIST dataset. |
| Dataset Splits | No | The paper mentions using the MNIST dataset but does not specify the training, validation, or test splits. It refers to adopting the setup from TRE (Rhodes et al., 2020) but doesn't detail the splits within this paper. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running the experiments. It mentions using ResNet-18, which implies GPU usage, but no specific models or configurations are provided. |
| Software Dependencies | No | The paper mentions |
| Experiment Setup | Yes | For Gaussian data, we run gradient descent (GD) and normalized gradient descent (NGD) on the NCE loss and eNCE loss. The plots show the minimum parameter distance $\min_{t \in [T]} \|\tau - \tau_t\|_2$ for each step $T$. We include implementation details in Appendix E.1 and additional results in Appendix E.2. We ensure this by limiting the norm of the gradient, that is, the gradient from a sample $x$ is now $\min\{1, K / \|\nabla \ell(x)\|\}\, \nabla \ell(x)$ for some prespecified constant $K$ (Tsai et al., 2021). Per-sample log ratio clipping: an alternative to per-sample gradient clipping is to upper-threshold the absolute value of the log density ratio on each sample before passing it to the loss function. |
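Since no source code is released (see the Open Source Code row), the snippet below is only a minimal NumPy sketch of the two update rules quoted in the Experiment Setup row: per-sample gradient clipping with threshold $K$ and a normalized gradient descent (NGD) step. The function names, the array layout (one row of per-sample gradients per sample), and the small epsilon guarding division by zero are our assumptions, not the authors' implementation.

```python
import numpy as np

def clip_per_sample_gradients(per_sample_grads, K):
    # Per-sample gradient clipping as described above: each sample's gradient
    # g_x is rescaled by min{1, K / ||g_x||}, so its norm never exceeds the
    # prespecified constant K.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, K / np.maximum(norms, 1e-12))
    return per_sample_grads * scale

def ngd_step(theta, per_sample_grads, lr, K=None):
    # One normalized gradient descent (NGD) step: average the (optionally
    # clipped) per-sample gradients, then move by lr along the unit direction
    # g / ||g||.
    grads = clip_per_sample_gradients(per_sample_grads, K) if K is not None else per_sample_grads
    g = grads.mean(axis=0)
    return theta - lr * g / max(np.linalg.norm(g), 1e-12)
```

In this sketch, `per_sample_grads` would hold the gradients of the (NCE or eNCE) loss evaluated at each sample; how those gradients are computed depends on the model and is not shown here.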