Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation
Authors: Bingbin Liu, Elan Rosenfeld, Pradeep Kumar Ravikumar, Andrej Risteski
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then provide empirical evidence on synthetic and MNIST datasets that eNCE with NGD performs comparably to NGD on the original NCE loss, and both outperform gradient descent. |
| Researcher Affiliation | Academia | Machine Learning Department, Carnegie Mellon University |
| Pseudocode | No | The paper describes algorithms and methods but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for their methodology. |
| Open Datasets | Yes | To corroborate our theory, we verify the effectiveness of NGD and eNCE on Gaussian mean estimation and the MNIST dataset. |
| Dataset Splits | No | The paper mentions using the MNIST dataset but does not specify the training, validation, or test splits. It refers to adopting the setup from TRE (Rhodes et al., 2020) but doesn't detail the splits within this paper. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running the experiments. It mentions using ResNet-18, which implies GPU usage, but no specific models or configurations are provided. |
| Software Dependencies | No | The paper mentions |
| Experiment Setup | Yes | For Gaussian data, we run gradient descent (GD) and normalized gradient descent (NGD) on the NCE loss and eNCE loss. The plots show the minimum parameter distance $\min_{t \in [T]} \|\tau - \tau_t\|_2$ for each step $T$. We include implementation details in Appendix E.1 and additional results in Appendix E.2. We ensure this by limiting the norm of the gradient, that is, the gradient from a sample $x$ is now $\min\{1, K / \|\nabla \ell(x)\|\}\, \nabla \ell(x)$ for some prespecified constant $K$ (Tsai et al., 2021). Per-sample log ratio clipping: an alternative to per-sample gradient clipping is to upper-threshold the absolute value of the log density ratio on each sample before passing it to the loss function. |
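Since no source code is released (see the Open Source Code row), the snippet below is only a minimal NumPy sketch of the two update rules quoted in the Experiment Setup row: per-sample gradient clipping with threshold $K$ and a normalized gradient descent (NGD) step. The function names, the array layout (one row of per-sample gradients per sample), and the small epsilon guarding division by zero are our assumptions, not the authors' implementation.

```python
import numpy as np

def clip_per_sample_gradients(per_sample_grads, K):
    # Per-sample gradient clipping as described above: each sample's gradient
    # g_x is rescaled by min{1, K / ||g_x||}, so its norm never exceeds the
    # prespecified constant K.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, K / np.maximum(norms, 1e-12))
    return per_sample_grads * scale

def ngd_step(theta, per_sample_grads, lr, K=None):
    # One normalized gradient descent (NGD) step: average the (optionally
    # clipped) per-sample gradients, then move by lr along the unit direction
    # g / ||g||.
    grads = clip_per_sample_gradients(per_sample_grads, K) if K is not None else per_sample_grads
    g = grads.mean(axis=0)
    return theta - lr * g / max(np.linalg.norm(g), 1e-12)
```

In this sketch, `per_sample_grads` would hold the gradients of the (NCE or eNCE) loss evaluated at each sample; how those gradients are computed depends on the model and is not shown here.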