Conditional Noise-Contrastive Estimation of Unnormalised Models
Authors: Ciwan Ceylan, Michael U. Gutmann
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 3, we validate the theory on synthetic data and compare the estimation performance of CNCE with NCE. In Section 4, we apply CNCE to real data and show that it can handle complex models by estimating a four-layer neural network model of natural images. |
| Researcher Affiliation | Academia | ¹UMIC, RWTH Aachen University, Aachen, Germany (affiliated with KTH Royal Institute of Technology and University of Edinburgh during the project timespan); ²School of Informatics, University of Edinburgh, Edinburgh, United Kingdom. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | The data X are image patches of size 32 × 32 px, sampled from 11 different monochrome images depicting wild life scenes (van Hateren & van der Schaaf, 1998). |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (e.g., exact percentages, sample counts, or details on cross-validation) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or library versions (e.g., Python 3.x, PyTorch 1.x) used to replicate the experiment. |
| Experiment Setup | Yes | In our simulations, we adjust ε using simple heuristics so that the gradients of the loss function are not too small. This typically occurs when ε is too large, so that the noise and data are easily distinguishable, but also when ε is too small. It can be verified that the loss function attains the value 2 log(2) for ε = 0, independent of the model and θ. In brief, the heuristic algorithm starts with a small ε that is incremented until the value of the loss function is sufficiently far away from 2 log(2). We learned the weights hierarchically one layer at a time, e.g. after learning the 1st-layer weights, we kept them fixed and learned the second-layer weight vector w_j^(2), etc. (sketches of both procedures follow the table) |
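The ε-selection heuristic quoted in the Experiment Setup row is described only in prose. Below is a minimal sketch of one way it could be implemented, assuming additive Gaussian contrastive noise y = x + ε·n, for which the conditional correction terms in the CNCE classification variable cancel and G reduces to the log-ratio of the unnormalised model at the data and noise points. The function names, the starting value `eps0`, the growth `factor`, and the threshold `delta` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def cnce_loss(log_phi, theta, X, eps, kappa=1, rng=None):
    """CNCE loss under additive Gaussian contrastive noise y = x + eps * n.

    log_phi(X, theta) returns the log of the unnormalised model for each
    row of X. With symmetric Gaussian noise the conditional terms cancel,
    so G = log_phi(x) - log_phi(y) and the loss is
    (2 / (N * kappa)) * sum log(1 + exp(-G)). At eps = 0 every G is 0,
    so the loss equals 2 * log(2), as noted in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = X.shape[0]
    total = 0.0
    for _ in range(kappa):
        Y = X + eps * rng.standard_normal(X.shape)   # conditional noise
        G = log_phi(X, theta) - log_phi(Y, theta)
        total += np.logaddexp(0.0, -G).sum()         # stable log(1 + e^-G)
    return 2.0 * total / (N * kappa)

def tune_epsilon(log_phi, theta, X, eps0=1e-3, factor=1.5, delta=0.05,
                 max_iter=50):
    """Start with a small eps and increase it until the loss moves
    sufficiently far from its eps = 0 value of 2 * log(2). The
    increment schedule and threshold here are assumptions."""
    eps = eps0
    for _ in range(max_iter):
        if abs(cnce_loss(log_phi, theta, X, eps) - 2.0 * np.log(2.0)) > delta:
            break
        eps *= factor
    return eps
```

For example, with a toy one-parameter Gaussian model `log_phi = lambda X, theta: -0.5 * theta * (X ** 2).sum(axis=1)`, `tune_epsilon(log_phi, 1.0, X)` grows ε until the data-versus-noise classification problem is no longer trivially hard.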
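The hierarchical, layer-by-layer training is likewise only described in words. A sketch of that outer loop follows; the linear-filter-plus-nonlinearity layer and the `optimise` callback are placeholders, and the paper's actual four-layer image model is not reproduced here.

```python
import numpy as np

def layer_forward(W, Z):
    # One model layer: linear filters followed by an elementwise
    # nonlinearity. The specific nonlinearity is an assumption.
    return np.log1p(np.maximum(W @ Z, 0.0) ** 2)

def train_layerwise(weights, optimise, X):
    """Learn one layer at a time; earlier layers stay fixed.

    `optimise(log_phi, W, X)` is assumed to minimise the CNCE loss
    over W (e.g. using the tuned eps from above) and return the fit.
    """
    for k in range(len(weights)):
        def log_phi(Xb, W, k=k):
            Z = Xb.T                        # columns are data points
            for j in range(k):              # frozen lower layers
                Z = layer_forward(weights[j], Z)
            return layer_forward(W, Z).sum(axis=0)
        weights[k] = optimise(log_phi, weights[k], X)
    return weights
```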