Conditional Noise-Contrastive Estimation of Unnormalised Models

Authors: Ciwan Ceylan, Michael U. Gutmann

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 3, we validate the theory on synthetic data and compare the estimation performance of CNCE with NCE. In Section 4, we apply CNCE to real data and show that it can handle complex models by estimating a four-layer neural network model of natural images.
Researcher Affiliation | Academia | (1) UMIC, RWTH Aachen University, Aachen, Germany (affiliated with KTH Royal Institute of Technology and the University of Edinburgh during the project timespan); (2) School of Informatics, University of Edinburgh, Edinburgh, United Kingdom.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository for the described methodology.
Open Datasets | Yes | The data X are image patches of size 32 × 32 px, sampled from 11 different monochrome images depicting wildlife scenes (van Hateren & van der Schaaf, 1998). (A patch-sampling sketch follows the table.)
Dataset Splits | No | The paper does not explicitly provide specific dataset split information (e.g., exact percentages, sample counts, or details on cross-validation) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or library versions (e.g., Python 3.x, PyTorch 1.x) used to replicate the experiment.
Experiment Setup | Yes | In our simulations, we adjust ε using simple heuristics so that the gradients of the loss function are not too small. This typically occurs when ε is too large, so that the noise and data are easily distinguishable, but also when ε is too small. It can be verified that the loss function attains the value 2 log(2) for ε = 0, independently of the model and θ. In brief, the heuristic algorithm starts with a small ε that is incremented until the value of the loss function is sufficiently far away from 2 log(2). The weights were learned hierarchically, one layer at a time: e.g., after learning the first-layer weights, they were kept fixed while the second-layer weight vectors w_j^(2) were learned, and so on. (A sketch of the ε heuristic follows the table.)
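
The van Hateren image set itself must be obtained separately; as a rough illustration of the data preparation quoted above, here is a minimal sketch of drawing 32 × 32 patches from a set of monochrome images. The function name `sample_patches` and all parameters are hypothetical, and any preprocessing the authors may apply (e.g., whitening) is not shown.

```python
import numpy as np

def sample_patches(images, n_patches, size=32, rng=None):
    """Draw random size x size patches from a list of 2-D monochrome images."""
    rng = np.random.default_rng() if rng is None else rng
    patches = np.empty((n_patches, size, size))
    for k in range(n_patches):
        img = images[rng.integers(len(images))]        # pick one of the source images
        r = rng.integers(img.shape[0] - size + 1)      # top-left corner kept inside the image
        c = rng.integers(img.shape[1] - size + 1)
        patches[k] = img[r:r + size, c:c + size]
    return patches.reshape(n_patches, -1)              # flatten each patch into a data vector x
```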
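
The ε heuristic is only described "in brief" in the paper. The following is a minimal sketch of how such a procedure could look, assuming additive Gaussian conditional noise y = x + ε·n (so the symmetric conditional densities cancel in the CNCE log-ratio) and a loss form consistent with the quoted value 2 log(2) at ε = 0; the names `cnce_loss` and `choose_epsilon`, the threshold, and the growth factor are all chosen purely for illustration.

```python
import numpy as np

def cnce_loss(log_phi, theta, x, eps, kappa=1, rng=None):
    """CNCE-style loss with conditional noise y = x + eps * n, n ~ N(0, I).

    With symmetric Gaussian conditional noise, the log-ratio reduces to
    G = log phi(x) - log phi(y); each pair contributes 2 * log(1 + exp(-G)),
    which equals 2 * log(2) when eps = 0 (y coincides with x, so G = 0).
    """
    rng = np.random.default_rng() if rng is None else rng
    x_rep = np.repeat(x, kappa, axis=0)                  # pair each datum with kappa noise draws
    y = x_rep + eps * rng.standard_normal(x_rep.shape)   # conditional noise samples
    g = log_phi(x_rep, theta) - log_phi(y, theta)        # log-ratio of the unnormalised model
    return 2.0 * np.mean(np.logaddexp(0.0, -g))          # 2 * mean log(1 + e^{-G})

def choose_epsilon(log_phi, theta, x, eps0=1e-3, grow=2.0, tol=0.05, max_iter=50):
    """Start from a small eps and grow it until the loss moves away from 2*log(2)."""
    eps = eps0
    for _ in range(max_iter):
        loss = cnce_loss(log_phi, theta, x, eps)
        if abs(loss - 2.0 * np.log(2.0)) > tol:          # gradients are no longer vanishingly small
            break
        eps *= grow
    return eps
```

In practice one might call, e.g., `eps = choose_epsilon(log_phi, theta_init, x_train)` once before optimisation; the paper does not specify the exact threshold, growth schedule, or how often ε is re-adjusted.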