Wasserstein Training of Restricted Boltzmann Machines

Authors: Grégoire Montavon, Klaus-Robert Müller, Marco Cuturi

NeurIPS 2016

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate their practical potential on data completion and denoising, for which the metric between observations plays a crucial role.
Researcher Affiliation Academia Grégoire Montavon (Technische Universität Berlin, gregoire.montavon@tu-berlin.de); Klaus-Robert Müller (Technische Universität Berlin, klaus-robert.muller@tu-berlin.de); Marco Cuturi (CREST, ENSAE, Université Paris-Saclay, marco.cuturi@ensae.fr). Also with the Department of Brain and Cognitive Engineering, Korea University.
Pseudocode No The paper describes the steps of the Sinkhorn algorithm in paragraph form (e.g., 'repeat until u, v converge...'), but it does not present a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code No The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets Yes We consider three datasets: MNIST-small, a subsampled version of the original MNIST dataset [11] with only the digits "0" retained, a subset of the UCI PLANTS dataset [19] containing the geographical spread of plant species, and MNIST-code, 128-dimensional code vectors associated with each MNIST digit (additional details in the supplement).
Dataset Splits Yes We perform holdout validation on the quadratic containment coefficient $\eta \in \{10^{-4}, 10^{-3}, 10^{-2}\}$, and on the KL weighting coefficient $\lambda \in \{0, 10^{-1}, 10^{0}, 10^{1}, \infty\}$.
Hardware Specification No The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies No The paper does not specify any software dependencies with version numbers.
Experiment Setup Yes The number of hidden units of the RBM is set heuristically to 400 for all datasets. The learning rate is set heuristically to $0.01(\lambda^{-1})$ during the pretraining phase and modified to $0.01 \min(1, \lambda^{-1})$ when training on the final objective. [...] with smoothing parameter $\gamma = 0.1$ and distance $D(x, x') = H(x, x') / \langle H(x, x') \rangle_{\hat{p}}$.
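
The Sinkhorn iterations mentioned in the Pseudocode row ('repeat until u, v converge...') and the setup values quoted above (smoothing parameter γ = 0.1, normalized distance D) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes H is the Hamming distance between binary vectors, that the normalizer is the mean of H over all pairs of data points, and that the kernel is K = exp(-D/γ). All function names are hypothetical.

```python
import math

GAMMA = 0.1  # smoothing parameter quoted in the Experiment Setup row


def hamming(x, y):
    """Assumed meaning of H: number of positions where two binary vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def normalized_distances(data, samples):
    """D(x, x') = H(x, x') / <H>, with <H> averaged over all data pairs (assumption).

    Fails with a division by zero if all data points are identical.
    """
    n = len(data)
    mean_h = sum(hamming(a, b) for a in data for b in data) / (n * n)
    return [[hamming(x, y) / mean_h for y in samples] for x in data]


def sinkhorn(D, p, q, gamma=GAMMA, max_iter=1000, tol=1e-9):
    """Entropy-smoothed transport between histograms p and q.

    Returns the transport plan and its cost <plan, D>.
    """
    n, m = len(p), len(q)
    # Kernel matrix (assumed form): K = exp(-D / gamma)
    K = [[math.exp(-D[i][j] / gamma) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(max_iter):
        u_prev = u
        # u <- p / (K v)
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # v <- q / (K^T u)
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Stop once the scaling vector u no longer changes
        if max(abs(a - b) for a, b in zip(u, u_prev)) < tol:
            break
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(plan[i][j] * D[i][j] for i in range(n) for j in range(m))
    return plan, cost


# Toy usage: three binary "data" vectors compared against identical "samples".
data = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]
samples = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]
D = normalized_distances(data, samples)
plan, cost = sinkhorn(D, [1 / 3] * 3, [1 / 3] * 3)
print(cost)
```

With γ as small as 0.1, kernel entries for large distances can underflow to zero in floating point; a log-domain variant would be a more robust choice for real data, at the cost of a slightly longer implementation.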