Wasserstein Training of Restricted Boltzmann Machines
Authors: Grégoire Montavon, Klaus-Robert Müller, Marco Cuturi
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate their practical potential on data completion and denoising, for which the metric between observations plays a crucial role. |
| Researcher Affiliation | Academia | Grégoire Montavon, Technische Universität Berlin (gregoire.montavon@tu-berlin.de); Klaus-Robert Müller, Technische Universität Berlin (klaus-robert.mueller@tu-berlin.de), also with the Department of Brain and Cognitive Engineering, Korea University; Marco Cuturi, CREST, ENSAE, Université Paris-Saclay (marco.cuturi@ensae.fr). |
| Pseudocode | No | The paper describes the steps of the Sinkhorn algorithm in paragraph form (e.g., 'repeat until u, v converge...'), but it does not present a formally labeled 'Pseudocode' or 'Algorithm' block. A sketch of these iterations is given after the table. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | We consider three datasets: MNIST-small, a subsampled version of the original MNIST dataset [11] with only the digits 0 retained, a subset of the UCI PLANTS dataset [19] containing the geographical spread of plants species, and MNIST-code, 128-dimensional code vectors associated to each MNIST digit (additional details in the supplement). |
| Dataset Splits | Yes | We perform holdout validation on the quadratic containment coefficient η ∈ {10⁻⁴, 10⁻³, 10⁻²}, and on the KL weighting coefficient λ ∈ {0, 10⁻¹, 10⁰, 10¹, ∞}. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | The number of hidden units of the RBM is set heuristically to 400 for all datasets. The learning rate is set heuristically to 0.01(λ⁻¹) during the pretraining phase and modified to 0.01 · min(1, λ⁻¹) when training on the final objective. ... with smoothing parameter γ = 0.1 and distance D(x, x′) = H(x, x′)/⟨H(x, x′)⟩_p̂. (Sketches of the Sinkhorn iterations and of this distance follow the table.) |
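
Since the paper presents the Sinkhorn iterations only in paragraph form, the following is a minimal sketch of those iterations; the function name, convergence test, and defaults are ours, with the smoothing parameter defaulting to the paper's γ = 0.1:

```python
import numpy as np

def sinkhorn(a, b, C, gamma=0.1, n_iter=1000, tol=1e-9):
    """Entropy-smoothed optimal transport via Sinkhorn scaling.

    a, b  : source and target marginals (nonnegative, each summing to 1)
    C     : cost matrix between the two supports
    gamma : smoothing parameter (the paper's experiments use gamma = 0.1)
    """
    K = np.exp(-C / gamma)                    # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u_prev = u
        u = a / (K @ v)                       # rescale rows toward marginal a
        v = b / (K.T @ u)                     # rescale columns toward marginal b
        if np.max(np.abs(u - u_prev)) < tol:  # "repeat until u, v converge"
            break
    P = u[:, None] * K * v[None, :]           # transport plan diag(u) K diag(v)
    return P, float(np.sum(P * C))            # plan and smoothed transport cost
```

For instance, with uniform marginals `a = b = np.ones(n) / n` and `C` a pairwise distance matrix, `sinkhorn(a, b, C)` returns the smoothed transport plan and its cost.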
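
The distance in the setup row normalizes the Hamming distance by its average under the empirical distribution p̂. A minimal sketch, assuming binary data; the function name and the inclusion of identical pairs in the normalizing mean are our assumptions:

```python
import numpy as np

def normalized_hamming(X):
    """Pairwise Hamming distances between the binary rows of X, divided by
    their sample mean as an estimate of <H(x, x')>_p-hat."""
    X = X.astype(float)
    # H[i, j] counts the coordinates where rows i and j disagree
    H = X @ (1.0 - X).T + (1.0 - X) @ X.T
    return H / H.mean()  # assumption: mean over all pairs, diagonal included
```

The resulting matrix can serve directly as the cost `C` in the Sinkhorn sketch above.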