Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast training and sampling of Restricted Boltzmann Machines
Authors: Nicolas Béreux, Aurélien Decelle, Cyril Furtlehner, Lorenzo Rosset, Beatriz Seoane
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that this pre-training strategy allows RBMs to efficiently handle highly structured datasets where conventional methods fail. Additionally, our log-likelihood estimation outperforms computationally intensive approaches in controlled scenarios, while the PTT algorithm significantly accelerates MCMC processes compared to conventional methods. |
| Researcher Affiliation | Academia | Nicolas Béreux, INRIA-Saclay, LISN, Paris-Saclay University; Aurélien Decelle, Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, and Departamento de Física Teórica, Universidad Complutense de Madrid; Cyril Furtlehner, INRIA-Saclay, LISN, Paris-Saclay University; Lorenzo Rosset, LCQB, Sorbonne Université, and Laboratoire de Physique Théorique, École Normale Supérieure, Paris; Beatriz Seoane, Departamento de Física Teórica, Universidad Complutense de Madrid |
| Pseudocode | Yes | The pseudocodes for the standard PT and our new PTT algorithms are provided in SI B.1. ... Algorithm 1 Parallel Tempering ... Algorithm 2 Parallel Trajectory Tempering (a hedged sketch of a standard parallel-tempering sweep is given after the table) |
| Open Source Code | Yes | CODE AVAILABILITY The code and datasets are available at https://github.com/DsysDML/fastrbm |
| Open Datasets | Yes | Figure 1: Datasets. Panels A-E display 5 distinct datasets projected onto their first two PCA components. In A, the MNIST 01 dataset... In B, the Mickey dataset... In C, the Human Genome Dataset (HGD)... In D, the Ising dataset... In E, the CelebA dataset... For more details on these datasets, please refer to the SI. ... The code and datasets are available at https://github.com/DsysDML/fastrbm |
| Dataset Splits | Yes | Table 2: Details of the datasets used during training (Name / #Samples / #Dimensions / Train size / Test size): ... CelebA: 30 000 / 1024 / 60% / 40%; Human Genome Dataset (HGD): 4500 / 805 / 60% / 40%; Ising: 20 000 / 64 / 60% / 40%; Mickey: 16 000 / 1000 / 60% / 40%; MNIST-01: 10 610 / 784 / 60% / 40%; MNIST: 50 000 / 784 / 60% / 40% |
| Hardware Specification | Yes | All experiments were run on an RTX 4090 with an AMD Ryzen 9 5950X. |
| Software Dependencies | No | The paper discusses various algorithms and methods (e.g., MCMC, PCD, AIS, PTT, RBMs) and references several theoretical frameworks and previous works. However, it does not explicitly list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) used for implementing the described methodology. While the code is available, the paper itself does not detail these versions. |
| Experiment Setup | Yes | Table 3: Hyperparameters used for the training of RBMs (Name / Batch size / #Chains / #Epochs / Learning rate / #MCMC steps / #Hidden nodes): ... PCD: 2000 / 2000 / 10 000 / 0.01 / 100 / 500 ... Pre-train+PCD: 2000 / 2000 / 10 000 / 0.01 / 100 / 500 |
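
The Pseudocode row quotes Algorithm 1 (Parallel Tempering) and Algorithm 2 (Parallel Trajectory Tempering) from SI B.1 of the paper. For reference, the sketch below implements one sweep of *standard* parallel tempering for a Bernoulli-Bernoulli RBM. It is a minimal illustration written from the generic PT recipe, not the authors' PTT algorithm, and all function and variable names (`pt_sweep`, `sample_hidden`, `betas`, ...) are assumptions rather than the `fastrbm` API; see SI B.1 of the paper for the actual PT and PTT pseudocode.

```python
# Hypothetical sketch: one parallel-tempering sweep for a Bernoulli-Bernoulli RBM.
# Names, shapes, and the temperature ladder are illustrative assumptions.
import torch

def sample_hidden(v, W, hbias, beta):
    """Sample hidden units given visibles at inverse temperature beta."""
    return torch.bernoulli(torch.sigmoid(beta * (v @ W + hbias)))

def sample_visible(h, W, vbias, beta):
    """Sample visible units given hiddens at inverse temperature beta."""
    return torch.bernoulli(torch.sigmoid(beta * (h @ W.T + vbias)))

def rbm_energy(v, h, W, vbias, hbias):
    """Per-chain RBM energy E(v, h) = -v·W·h - v·b_v - h·b_h."""
    return -(v @ W * h).sum(-1) - v @ vbias - h @ hbias

def pt_sweep(v, h, W, vbias, hbias, betas):
    """One Gibbs update per temperature, then neighbour swap attempts.

    v: [n_temps, n_chains, n_vis], h: [n_temps, n_chains, n_hid],
    betas: inverse temperatures, one per replica.
    """
    # Gibbs step at every temperature.
    for k, beta in enumerate(betas):
        h[k] = sample_hidden(v[k], W, hbias, beta)
        v[k] = sample_visible(h[k], W, vbias, beta)
    # Metropolis swaps between adjacent temperatures:
    # accept with prob min(1, exp((beta_k - beta_{k+1}) * (E_k - E_{k+1}))).
    for k in range(len(betas) - 1):
        e_k = rbm_energy(v[k], h[k], W, vbias, hbias)
        e_k1 = rbm_energy(v[k + 1], h[k + 1], W, vbias, hbias)
        log_alpha = (betas[k] - betas[k + 1]) * (e_k - e_k1)
        accept = torch.rand_like(log_alpha).log() < log_alpha
        v[k][accept], v[k + 1][accept] = v[k + 1][accept].clone(), v[k][accept].clone()
        h[k][accept], h[k + 1][accept] = h[k + 1][accept].clone(), h[k][accept].clone()
    return v, h
```

This covers only the fixed-ladder PT case that Algorithm 1 refers to; the PTT variant of Algorithm 2 is described in the paper's SI and is not reproduced here.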
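
For the Experiment Setup row, the quoted Table 3 entries indicate that the PCD and Pre-train+PCD runs share the same hyperparameters. The fragment below simply collects those reported values into a configuration object as a reading aid; the class and field names are hypothetical and do not come from the released code.

```python
# Hypothetical configuration mirroring the Table 3 hyperparameters for the
# PCD and Pre-train+PCD runs; field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PCDConfig:
    batch_size: int = 2000      # samples per gradient step
    n_chains: int = 2000        # persistent Gibbs chains for the negative phase
    n_epochs: int = 10_000      # training epochs
    learning_rate: float = 0.01
    n_mcmc_steps: int = 100     # MCMC steps per parameter update
    n_hidden: int = 500         # hidden nodes of the RBM

config = PCDConfig()            # Pre-train+PCD reuses the same values per Table 3
```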