Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast training and sampling of Restricted Boltzmann Machines
Authors: Nicolas Béreux, Aurélien Decelle, Cyril Furtlehner, Lorenzo Rosset, Beatriz Seoane
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that this pre-training strategy allows RBMs to efficiently handle highly structured datasets where conventional methods fail. Additionally, our log-likelihood estimation outperforms computationally intensive approaches in controlled scenarios, while the PTT algorithm significantly accelerates MCMC processes compared to conventional methods. |
| Researcher Affiliation | Academia | Nicolas Béreux, INRIA-Saclay, LISN, Paris-Saclay University; Aurélien Decelle, Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, and Departamento de Física Teórica, Universidad Complutense de Madrid; Cyril Furtlehner, INRIA-Saclay, LISN, Paris-Saclay University; Lorenzo Rosset, LCQB, Sorbonne Université, and Laboratoire de Physique Théorique, École Normale Supérieure, Paris; Beatriz Seoane, Departamento de Física Teórica, Universidad Complutense de Madrid |
| Pseudocode | Yes | The pseudocodes for the standard PT and our new PTT algorithms are provided in SI B.1. ... Algorithm 1 Parallel Tempering ... Algorithm 2 Parallel Trajectory Tempering (a hedged sketch of a standard parallel-tempering sweep is given after the table) |
| Open Source Code | Yes | CODE AVAILABILITY The code and datasets are available at https://github.com/DsysDML/fastrbm |
| Open Datasets | Yes | Figure 1: Datasets. Panels A-E display 5 distinct datasets projected onto their first two PCA components. In A, the MNIST 01 dataset... In B, the Mickey dataset... In C, the Human Genome Dataset (HGD)... In D, the Ising dataset... In E, the CelebA dataset... For more details on these datasets, please refer to the SI. ... The code and datasets are available at https://github.com/DsysDML/fastrbm |
| Dataset Splits | Yes | Table 2: Details of the datasets used during training (Name / #Samples / #Dimensions / Train size / Test size): ... CelebA: 30 000 / 1024 / 60% / 40%; Human Genome Dataset (HGD): 4500 / 805 / 60% / 40%; Ising: 20 000 / 64 / 60% / 40%; Mickey: 16 000 / 1000 / 60% / 40%; MNIST-01: 10 610 / 784 / 60% / 40%; MNIST: 50 000 / 784 / 60% / 40% |
| Hardware Specification | Yes | All experiments were run on an RTX 4090 with an AMD Ryzen 9 5950X. |
| Software Dependencies | No | The paper discusses various algorithms and methods (e.g., MCMC, PCD, AIS, PTT, RBMs) and references several theoretical frameworks and previous works. However, it does not explicitly list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) used for implementing the described methodology. While the code is available, the paper itself does not detail these versions. |
| Experiment Setup | Yes | Table 3: Hyperparameters used for the training of RBMs (Name / Batch size / #Chains / #Epochs / Learning rate / #MCMC steps / #Hidden nodes): ... PCD: 2000 / 2000 / 10 000 / 0.01 / 100 / 500 ... Pre-train+PCD: 2000 / 2000 / 10 000 / 0.01 / 100 / 500 |
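
The Pseudocode row quotes Algorithm 1 (Parallel Tempering) and Algorithm 2 (Parallel Trajectory Tempering) from SI B.1 of the paper. For reference, the sketch below implements one sweep of *standard* parallel tempering for a Bernoulli-Bernoulli RBM. It is a minimal illustration written from the generic PT recipe, not the authors' PTT algorithm, and all function and variable names (`pt_sweep`, `sample_hidden`, `betas`, ...) are assumptions rather than the `fastrbm` API; see SI B.1 of the paper for the actual PT and PTT pseudocode.

```python
# Hypothetical sketch: one parallel-tempering sweep for a Bernoulli-Bernoulli RBM.
# Names, shapes, and the temperature ladder are illustrative assumptions.
import torch

def sample_hidden(v, W, hbias, beta):
    """Sample hidden units given visibles at inverse temperature beta."""
    return torch.bernoulli(torch.sigmoid(beta * (v @ W + hbias)))

def sample_visible(h, W, vbias, beta):
    """Sample visible units given hiddens at inverse temperature beta."""
    return torch.bernoulli(torch.sigmoid(beta * (h @ W.T + vbias)))

def rbm_energy(v, h, W, vbias, hbias):
    """Per-chain RBM energy E(v, h) = -v·W·h - v·b_v - h·b_h."""
    return -(v @ W * h).sum(-1) - v @ vbias - h @ hbias

def pt_sweep(v, h, W, vbias, hbias, betas):
    """One Gibbs update per temperature, then neighbour swap attempts.

    v: [n_temps, n_chains, n_vis], h: [n_temps, n_chains, n_hid],
    betas: inverse temperatures, one per replica.
    """
    # Gibbs step at every temperature.
    for k, beta in enumerate(betas):
        h[k] = sample_hidden(v[k], W, hbias, beta)
        v[k] = sample_visible(h[k], W, vbias, beta)
    # Metropolis swaps between adjacent temperatures:
    # accept with prob min(1, exp((beta_k - beta_{k+1}) * (E_k - E_{k+1}))).
    for k in range(len(betas) - 1):
        e_k = rbm_energy(v[k], h[k], W, vbias, hbias)
        e_k1 = rbm_energy(v[k + 1], h[k + 1], W, vbias, hbias)
        log_alpha = (betas[k] - betas[k + 1]) * (e_k - e_k1)
        accept = torch.rand_like(log_alpha).log() < log_alpha
        v[k][accept], v[k + 1][accept] = v[k + 1][accept].clone(), v[k][accept].clone()
        h[k][accept], h[k + 1][accept] = h[k + 1][accept].clone(), h[k][accept].clone()
    return v, h
```

This covers only the fixed-ladder PT case that Algorithm 1 refers to; the PTT variant of Algorithm 2 is described in the paper's SI and is not reproduced here.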
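
For the Experiment Setup row, the quoted Table 3 entries indicate that the PCD and Pre-train+PCD runs share the same hyperparameters. The fragment below simply collects those reported values into a configuration object as a reading aid; the class and field names are hypothetical and do not come from the released code.

```python
# Hypothetical configuration mirroring the Table 3 hyperparameters for the
# PCD and Pre-train+PCD runs; field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PCDConfig:
    batch_size: int = 2000      # samples per gradient step
    n_chains: int = 2000        # persistent Gibbs chains for the negative phase
    n_epochs: int = 10_000      # training epochs
    learning_rate: float = 0.01
    n_mcmc_steps: int = 100     # MCMC steps per parameter update
    n_hidden: int = 500         # hidden nodes of the RBM

config = PCDConfig()            # Pre-train+PCD reuses the same values per Table 3
```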