Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multilevel Generative Samplers for Investigating Critical Phenomena
Authors: Ankur Singha, Elia Cellini, Kim A. Nicoli, Karl Jansen, Stefan Kühn, Shinichi Nakajima
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the effective sample size of RiGCS is a few orders of magnitude higher than state-of-the-art generative model baselines in sampling configurations for 128 × 128 two-dimensional Ising systems. |
| Researcher Affiliation | Academia | (1) BIFOLD, Germany; (2) Technische Universität Berlin, Germany; (3) Università degli Studi di Torino, Italy; (4) INFN Torino, Italy; (5) University of Bonn, Germany; (6) Helmholtz Institute for Radiation and Nuclear Physics (HISKP); (7) Deutsches Elektronen-Synchrotron (DESY), Germany; (8) RIKEN Center for AIP, Japan |
| Pseudocode | Yes | The pseudocodes provided in Algorithm 1 and Algorithm 2 describe the practical steps for training RiGCS and for sampling from a trained RiGCS, respectively. |
| Open Source Code | Yes | The code is available at https://github.com/mlneuralsampler/multilevel. |
| Open Datasets | No | The paper uses the two-dimensional Ising model, which is a theoretical model and not a publicly available dataset in the conventional sense. Configurations for this model are generated through simulation, rather than being loaded from a pre-existing data source. |
| Dataset Splits | No | The paper evaluates a simulated physical system (the Ising model) and does not use pre-split datasets for training, validation, or testing in the traditional machine learning context. |
| Hardware Specification | Yes | For all models (RiGCS and the baselines), we used a single NVIDIA A100 GPU with 80 GB of memory. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and the PixelCNN architecture, but does not provide specific version numbers for software libraries or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We trained VANs for 50000 gradient updates (steps) with batch size 100, and HANs for 100000 gradient updates with batch size 1000. For RiGCS, training is performed for a total of 3000 steps for each sequential (upscaled) target lattice. When training on a target lattice N_L = N, the pretraining phase involves training at coarser levels as follows: 2000 steps for level L - 2, 1500 steps for level L - 4, and 1000 steps for all previous levels, except for the coarsest one which is always trained for 500 steps. |
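
The Research Type row compares samplers by effective sample size (ESS). As a rough illustration only, the sketch below computes the standard normalized (Kish) ESS from unnormalized log importance weights. It assumes this common definition; the estimator actually used in the paper may differ, and the function name `effective_sample_size` is hypothetical, not taken from the authors' repository.

```python
import math


def effective_sample_size(log_weights):
    """Normalized Kish ESS, a value in (0, 1], from unnormalized log importance weights.

    Assumed definition (not confirmed by the source): ESS = (sum w)^2 / (n * sum w^2).
    """
    if not log_weights:
        raise ValueError("need at least one log-weight")
    # Subtract the maximum before exponentiating for numerical stability;
    # the ratio below is invariant to rescaling all weights by a constant.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / (len(weights) * total_sq)


if __name__ == "__main__":
    # Equal weights: every sample counts fully.
    print(effective_sample_size([0.0, 0.0, 0.0, 0.0]))
    # One dominant weight: only one of four samples effectively contributes.
    print(effective_sample_size([0.0, -1000.0, -1000.0, -1000.0]))
```

A value near 1 means the sampler's proposals closely match the target distribution, while a value near 1/n means a single sample dominates the estimate.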