Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Multilevel Generative Samplers for Investigating Critical Phenomena

Authors: Ankur Singha, Elia Cellini, Kim A. Nicoli, Karl Jansen, Stefan Kühn, Shinichi Nakajima

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that the effective sample size of RiGCS is a few orders of magnitude higher than state-of-the-art generative model baselines in sampling configurations for 128 × 128 two-dimensional Ising systems.
Researcher Affiliation	Academia	1 BIFOLD, Germany; 2 Technische Universität Berlin, Germany; 3 Università degli Studi di Torino, Italy; 4 INFN Torino, Italy; 5 University of Bonn, Germany; 6 Helmholtz Institute for Radiation and Nuclear Physics (HISKP); 7 Deutsches Elektronen-Synchrotron (DESY), Germany; 8 RIKEN Center for AIP, Japan
Pseudocode	Yes	The pseudocodes provided in Algorithm 1 and Algorithm 2 describe the practical steps for training RiGCS and for sampling from a trained RiGCS, respectively.
Open Source Code	Yes	The code is available at https://github.com/mlneuralsampler/multilevel.
Open Datasets	No	The paper uses the two-dimensional Ising model, which is a theoretical model and not a publicly available dataset in the conventional sense. Configurations for this model are generated through simulation, rather than being loaded from a pre-existing data source.
Dataset Splits	No	The paper evaluates a simulated physical system (the Ising model) and does not use pre-split datasets for training, validation, or testing in the traditional machine learning context.
Hardware Specification	Yes	For all models (RiGCS and the baselines), we used a single NVIDIA A100 GPU with 80 GB of memory.
Software Dependencies	No	The paper mentions using the Adam optimizer and PixelCNN architecture, but does not provide specific version numbers for software libraries or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup	Yes	We trained VANs for 50,000 gradient updates (steps) with batch size 100, and HANs for 100,000 gradient updates with batch size 1,000. For RiGCS, training is performed for a total of 3,000 steps for each sequential (upscaled) target lattice. When training on a target lattice N_L = N, the pretraining phase involves training at coarser levels as follows: 2,000 steps for level L − 2, 1,500 steps for level L − 4, and 1,000 steps for all previous levels, except for the coarsest one, which is always trained for 500 steps.
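
To put the baseline budgets quoted in the Experiment Setup row on a common scale, the sketch below multiplies gradient updates by batch size. It assumes that each update draws one fresh batch of configurations from the model, which is typical for variational neural samplers but is not stated in the row itself. The names BASELINES and samples_drawn are illustrative and do not come from the paper or its repository.

```python
# Baseline training budgets as quoted in the Experiment Setup row.
BASELINES = {
    "VAN": {"steps": 50_000, "batch_size": 100},
    "HAN": {"steps": 100_000, "batch_size": 1_000},
}


def samples_drawn(steps: int, batch_size: int) -> int:
    """Configurations drawn during training.

    Assumption (not from the source): one fresh batch is sampled per
    gradient update, so the total is simply steps * batch_size.
    """
    return steps * batch_size


totals = {name: samples_drawn(**cfg) for name, cfg in BASELINES.items()}

for name, total in totals.items():
    print(f"{name}: {total:,} configurations")
# VAN: 5,000,000 configurations
# HAN: 100,000,000 configurations
```

Under that assumption, the HAN baseline sees 20 times as many configurations as the VAN baseline. The RiGCS schedule is left out here because the row gives its step counts per level but not its batch size.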