Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

The quest for the GRAph Level autoEncoder (GRALE)

Authors: Paul Krzakala, Gabriel Melo, Charlotte Laclau, Florence d'Alché-Buc, Rémi Flamary

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show, in numerical experiments on simulated and molecular data, that GRALE enables a highly general form of pre-training, applicable to a wide range of downstream tasks, from classification and regression to more complex tasks such as graph interpolation, editing, matching, and prediction.
Researcher Affiliation	Academia	Paul Krzakala (LTCI & CMAP, Télécom Paris, IP Paris); Gabriel Melo (LTCI, Télécom Paris, IP Paris); Charlotte Laclau (LTCI, Télécom Paris, IP Paris); Florence d'Alché-Buc (LTCI, Télécom Paris, IP Paris); Rémi Flamary (CMAP, Ecole Polytechnique, IP Paris)
Pseudocode	No	The paper describes the architecture using figures and text (e.g., Figure 14: Architecture of the Evoformer Encoder layer, Figure 15: Architecture of the Transformer Decoder, Appendix D: Definitions of attention based models inner blocks), but does not present any explicitly labeled pseudocode or algorithm blocks for the overall GRALE model or its training procedure.
Open Source Code	Yes	Code available at https://github.com/KrzakalaPaul/GRALE
Open Datasets	Yes	Training datasets. We train GRALE on three datasets. First, COLORING is a synthetic graph dataset introduced in [37] where each instance is a connected graph whose node labels are colors that satisfy the four color theorem. ... Then, for molecular representation learning, we download and preprocess molecules from the PUBCHEM database [33]. ... We use the graph representations obtained by GRALE as input for classification and regression tasks in the MoleculeNet benchmark [70].
Dataset Splits	Yes	All models are trained on PUBCHEM 16, with a holdout set of 10,000 graphs for evaluation. For classification and regression downstream tasks... we sample new random train/test splits (90%/10%). Table 3: Downstream tasks performance of different graph representation learning methods pretrained on PUBCHEM 32. We report the mean ± std over 5 train/test splits.
Hardware Specification	Yes	This project was provided with computing AI and storage resources by GENCI at IDRIS thanks to the grant 2025-AD011016098 on the supercomputer Jean Zay's H100 partition. Table 9: NUMBER OF GPUS (L40S) 1 1 2
Software Dependencies	No	The paper mentions using 'RDKit [39]' for converting SMILES strings to graphs and 'ADAM [35]' as an optimizer, but does not provide specific version numbers for these or other software libraries (e.g., Python, PyTorch/TensorFlow, CUDA versions).
Experiment Setup	Yes	Table 9: For every dataset used to train GRALE, we report: 1) The architecture parameters, 2) The training Parameters, 3) The computational resources required. ... MAXIMUM OUTPUT SIZE N, NUMBER OF TOKENS K, DIMENSION OF TOKENS D, TOTAL EMBEDDING DIM d = K × D, NUMBER OF LAYERS, NUMBER OF ATTENTION HEADS, NODE DIMENSIONS, NODE HIDDEN DIMENSIONS (MLPS), EDGE DIMENSIONS, EDGE HIDDEN DIMENSIONS (MLPS), TOTAL PARAMETER COUNT, NUMBER OF GRADIENT STEPS, BATCH SIZE, EPOCHS, NUMBER OF WARMUP STEPS, BASE LEARNING RATE, GRADIENT NORM CLIPPING. In all cases, we trained with ADAM [35], a warm-up phase, and cosine annealing.
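
The Experiment Setup entry says only that training used ADAM with "a warm-up phase, and cosine annealing", parameterized by a number of warm-up steps, a base learning rate, and a number of gradient steps. The following is a minimal sketch of one common form of such a schedule, not the authors' implementation. The linear shape of the warm-up, the decay to zero, the function name `lr_at_step`, and the numeric values in the usage lines are all assumptions; the actual values are in Table 9 of the paper and are not reproduced here.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given gradient step (hypothetical helper).

    Assumed shape: linear warm-up to base_lr over warmup_steps,
    then cosine annealing from base_lr down to zero at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Linear ramp; reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Illustrative values only (not taken from the paper).
schedule = [lr_at_step(s, base_lr=1e-3, warmup_steps=10, total_steps=100)
            for s in range(101)]
```

In a training loop, the value returned for the current step would be assigned to the optimizer's learning rate before each update; gradient norm clipping, also listed in the table, is a separate step and is not shown.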