Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Statistical Mechanisms of Distributional Compositional Generalization

Authors: Jingwen Fu, Nanning Zheng

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Experiments... 6.2. Experiments on trade-off and non-trade-off improvement... 6.3. Experiments on Generalization Bounds... Table 1. Values of I_{A,β}(T̂ = T, P_S) and GACC over 10 instances... Table 2. Performance comparison across rule complexities
Researcher Affiliation | Academia | Jingwen Fu 1, Nanning Zheng 1... 1National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University. Correspondence to: Nanning Zheng <EMAIL>.
Pseudocode | No | The paper describes methods and analyses using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository.
Open Datasets | No | 1) Components and compositional rule: We construct two word sets A, B satisfying |A| = |B| = 1000... 2) We pretrain the GPT-2 model using different pretraining data schedules. The pretraining data is generated from a subset of composition rules same as those in the downstream task, but with entirely different words.
Dataset Splits | Yes | 2) Distribution split: The support distribution takes the elements in the set {(e1, e2) | (e1, e2) ∈ a1 × b1 ∪ a2 × b1 ∪ a1 × b2}. The target distribution takes the elements in the set {(e1, e2) | (e1, e2) ∈ a2 × b2}. It is easy to verify that these designs satisfy the requirements listed in Section 3.
Hardware Specification | No | The paper mentions using the GPT-2 model with specific configurations (4 layers, 4 attention heads, embedding size of 128; or 6 layers, 8 attention heads, embedding size of 256) but does not provide any details about the specific hardware (GPU/CPU models, memory, etc.) used for experiments.
Software Dependencies | No | The paper refers to using the GPT-2 model but does not provide specific version numbers for GPT-2 or any other software dependencies, such as programming languages or libraries.
Experiment Setup | Yes | 1) We employ the GPT-2 model with two configurations. Setting 1: 4 layers, 4 attention heads, and an embedding size of 128. Setting 2: 6 layers, 8 attention heads, and an embedding size of 256. 2) We pretrain the GPT-2 model using different pretraining data schedules.
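The support/target split quoted in the Dataset Splits row can be sketched in plain Python. The partition variables (a1, a2, b1, b2) follow the excerpt; the concrete word values and the equal-halves split point are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of the split in the "Dataset Splits" row.
# Component sets A and B (|A| = |B| = 1000 per the paper) are each
# partitioned in two; the half-and-half split point is an assumption.
A = list(range(1000))          # word set A (values are placeholders)
B = list(range(1000, 2000))    # word set B, disjoint from A
a1, a2 = A[:500], A[500:]      # partition of A
b1, b2 = B[:500], B[500:]      # partition of B

def cross(xs, ys):
    """All ordered pairs (e1, e2) with e1 in xs and e2 in ys."""
    return {(e1, e2) for e1 in xs for e2 in ys}

# Support distribution: a1 × b1 ∪ a2 × b1 ∪ a1 × b2
support = cross(a1, b1) | cross(a2, b1) | cross(a1, b2)
# Target distribution: a2 × b2
target = cross(a2, b2)

# The two sets are disjoint, so performing well on the target
# distribution requires composing components never paired in support.
assert support.isdisjoint(target)
```

The three support blocks are pairwise disjoint (each pair differs in its a-half or b-half), so |support| = 3 · 500² and |target| = 500².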
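The two model configurations quoted in the Experiment Setup row can be captured as a small config sketch. The field names follow the GPT-2 convention (n_layer, n_head, n_embd), but this is an assumption about the implementation; the paper does not name its framework or configuration keys.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPT2Size:
    n_layer: int   # number of transformer blocks
    n_head: int    # attention heads per block
    n_embd: int    # embedding / hidden size

# The two configurations quoted in the "Experiment Setup" row.
SETTING_1 = GPT2Size(n_layer=4, n_head=4, n_embd=128)
SETTING_2 = GPT2Size(n_layer=6, n_head=8, n_embd=256)

# Sanity check: the embedding size must divide evenly across heads.
for cfg in (SETTING_1, SETTING_2):
    assert cfg.n_embd % cfg.n_head == 0
```

Both settings give a per-head dimension of 32 (128/4 and 256/8), consistent with standard multi-head attention.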