Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
On the Statistical Mechanisms of Distributional Compositional Generalization
Authors: Jingwen Fu, Nanning Zheng
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiment... 6.2. Experiments on trade-off and non-trade-off improvement... 6.3. Experiments on Generalization Bounds... Table 1. Values of IA,β( T = T, PS) and GACC over 10 instances... Table 2. Performance comparison across rule complexities |
| Researcher Affiliation | Academia | Jingwen Fu 1 Nanning Zheng 1... 1National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University. Correspondence to: Nanning Zheng <EMAIL>. |
| Pseudocode | No | The paper describes methods and analyses using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | No | 1. Components and Compositional rule: We construct two words set A,B satisfying |A| = |B| = 1000... 2) We pretrain the GPT-2 model using different pretraining data schedules. The pretraining data is generated from a subset of composition rules same to those in the downstream task, but with entirely different words. |
| Dataset Splits | Yes | 2. Distribution Split: The support distribution takes the elements in the set {(e1, e2)|(e1, e2) ∈ a1 × b1 ∪ a2 × b1 ∪ a1 × b2}. The target distribution take elements in the set {(e1, e2)|(e1, e2) ∈ a2 × b2}. It is easy to verify that these designs satisfy the requirement listed in Section 3. |
| Hardware Specification | No | The paper mentions using 'GPT-2 model' with specific configurations (4 layers, 4 attention heads, embedding size of 128; or 6 layers, 8 attention heads, embedding size of 256) but does not provide any details about the specific hardware (GPU/CPU models, memory, etc.) used for experiments. |
| Software Dependencies | No | The paper refers to using the 'GPT-2 model' but does not provide specific version numbers for GPT-2 or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | 1) We employ the GPT-2 model with two configurations: Setting 1: 4 layers, 4 attention heads, and an embedding size of 128. Setting 2: 6 layers, 8 attention heads, and an embedding size of 256. 2) We pretrain the GPT-2 model using different pretraining data schedules. |
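
The distribution split quoted in the Dataset Splits row can be illustrated with a minimal Python sketch. It assumes that a1, a2 partition the word set A and that b1, b2 partition B; the helper name `build_split`, the cut points, and the toy word lists are hypothetical and are not taken from the paper, which uses |A| = |B| = 1000.

```python
from itertools import product


def build_split(words_a, words_b, cut_a, cut_b):
    """Return (support, target) pair sets for a compositional split.

    Assumption (not from the paper): a1/a2 and b1/b2 are obtained by
    cutting each word list at a fixed index.
    """
    a1, a2 = words_a[:cut_a], words_a[cut_a:]
    b1, b2 = words_b[:cut_b], words_b[cut_b:]
    # Support: a1 x b1, a2 x b1, and a1 x b2, as in the quoted split.
    support = set(product(a1, b1)) | set(product(a2, b1)) | set(product(a1, b2))
    # Target: the held-out combination a2 x b2.
    target = set(product(a2, b2))
    return support, target


# Toy vocabulary for illustration only.
A = [f"a{i}" for i in range(6)]
B = [f"b{i}" for i in range(6)]
support, target = build_split(A, B, cut_a=3, cut_b=3)

print(len(support), len(target))   # 27 9
print(support.isdisjoint(target))  # True
```

In this sketch every word of a2 and b2 appears in the support set, but no pair from a2 × b2 does, so the target set contains only unseen combinations of seen components.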