Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On the Statistical Mechanisms of Distributional Compositional Generalization
Authors: Jingwen Fu, Nanning Zheng
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiment... 6.2. Experiments on trade-off and non-trade-off improvement... 6.3. Experiments on Generalization Bounds... Table 1. Values of I_{A,β}(T̂ = T, P_S) and GACC over 10 instances... Table 2. Performance comparison across rule complexities |
| Researcher Affiliation | Academia | Jingwen Fu 1 Nanning Zheng 1... 1National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University. Correspondence to: Nanning Zheng <EMAIL>. |
| Pseudocode | No | The paper describes methods and analyses using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | No | 1. Components and Compositional rule: We construct two word sets A, B satisfying |A| = |B| = 1000... 2) We pretrain the GPT-2 model using different pretraining data schedules. The pretraining data is generated from a subset of composition rules same as those in the downstream task, but with entirely different words. |
| Dataset Splits | Yes | 2. Distribution Split: The support distribution takes the elements in the set {(e1, e2)|(e1, e2) ∈ a1 × b1 ∪ a2 × b1 ∪ a1 × b2}. The target distribution takes elements in the set {(e1, e2)|(e1, e2) ∈ a2 × b2}. It is easy to verify that these designs satisfy the requirement listed in Section 3. |
| Hardware Specification | No | The paper mentions using 'GPT-2 model' with specific configurations (4 layers, 4 attention heads, embedding size of 128; or 6 layers, 8 attention heads, embedding size of 256) but does not provide any details about the specific hardware (GPU/CPU models, memory, etc.) used for experiments. |
| Software Dependencies | No | The paper refers to using the 'GPT-2 model' but does not provide specific version numbers for GPT-2 or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | 1) We employ the GPT-2 model with two configurations: Setting 1: 4 layers, 4 attention heads, and an embedding size of 128. Setting 2: 6 layers, 8 attention heads, and an embedding size of 256. 2) We pretrain the GPT-2 model using different pretraining data schedules. |
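The support/target construction quoted in the Dataset Splits row can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the paper uses word sets with |A| = |B| = 1000, while this sketch uses 8 elements per set for brevity, and the variable names a1, a2, b1, b2 follow the paper's notation.

```python
# Illustrative sketch of the support/target distribution split
# described in the "Dataset Splits" row (not the authors' code).
from itertools import product

A = [f"a{i}" for i in range(8)]  # paper uses |A| = 1000
B = [f"b{i}" for i in range(8)]  # paper uses |B| = 1000
a1, a2 = A[:4], A[4:]  # split each word set into two halves
b1, b2 = B[:4], B[4:]

# Support distribution: pairs drawn from a1×b1 ∪ a2×b1 ∪ a1×b2.
support = set(product(a1, b1)) | set(product(a2, b1)) | set(product(a1, b2))
# Target distribution: pairs drawn from a2×b2 only.
target = set(product(a2, b2))

# Target pairs are compositionally novel: each coordinate appears in the
# support, but never the two together.
assert support.isdisjoint(target)
```

Note that the support and target sets partition the full Cartesian product A × B, which is what makes the target a genuine compositional-generalization test rather than an in-distribution holdout.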
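The two model configurations in the Experiment Setup row can be encoded compactly. Since the paper releases no code, the dataclass below (and its field names, borrowed from common GPT-2 configuration conventions) is an assumption, not the authors' implementation.

```python
# Hypothetical encoding of the two GPT-2 configurations from the
# "Experiment Setup" row; field names follow common GPT-2 conventions.
from dataclasses import dataclass

@dataclass(frozen=True)
class GPT2Setting:
    n_layer: int  # number of transformer layers
    n_head: int   # attention heads per layer
    n_embd: int   # embedding (hidden) size

SETTING_1 = GPT2Setting(n_layer=4, n_head=4, n_embd=128)
SETTING_2 = GPT2Setting(n_layer=6, n_head=8, n_embd=256)
```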