Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Differentiable Tree Operations Promote Compositional Generalization
Authors: Paul Soulos, Edward J. Hu, Kate McCurdy, Yunmo Chen, Roland Fernandez, Paul Smolensky, Jianfeng Gao
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our proposal empirically on a series of synthetic tree-to-tree datasets that test a model's ability to generalize compositionally (§5). |
| Researcher Affiliation | Collaboration | 1Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA 2Mila, Université de Montréal, Montreal, CA 3School of Informatics, University of Edinburgh, Edinburgh, UK 4Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA 5Microsoft Research, Redmond, WA, USA. |
| Pseudocode | No | The paper describes the model and operations but does not include a formal pseudocode block or algorithm. |
| Open Source Code | Yes | Code available at https://github.com/psoulos/dtm. |
| Open Datasets | Yes | Data available at https://huggingface.co/datasets/rfernand/basic_sentence_transforms. |
| Dataset Splits | Yes | Each task in the dataset has five splits: train, validation, test, out-of-distribution lexical (OOD-lexical), and out-of-distribution structural (OOD-structural). The train split has 10,000 samples, while the other splits have 1,250 samples each. |
| Hardware Specification | Yes | All of our models were trained on 1x V100 (16GB) virtual machines. |
| Software Dependencies | No | The paper mentions software components like "Optimizer: Adam" and "Transformer non-linearity: gelu", but does not provide specific version numbers for any libraries or frameworks (e.g., PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the DTM models, we ran a 3x hyperparameter grid search over the following ranges. The best performing hyperparameter values are marked in bold. Computation steps: [X+2, (X+2)*2], where X is the minimum number of steps required to complete a task; weight decay: [.1, .01]; Transformer model dimension: [32, 64]; Adam β2: [.98, .95]; Transformer dropout: [0, .1]. The following hyperparameters were set for all models: lr warmup: [10000]; lr decay: [cosine]; training steps: [20000]; Transformer encoder layers per computation step: [1]; Transformer # of heads: [4]; batch size: [16]. |
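The split sizes in the Dataset Splits row imply a simple per-task budget. The sketch below makes the arithmetic explicit. The dictionary keys are hypothetical labels chosen for readability and are not necessarily the split names used in the Hugging Face dataset.

```python
# Per-task split sizes as stated in the Dataset Splits row.
# Key names are illustrative only, not the dataset's actual split identifiers.
SPLIT_SIZES = {
    "train": 10_000,
    "validation": 1_250,
    "test": 1_250,
    "ood_lexical": 1_250,
    "ood_structural": 1_250,
}


def total_samples_per_task(split_sizes: dict[str, int]) -> int:
    """Sum the samples across all splits of a single task."""
    return sum(split_sizes.values())


def held_out_fraction(split_sizes: dict[str, int]) -> float:
    """Fraction of a task's samples that are not in the train split."""
    total = total_samples_per_task(split_sizes)
    return (total - split_sizes["train"]) / total


if __name__ == "__main__":
    print(total_samples_per_task(SPLIT_SIZES))  # 15000
    print(held_out_fraction(SPLIT_SIZES))       # one third of each task is held out
```

Each task therefore has 15,000 samples, and 5,000 of them (one third) fall outside the training split.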
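The Software Dependencies row is scored "No" because no library versions are reported. The sketch below shows one way such versions could be captured at training time using only the Python standard library. The package list is a hypothetical example, since the paper does not say which frameworks it used.

```python
import platform
from importlib import metadata


def collect_versions(packages: list[str]) -> dict[str, str]:
    """Return the Python version plus the installed version of each named package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Hypothetical package list; the paper does not name its libraries.
    for name, version in collect_versions(["torch", "numpy"]).items():
        print(f"{name}=={version}")
```

Writing this output alongside each run's results would supply the version numbers that the row notes are missing.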
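The Experiment Setup row can be read as a small configuration: five searched hyperparameters with two values each, plus a set of fixed settings. The sketch below enumerates that grid and adds a warmup-plus-cosine learning-rate schedule. It rests on several assumptions that the row does not state. First, "3x" is read as three runs per configuration. Second, the warmup is taken to be linear. Third, the cosine decay is assumed to reach zero at the final training step. Finally, the row gives no peak learning rate, so `peak_lr` is a hypothetical parameter. The names `search_grid`, `configurations`, and `learning_rate` are illustrative only.

```python
import itertools
import math


def search_grid(min_steps: int) -> dict[str, list]:
    """Searched ranges from the Experiment Setup row; min_steps is X for a task."""
    x2 = min_steps + 2
    return {
        "computation_steps": [x2, x2 * 2],
        "weight_decay": [0.1, 0.01],
        "d_model": [32, 64],
        "adam_beta2": [0.98, 0.95],
        "dropout": [0.0, 0.1],
    }


# Settings shared by all models, per the Experiment Setup row.
FIXED = {
    "lr_warmup_steps": 10_000,
    "training_steps": 20_000,
    "encoder_layers_per_step": 1,
    "num_heads": 4,
    "batch_size": 16,
}


def configurations(min_steps: int) -> list[dict]:
    """Every point of the full Cartesian grid, merged with the fixed settings."""
    grid = search_grid(min_steps)
    keys = list(grid)
    return [
        {**FIXED, **dict(zip(keys, values))}
        for values in itertools.product(*(grid[k] for k in keys))
    ]


def learning_rate(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Assumed shape: linear warmup to peak_lr, then cosine decay to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    configs = configurations(min_steps=4)  # X = 4 is a made-up example value
    print(len(configs))      # 2**5 = 32 configurations
    print(len(configs) * 3)  # 96 runs if "3x" means three runs each (assumption)
    for step in (0, 5_000, 10_000, 15_000, 20_000):
        print(step, learning_rate(step, 1e-3, 10_000, 20_000))  # peak_lr is hypothetical
```

The search covers 2^5 = 32 configurations per task. Under the three-runs reading of "3x", that comes to 96 training runs.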