Differentiable Tree Operations Promote Compositional Generalization

Authors: Paul Soulos, Edward J. Hu, Kate McCurdy, Yunmo Chen, Roland Fernandez, Paul Smolensky, Jianfeng Gao

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our proposal empirically on a series of synthetic tree-to-tree datasets that test a model's ability to generalize compositionally (§5)."
Researcher Affiliation | Collaboration | (1) Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA; (2) Mila, Université de Montréal, Montreal, Canada; (3) School of Informatics, University of Edinburgh, Edinburgh, UK; (4) Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; (5) Microsoft Research, Redmond, WA, USA.
Pseudocode | No | The paper describes the model and its operations but does not include a formal pseudocode block or algorithm.
Open Source Code | Yes | Code available at https://github.com/psoulos/dtm.
Open Datasets | Yes | Data available at https://huggingface.co/datasets/rfernand/basic_sentence_transforms.
Dataset Splits | Yes | Each task in the dataset has five splits: train, validation, test, out-of-distribution lexical (OOD-lexical), and out-of-distribution structural (OOD-structural). The train split has 10,000 samples, while the other splits have 1,250 samples each. (A loading sketch follows the table.)
Hardware Specification | Yes | "All of our models were trained on 1x V100 (16GB) virtual machines."
Software Dependencies | No | The paper mentions software components such as "Optimizer: Adam" and "Transformer non-linearity: gelu", but does not give version numbers for any libraries or frameworks (e.g., PyTorch or TensorFlow).
Experiment Setup | Yes | For the DTM models, the authors ran a 3x hyperparameter grid search over the following ranges (the best-performing values are marked in bold in the paper): computation steps: [X+2, (X+2)*2], where X is the minimum number of steps required to complete the task; weight decay: [0.1, 0.01]; Transformer model dimension: [32, 64]; Adam β2: [0.98, 0.95]; Transformer dropout: [0, 0.1]. The following hyperparameters were fixed for all models: lr warmup: 10,000 steps; lr decay: cosine; training steps: 20,000; Transformer encoder layers per computation step: 1; Transformer heads: 4; batch size: 16. (A grid-construction sketch follows directly below.)
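
To make the Experiment Setup row concrete, the sketch below enumerates the searched grid with Python's itertools.product. It is a minimal illustration, not the released training code: the dictionary keys and the min_steps placeholder are invented names, and the "3x" repetition mentioned in the quoted description is not modeled here.

```python
# Illustrative sketch of the DTM hyperparameter grid described above; the key
# names and `min_steps` placeholder are assumptions, not taken from the repo.
from itertools import product

min_steps = 6  # X: minimum number of steps required by the task (placeholder value)

search_space = {
    "computation_steps": [min_steps + 2, (min_steps + 2) * 2],
    "weight_decay": [0.1, 0.01],
    "d_model": [32, 64],
    "adam_beta2": [0.98, 0.95],
    "dropout": [0.0, 0.1],
}

fixed = {
    "lr_warmup_steps": 10_000,
    "lr_decay": "cosine",
    "training_steps": 20_000,
    "encoder_layers_per_step": 1,
    "num_heads": 4,
    "batch_size": 16,
}

# Enumerate all 2^5 = 32 grid points; each run also inherits the fixed settings.
grid = [dict(zip(search_space, values)) | fixed
        for values in product(*search_space.values())]
print(f"{len(grid)} configurations, e.g. {grid[0]}")
```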
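
Similarly, for the Open Datasets and Dataset Splits rows, here is a minimal sketch of how one might inspect the published splits with the Hugging Face datasets library. It assumes the library is installed and that each task is exposed as a dataset configuration; the configuration and split names are whatever the repository actually defines, not names asserted by this report.

```python
# Minimal sketch (not from the paper) for inspecting the split layout described
# above, assuming the Hugging Face `datasets` library and per-task configurations.
from datasets import get_dataset_config_names, load_dataset

REPO = "rfernand/basic_sentence_transforms"

# List the available task configurations, then load the first one.
configs = get_dataset_config_names(REPO)
print("available tasks:", configs)

# The paper describes five splits per task: train (10,000 samples), validation,
# test, OOD-lexical, and OOD-structural (1,250 samples each).
ds = load_dataset(REPO, configs[0])
for split_name, split in ds.items():
    print(f"{split_name}: {len(split)} examples")
```

If the counts match the description in the Dataset Splits row, each task should report 10,000 training examples and 1,250 examples in each of the remaining four splits.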