Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
Authors: Paul Soulos, Henry Conklin, Mattia Opper, Paul Smolensky, Jianfeng Gao, Roland Fernandez
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical comparisons between sDTM and various baselines showing sDTM's strong generalization across a wide variety of tasks. (5) |
| Researcher Affiliation | Collaboration | Paul Soulos (Johns Hopkins University, psoulos1@jhu.edu); Henry Conklin (University of Edinburgh); Mattia Opper (University of Edinburgh); Paul Smolensky (Johns Hopkins University and Microsoft Research); Jianfeng Gao (Microsoft Research); Roland Fernandez (Microsoft Research) |
| Pseudocode | No | The paper describes methods textually and with diagrams but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/psoulos/sdtm. |
| Open Datasets | Yes | GeoQuery is a natural language to SQL dataset [77]... SCAN is a synthetic seq2seq task... Active Logical is a tree2tree task... FOR2LAM is a tree2tree program translation task... A.11 Licenses: GeoQuery: GPL 2.0; SCAN: BSD; Active Logical: Permissive 2.0 |
| Dataset Splits | Yes | IID samples are drawn from a distribution shared with training data. We evaluate several out-of-distribution (OOD) shifts. ... For all datasets, the reported results are the best exact match accuracies on the test set over five random seeds. (See the evaluation sketch below this table.) |
| Hardware Specification | Yes | All reported sDTM runs could be processed on NVIDIA 16 GB V100 GPUs. Depending on availability, we ran some seeds on 80 GB H100 GPUs, but this is not necessary. |
| Software Dependencies | No | The paper mentions using specific models like 'T5 [53]' and 'BERT embeddings [15]' for baselines, and references 'Pytorch implementation' for the Transformer, but it does not provide specific version numbers for general software dependencies or libraries (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | When applicable, we adopt the hyperparameters from Soulos et al. [67]. Below we list the newly introduced hyperparameters and changes we made to existing parameters. ... We set the embedding dimension to 64 for FOR2LAM, and 128 for GeoQuery and SCAN. We also changed the loss function from mean-squared error to cross entropy. ... This leads to 56 layers for FOR2LAM, 22 layers for GeoQuery, and 14 layers for SCAN. Pooling by multi-headed attention (4.2) introduces new hyperparameters such as the number of pooling heads and pooling key dimensionality, and we set the value of these to be the same as the Transformer hyperparameters for the agent. Tree pruning (4.3) introduces a new hyperparameter k for the maximum number of nodes to keep. ... For Active Logical we set k = 1024, for FOR2LAM k = 1024, for GeoQuery k = 2048, and for SCAN k = 256. With the memory savings from SCT, pooling by multi-headed attention, and pruning, we increase the batch size from 16 to 64. We also increased the agent's model dimension to 256 with 8 heads of attention due to the memory savings, except for Active Logical where we matched the original hyperparameters. Random positional embeddings (RPE) also introduce a new hyperparameter for the max input integer, and we set this to be double the max input length. This leads to an RPE hyperparameter of 44 for GeoQuery and 18 for SCAN. (A configuration sketch collecting these values appears below this table.) |
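The Dataset Splits row quotes an evaluation protocol of reporting the best exact-match accuracy on the test set over five random seeds. The sketch below summarizes that protocol; the function names and data layout are hypothetical illustrations, not taken from the released sdtm code.

```python
from typing import Iterable, List, Sequence


def exact_match_accuracy(predictions: Iterable[str], references: Iterable[str]) -> float:
    """Fraction of predicted outputs that match the reference outputs exactly."""
    pairs = list(zip(predictions, references))
    return sum(pred == ref for pred, ref in pairs) / len(pairs)


def best_accuracy_over_seeds(per_seed_predictions: Sequence[Sequence[str]], references: List[str]) -> float:
    """Best test-set exact-match accuracy across random seeds (five seeds in the paper)."""
    return max(exact_match_accuracy(preds, references) for preds in per_seed_predictions)
```

Under this protocol the computation is applied separately to the IID test set and to each OOD shift, so every reported number is a per-split maximum over seeds.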
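The Experiment Setup row lists per-dataset hyperparameters. Below is a minimal configuration sketch collecting the values it states; the dictionary keys are illustrative rather than the released code's schema, fields the row does not specify are left as `None`, and the pooling heads and pooling key dimensionality are omitted because the row says they simply mirror the agent's Transformer hyperparameters.

```python
# Hypothetical per-dataset configuration assembled from the quoted hyperparameters.
# Values not stated in the Experiment Setup row are left as None rather than guessed.
SDTM_CONFIGS = {
    "for2lam": {
        "embedding_dim": 64,
        "num_layers": 56,
        "pruning_k": 1024,        # max nodes kept by tree pruning (4.3)
        "batch_size": 64,         # increased from 16 via the reported memory savings
        "agent_model_dim": 256,
        "agent_num_heads": 8,
        "rpe_max_int": None,      # not stated for FOR2LAM
        "loss": "cross_entropy",  # changed from mean-squared error
    },
    "geoquery": {
        "embedding_dim": 128,
        "num_layers": 22,
        "pruning_k": 2048,
        "batch_size": 64,
        "agent_model_dim": 256,
        "agent_num_heads": 8,
        "rpe_max_int": 44,        # double the max input length
        "loss": "cross_entropy",
    },
    "scan": {
        "embedding_dim": 128,
        "num_layers": 14,
        "pruning_k": 256,
        "batch_size": 64,
        "agent_model_dim": 256,
        "agent_num_heads": 8,
        "rpe_max_int": 18,        # double the max input length
        "loss": "cross_entropy",
    },
    "active_logical": {
        "embedding_dim": None,    # original hyperparameters from Soulos et al. retained
        "num_layers": None,
        "pruning_k": 1024,
        "batch_size": 64,
        "agent_model_dim": None,  # agent dimension kept at its original value for this task
        "agent_num_heads": None,
        "rpe_max_int": None,
        "loss": None,             # loss change not explicitly tied to this task in the excerpt
    },
}
```

Collecting the table's values in one flat dictionary per dataset keeps them easy to compare; an actual run would still need the remaining settings inherited from Soulos et al. [67].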