Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
Authors: Paul Soulos, Henry Conklin, Mattia Opper, Paul Smolensky, Jianfeng Gao, Roland Fernandez
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical comparisons between sDTM and various baselines showing sDTM's strong generalization across a wide variety of tasks. (5) |
| Researcher Affiliation | Collaboration | Paul Soulos (Johns Hopkins University, psoulos1@jhu.edu); Henry Conklin (University of Edinburgh); Mattia Opper (University of Edinburgh); Paul Smolensky (Johns Hopkins University and Microsoft Research); Jianfeng Gao (Microsoft Research); Roland Fernandez (Microsoft Research) |
| Pseudocode | No | The paper describes methods textually and with diagrams but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/psoulos/sdtm. |
| Open Datasets | Yes | GeoQuery is a natural language to SQL dataset [77]... SCAN is a synthetic seq2seq task... Active Logical is a tree2tree task... FOR2LAM is a tree2tree program translation task... A.11 Licenses: GeoQuery: GPL 2.0; SCAN: BSD; Active Logical: Permissive 2.0 |
| Dataset Splits | Yes | IID samples are drawn from a distribution shared with training data. We evaluate several out-of-distribution (OOD) shifts. ... For all datasets, the reported results are the best exact match accuracies on the test set over five random seeds. (See the evaluation sketch below this table.) |
| Hardware Specification | Yes | All reported sDTM runs could be processed on NVIDIA 16 GB V100 GPUs. Depending on availability, we ran some seeds on 80 GB H100 GPUs, but this is not necessary. |
| Software Dependencies | No | The paper mentions using specific models like 'T5 [53]' and 'BERT embeddings [15]' for baselines, and references 'Pytorch implementation' for the Transformer, but it does not provide specific version numbers for general software dependencies or libraries (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | When applicable, we adopt the hyperparameters from Soulos et al. [67]. Below we list the newly introduced hyperparameters and changes we made to existing parameters. ... We set the embedding dimension to 64 for FOR2LAM, and 128 for GeoQuery and SCAN. We also changed the loss function from mean-squared error to cross entropy. ... This leads to 56 layers for FOR2LAM, 22 layers for GeoQuery, and 14 layers for SCAN. Pooling by multi-headed attention (4.2) introduces new hyperparameters such as the number of pooling heads and pooling key dimensionality, and we set the value of these to be the same as the Transformer hyperparameters for the agent. Tree pruning (4.3) introduces a new hyperparameter k for the maximum number of nodes to keep. ... For Active Logical we set k = 1024, for FOR2LAM k = 1024, for GeoQuery k = 2048, and for SCAN k = 256. With the memory savings from SCT, pooling by multi-headed attention, and pruning, we increase the batch size from 16 to 64. We also increased the agent's model dimension to 256 with 8 heads of attention due to the memory savings, except for Active Logical where we matched the original hyperparameters. Random positional embeddings (RPE) also introduce a new hyperparameter for the max input integer, and we set this to be double the max input length. This leads to an RPE hyperparameter of 44 for GeoQuery and 18 for SCAN. (A configuration sketch collecting these values appears below this table.) |
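The Dataset Splits row quotes an evaluation protocol of reporting the best exact-match accuracy on the test set over five random seeds. The sketch below summarizes that protocol; the function names and data layout are hypothetical illustrations, not taken from the released sdtm code.

```python
from typing import Iterable, List, Sequence


def exact_match_accuracy(predictions: Iterable[str], references: Iterable[str]) -> float:
    """Fraction of predicted outputs that match the reference outputs exactly."""
    pairs = list(zip(predictions, references))
    return sum(pred == ref for pred, ref in pairs) / len(pairs)


def best_accuracy_over_seeds(per_seed_predictions: Sequence[Sequence[str]], references: List[str]) -> float:
    """Best test-set exact-match accuracy across random seeds (five seeds in the paper)."""
    return max(exact_match_accuracy(preds, references) for preds in per_seed_predictions)
```

Under this protocol the computation is applied separately to the IID test set and to each OOD shift, so every reported number is a per-split maximum over seeds.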
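The Experiment Setup row lists per-dataset hyperparameters. Below is a minimal configuration sketch collecting the values it states; the dictionary keys are illustrative rather than the released code's schema, fields the row does not specify are left as `None`, and the pooling heads and pooling key dimensionality are omitted because the row says they simply mirror the agent's Transformer hyperparameters.

```python
# Hypothetical per-dataset configuration assembled from the quoted hyperparameters.
# Values not stated in the Experiment Setup row are left as None rather than guessed.
SDTM_CONFIGS = {
    "for2lam": {
        "embedding_dim": 64,
        "num_layers": 56,
        "pruning_k": 1024,        # max nodes kept by tree pruning (4.3)
        "batch_size": 64,         # increased from 16 via the reported memory savings
        "agent_model_dim": 256,
        "agent_num_heads": 8,
        "rpe_max_int": None,      # not stated for FOR2LAM
        "loss": "cross_entropy",  # changed from mean-squared error
    },
    "geoquery": {
        "embedding_dim": 128,
        "num_layers": 22,
        "pruning_k": 2048,
        "batch_size": 64,
        "agent_model_dim": 256,
        "agent_num_heads": 8,
        "rpe_max_int": 44,        # double the max input length
        "loss": "cross_entropy",
    },
    "scan": {
        "embedding_dim": 128,
        "num_layers": 14,
        "pruning_k": 256,
        "batch_size": 64,
        "agent_model_dim": 256,
        "agent_num_heads": 8,
        "rpe_max_int": 18,        # double the max input length
        "loss": "cross_entropy",
    },
    "active_logical": {
        "embedding_dim": None,    # original hyperparameters from Soulos et al. retained
        "num_layers": None,
        "pruning_k": 1024,
        "batch_size": 64,
        "agent_model_dim": None,  # agent dimension kept at its original value for this task
        "agent_num_heads": None,
        "rpe_max_int": None,
        "loss": None,             # loss change not explicitly tied to this task in the excerpt
    },
}
```

Collecting the table's values in one flat dictionary per dataset keeps them easy to compare; an actual run would still need the remaining settings inherited from Soulos et al. [67].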