Discrete Dictionary-based Decomposition Layer for Structured Representation Learning
Authors: Taewon Park, Hyun-Chul Kim, Minho Lee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that D3 significantly improves the systematic generalization of various TPR-based models while requiring fewer additional parameters. Notably, D3 outperforms baseline models on the synthetic task that demands the systematic decomposition of unseen combinatorial data. |
| Researcher Affiliation | Collaboration | Taewon Park¹, Hyun-Chul Kim¹, Minho Lee¹,²; ¹Kyungpook National University, South Korea; ²ALI Co., Ltd., South Korea |
| Pseudocode | No | The paper describes a 'three-step process' but does not present it as formal pseudocode or an algorithm block. |
| Open Source Code | Yes | The code of D3 is publicly available at https://github.com/taewonpark/D3 |
| Open Datasets | Yes | The SAR task [23] evaluates systematic generalization in memorizing and recalling combinatorial data. The sys-bAbI task [23] is a variant of the bAbI task [42] designed to evaluate systematic generalization in text understanding and reasoning. The sort-of-CLEVR task [26] evaluates compositional generalization in visual relational reasoning. The WikiText-103 task [19] is a language modeling dataset consisting of lengthy corpora from Wikipedia. |
| Dataset Splits | Yes | The sys-bAbI task uses the en-valid-10k version, which is already divided into training, validation, and test datasets. The WikiText-103 task comprises 28,475 articles for training, 60 for validation, and 60 for testing. (A hedged loading sketch appears below the table.) |
| Hardware Specification | Yes | Each experiment was conducted on a single 48GB NVIDIA RTX A6000 GPU and an AMD EPYC 7513 32-Core Processor. |
| Software Dependencies | No | The paper mentions general software like 'Adam optimizer' but does not provide specific version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train the model using the Adam optimizer with a batch size of 64, a learning rate of 1e-3, β1 of 0.9, and β2 of 0.98 for 30K training iterations. (A hedged sketch of this configuration appears below the table.) |
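For the Dataset Splits row, the partitions are predefined by the benchmark releases rather than created by the authors. As one hedged illustration (the paper does not describe its data-loading pipeline, so the library choice here is an assumption), the standard WikiText-103 splits can be fetched with the Hugging Face `datasets` library:

```python
# Hedged illustration only: the paper does not state how its data are loaded.
# This fetches the predefined WikiText-103 train/validation/test partition
# via the Hugging Face `datasets` library (an assumed dependency).
from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-103-v1")

# Note: num_rows counts text lines per split, not the 28,475 / 60 / 60 articles
# reported in the table, since each article spans many lines.
print({split: wikitext[split].num_rows for split in ("train", "validation", "test")})
```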
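The training configuration quoted in the Experiment Setup row can be written as the minimal sketch below, assuming PyTorch (the paper does not report framework or library versions). The model here is a placeholder, not the authors' D3 architecture, which lives in the linked repository.

```python
# Minimal sketch of the reported training hyperparameters, assuming PyTorch.
# The model is a stand-in; the actual D3/TPR-based models are available at
# https://github.com/taewonpark/D3.
import torch

model = torch.nn.Linear(512, 512)  # placeholder module, not D3

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,            # learning rate reported in the paper
    betas=(0.9, 0.98),  # β1 and β2 reported in the paper
)

BATCH_SIZE = 64          # reported batch size
NUM_ITERATIONS = 30_000  # reported number of training iterations
```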