Discrete Dictionary-based Decomposition Layer for Structured Representation Learning

Authors: Taewon Park, Hyun-Chul Kim, Minho Lee

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that D3 significantly improves the systematic generalization of various TPR-based models while requiring fewer additional parameters. Notably, D3 outperforms baseline models on the synthetic task that demands the systematic decomposition of unseen combinatorial data.
Researcher Affiliation | Collaboration | Taewon Park (1), Hyun-Chul Kim (1), Minho Lee (1, 2); (1) Kyungpook National University, South Korea; (2) ALI Co., Ltd., South Korea
Pseudocode | No | The paper describes a 'three-step process' but does not present it as formal pseudocode or an algorithm block.
Open Source Code | Yes | The code of D3 is publicly available at https://github.com/taewonpark/D3
Open Datasets | Yes | The SAR task [23] evaluates systematic generalization in memorizing and recalling combinatorial data. The sys-bAbI task [23] is a variant of the bAbI task [42] designed to evaluate systematic generalization in text understanding and reasoning. The sort-of-CLEVR task [26] evaluates compositional generalization in visual relational reasoning. The WikiText-103 task [19] is a language modeling dataset consisting of lengthy corpora from Wikipedia.
Dataset Splits | Yes | The sys-bAbI task uses the en-valid-10k version, which is already divided into training, validation, and test datasets. The WikiText-103 task comprises 28,475 articles for training, 60 for validation, and 60 for testing. (A loading sketch follows this table.)
Hardware Specification | Yes | Each experiment was conducted on a single 48GB NVIDIA RTX A6000 GPU and an AMD EPYC 7513 32-Core Processor.
Software Dependencies | No | The paper mentions general software like the 'Adam optimizer' but does not provide specific version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We train the model using the Adam optimizer with a batch size of 64, a learning rate of 1e-3, β1 of 0.9, and β2 of 0.98 for 30K training iterations. (An optimizer configuration sketch follows this table.)
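
The Dataset Splits row above cites the standard WikiText-103 partition (28,475 training articles, 60 validation, 60 test). The sketch below shows one way to pull in those splits, assuming the Hugging Face `datasets` library and its "wikitext-103-raw-v1" identifier; the paper does not state which loader its code uses, so this is an illustration rather than the authors' pipeline.

```python
# Minimal sketch (assumption: Hugging Face `datasets`) for loading the standard
# WikiText-103 train/validation/test splits referenced in the table above.
from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")  # assumed dataset identifier

# Note: these splits are stored as lines of text, so the record counts printed
# here differ from the article counts (28,475 / 60 / 60) quoted in the table.
for split in ("train", "validation", "test"):
    print(split, len(wikitext[split]))
```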
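The Experiment Setup row quotes the training hyperparameters (Adam, batch size 64, learning rate 1e-3, β1 = 0.9, β2 = 0.98, 30K iterations). The following minimal PyTorch sketch reproduces only that optimizer configuration; the model and data are trivial placeholders, not the D3 layer or the paper's tasks.

```python
# Minimal PyTorch sketch of the reported optimizer settings. Only the Adam
# hyperparameters, batch size, and iteration count come from the paper; the
# model and data below are placeholders, not D3.
import torch

model = torch.nn.Linear(128, 128)  # placeholder module standing in for a TPR-based model
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,            # learning rate reported in the paper
    betas=(0.9, 0.98),  # beta1 = 0.9, beta2 = 0.98
)

BATCH_SIZE = 64
TRAIN_ITERATIONS = 30_000

for step in range(TRAIN_ITERATIONS):
    batch = torch.randn(BATCH_SIZE, 128)  # dummy batch; real runs use SAR, sys-bAbI, etc.
    loss = model(batch).pow(2).mean()     # dummy objective for illustration only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```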