Improving Compositional Generalization using Iterated Learning and Simplicial Embeddings
Authors: Yi Ren, Samuel Lavoie, Michael Galkin, Danica J. Sutherland, Aaron C. Courville
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this combination of changes improves compositional generalization over other approaches, demonstrating these improvements both on vision tasks with well-understood latent factors and on real molecular graph prediction tasks where the latent structure is unknown. |
| Researcher Affiliation | Collaboration | Yi Ren (University of British Columbia); Samuel Lavoie (Université de Montréal & Mila); Mikhail Galkin (Intel AI Lab); Danica J. Sutherland (University of British Columbia & Amii); Aaron Courville (Université de Montréal & Mila) |
| Pseudocode | Yes | Pseudocode for the proposed method, SEM-IL, is in the appendix (Algorithm 1). (A hedged sketch of the SEM layer appears after this table.) |
| Open Source Code | No | The paper mentions using 'open-source code released by OGB [37]' for its backbone, but does not state that the authors' own implementation or code for their methodology is released or publicly available. |
| Open Datasets | Yes | We conduct experiments on three common molecular graph property datasets: ogbg-molhiv (1 binary classification task), ogbg-molpcba (128 binary classification tasks), and PCQM4Mv2 (1 regression task); all three come from the Open Graph Benchmark [37]. We conduct experiments on three vision datasets, i.e., dSprites [52], 3D Shapes [9], and MPI3D-real [23], where the ground truth G is given. |
| Dataset Splits | Yes | For PCQM, we report the validation performance, as the test set is private and inaccessible. In the experiments, we use the validation split of molhiv as D_train and the test split as D_test, each of which contains 4,113 distinct molecules unseen during the training of z. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'open-source code released by OGB [37]' and the 'RDKit tool [44]', but it does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The networks are optimized using a standard SGD optimizer with a learning rate of 10⁻³ and a weight decay rate of 5×10⁻⁴. For the backbone structure, the depth of the GCN/GIN is 5, the hidden embedding is 300, the pooling method is taking the mean, etc. For training on downstream tasks, we use the AdamW [49] optimizer with a learning rate of 10⁻³ and a cosine decay scheduler to stabilize training. For the SEM layer, we search L from [10, 200] and V from [5, 100] on the validation set. For the IL-related methods, we select the imitation steps from {1,000; 5,000; 10,000; 50,000; 100,000}. (A hedged configuration sketch follows the table.) |
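The SEM layer referenced above is a simplicial embedding: the backbone's hidden representation is projected into L groups of V logits, and each group is passed through a softmax so that it lies on a (V−1)-simplex. The sketch below is a minimal PyTorch rendering of that idea, assuming the paper's 300-dimensional hidden embedding; the class name `SEMLayer`, the linear projection, and the temperature `tau` are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class SEMLayer(nn.Module):
    """Simplicial embedding: project features into L groups of V logits,
    then softmax each group so it lies on a (V-1)-simplex."""

    def __init__(self, in_dim: int, L: int, V: int, tau: float = 1.0):
        super().__init__()
        self.L, self.V, self.tau = L, V, tau
        # Hypothetical learned projection; the exact mapping from the
        # backbone embedding to the L*V logits is an assumption here.
        self.proj = nn.Linear(in_dim, L * V)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        logits = self.proj(h).view(-1, self.L, self.V)  # (batch, L, V)
        z = torch.softmax(logits / self.tau, dim=-1)    # per-group simplex
        return z.flatten(start_dim=1)                   # (batch, L * V)

# Example: a 300-d GNN embedding mapped through an SEM layer with L=20, V=10,
# values chosen from inside the search ranges reported in the table.
sem = SEMLayer(in_dim=300, L=20, V=10)
z = sem(torch.randn(32, 300))
print(z.shape)  # torch.Size([32, 200])
```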
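For the downstream-task training, the paper reports AdamW at learning rate 10⁻³ with a cosine decay schedule, and plain SGD (lr 10⁻³, weight decay 5×10⁻⁴) for the other reported configuration. The following is a hedged sketch of that setup: the stand-in model, dummy batch, and total step count are assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

# Stand-in for the GCN/GIN backbone + SEM layer + task head; the shapes and
# the total step count below are illustrative assumptions.
model = nn.Sequential(nn.Linear(300, 300), nn.ReLU(), nn.Linear(300, 1))

# Downstream tasks: AdamW at lr 1e-3 with cosine decay, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
num_steps = 10_000  # assumption; the table only quotes the imitation-step grid
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)

# The SGD configuration reported for the backbone would instead be:
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=5e-4)

x, y = torch.randn(64, 300), torch.randn(64, 1)  # dummy batch
for step in range(num_steps):
    loss = nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine decay over the full training run
```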