Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation
Authors: Sébastien Lachapelle, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data. From Section 4 (Experiments): We now present empirical validations of the theoretical results presented earlier. To achieve this, we compare the ability of additive and non-additive decoders to both identify ground-truth latent factors (Theorems 1 & 2) and extrapolate (Corollary 3) when trained to solve the reconstruction task on simple images (64 × 64 × 3) consisting of two balls moving in space [2]. See Appendix B.1 for training details. We consider two datasets: one where the two ball positions can only vary along the y-axis (Scalar Latents) and one where the positions can vary along both the x and y axes (Block Latents). |
| Researcher Affiliation | Collaboration | Sébastien Lachapelle¹, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien¹. Mila & DIRO, Université de Montréal; ¹Samsung SAIT AI Lab, Montreal. Equal contribution. Canada CIFAR AI Chair. Correspondence to: {lachaseb, divyat.mahajan}@mila.quebec |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: Our code repository can be found at this link. |
| Open Datasets | Yes | We use the moving balls environment from Ahuja et al. [2] with images of dimension 64 × 64 × 3, with the latent vector (z) representing the position coordinates of each ball. |
| Dataset Splits | Yes | We use 50k samples for the test dataset, 20k samples for the train dataset, and 5k samples (25% of the train sample size) for the validation dataset. (A split sketch follows the table.) |
| Hardware Specification | No | The experiments were in part enabled by computational resources provided by Calcul Québec (calculquebec.ca) and the Digital Research Alliance of Canada (alliancecan.ca). This statement refers to general computational resources without specifying any particular hardware components like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions an implementation in JAX [6] (Appendices A.7 and B.1) but does not specify version numbers for JAX or any other software dependencies. |
| Experiment Setup | Yes | Hyperparameters. For both the Scalar Latents and the Block Latents datasets, we used the Adam optimizer with the hyperparameters defined below. Note that we maintain consistent hyperparameters across both the Additive decoder and the Non-Additive decoder methods. Scalar Latents dataset: batch size 64, learning rate 1 × 10⁻³, weight decay 5 × 10⁻⁴, 4000 epochs. Block Latents dataset: batch size 1024, learning rate 1 × 10⁻³, weight decay 5 × 10⁻⁴, 6000 epochs. Model Architecture. We use the same Encoder and Decoder architectures across both datasets (Scalar Latents, Block Latents). (An optimizer sketch follows the table.) |
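The split sizes quoted in the "Dataset Splits" row are concrete enough to sketch in code. The snippet below is a minimal illustration, not the authors' code: the 4-dimensional latent (one (x, y) position per ball) and the uniform sampling distribution are assumptions made for the sketch; the actual data-generating process follows Ahuja et al. [2].

```python
import jax

# Minimal sketch (not the authors' code) of the reported split sizes:
# 20k train, 5k validation (25% of the train size), 50k test.
key = jax.random.PRNGKey(0)
n_train, n_val, n_test = 20_000, 5_000, 50_000

# Assumption for illustration: 4 ground-truth latents, i.e. one (x, y)
# position per ball, sampled uniformly.
latents = jax.random.uniform(key, shape=(n_train + n_val + n_test, 4))

train_z = latents[:n_train]
val_z = latents[n_train:n_train + n_val]
test_z = latents[n_train + n_val:]
```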
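Since the "Experiment Setup" row gives concrete optimizer hyperparameters and the paper reports a JAX implementation, a minimal training-step sketch is shown below. It is an illustration only, not the authors' code: `apply_fn` is a hypothetical encoder-decoder forward pass (the actual architectures are in Appendix B.1), and mapping the reported "Adam with weight decay" to optax's decoupled AdamW is an assumption.

```python
from functools import partial

import jax
import jax.numpy as jnp
import optax

# Reported hyperparameters (Scalar Latents dataset; Block Latents uses
# batch size 1024 and 6000 epochs instead).
BATCH_SIZE = 64
LEARNING_RATE = 1e-3
WEIGHT_DECAY = 5e-4
NUM_EPOCHS = 4000

# Assumption: the reported weight decay is realized here as decoupled
# decay (AdamW); plain Adam with an L2 penalty is an equally plausible reading.
optimizer = optax.adamw(learning_rate=LEARNING_RATE, weight_decay=WEIGHT_DECAY)


def reconstruction_loss(params, batch, apply_fn):
    """Mean squared reconstruction error on a batch of images.

    `apply_fn` stands in for a hypothetical encoder-decoder forward pass;
    the real architectures are described in the paper's Appendix B.1.
    """
    reconstruction = apply_fn(params, batch)
    return jnp.mean((reconstruction - batch) ** 2)


@partial(jax.jit, static_argnums=3)  # apply_fn is a static (hashable) argument
def train_step(params, opt_state, batch, apply_fn):
    # Differentiate the reconstruction loss with respect to the parameters.
    loss, grads = jax.value_and_grad(reconstruction_loss)(params, batch, apply_fn)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```

Usage would start from `opt_state = optimizer.init(params)`, then iterate `train_step` over mini-batches of the reported size for the reported number of epochs.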