Indirectly Parameterized Concrete Autoencoders

Authors: Alfred Nilsson, Klas Wijk, Sai Bharath Chandra Gutha, Erik Englesson, Alexandra Hotti, Carlo Saccardi, Oskar Kviman, Jens Lagergren, Ricardo Vinuesa Motilva, Hossein Azizpour

ICML 2024

Reproducibility assessment. Each variable below is listed with its result and the supporting evidence quoted from the paper.
Research Type: Experimental.
Evidence: "We rigorously verify their empirical effectiveness." (Abstract) "In this section, we evaluate our proposed method on several datasets. Table 4 in Appendix A provides an overview of the datasets used. Across all experiments, we perform an ablation of the standard CAE, only IP, only the GJSD term, and both IP and the GJSD term." (Section 4)
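That ablation toggles two components independently, giving a 2x2 grid of configurations. A minimal sketch of the grid in Python; the flag names use_ip and use_gjsd are illustrative assumptions, not identifiers from the authors' code:

```python
from itertools import product

# Hypothetical flags: the four combinations mirror the paper's ablation of
# the standard CAE, IP only, the GJSD term only, and IP + GJSD together.
ablation_grid = [
    {"use_ip": use_ip, "use_gjsd": use_gjsd}
    for use_ip, use_gjsd in product([False, True], repeat=2)
]
# The first entry, {'use_ip': False, 'use_gjsd': False}, is the standard CAE.
```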
Researcher Affiliation: Collaboration.
Evidence: "KTH Royal Institute of Technology, Stockholm, Sweden; Science for Life Laboratory, Solna, Sweden; Swedish e-Science Research Centre (SeRC), Stockholm, Sweden; Klarna, Stockholm, Sweden."
Pseudocode: No.
The paper describes its methods using text and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes.
Evidence: "The source code is available at https://github.com/Alfred-N/IP-CAE." (Appendix A.2)
Open Datasets: Yes.
Evidence: "MNIST and Fashion-MNIST (LeCun et al., 1998; Xiao et al., 2017) ... ISOLET (Fanty & Cole, 1990) ... COIL-20 (Nene et al., 1996) ... Smartphone Dataset for Human Activity Recognition (Anguita et al., 2013) ... Mice Protein Expression (Higuera et al., 2015)" (Section 4) "For MNIST and Fashion-MNIST, we use the versions provided in Torchvision. For the other datasets, ISOLET, Smartphone HAR, and Mice Protein, we use the versions provided at UCI (Fanty & Cole, 1990; Anguita et al., 2013; Higuera et al., 2015)." (Appendix A.4)
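Since the MNIST and Fashion-MNIST variants come from Torchvision, the loading step presumably reduces to the standard dataset constructors. A minimal sketch; the root directory "data" is an arbitrary choice, not taken from the paper:

```python
import torchvision
from torchvision import transforms

# Standard Torchvision datasets, per Appendix A.4; images as [0, 1] tensors.
transform = transforms.ToTensor()
mnist_train = torchvision.datasets.MNIST(
    root="data", train=True, download=True, transform=transform
)
fashion_train = torchvision.datasets.FashionMNIST(
    root="data", train=True, download=True, transform=transform
)
```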
Dataset Splits: Yes.
Evidence: "To further facilitate reproducibility, we include a script, src/fs_datasets.py, for downloading all datasets used in this paper, which includes functions that return the exact train/test/validation splits that were used." (Appendix A.2) "We train every model for 200 epochs, and select the weights corresponding to the best validation loss for test set evaluation." (Section 4)
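The quoted selection rule, keeping the weights from the epoch with the best validation loss, is a standard checkpointing pattern. A minimal sketch in PyTorch-style Python; model, optimizer, the data loaders, and the train_epoch/validate helpers are placeholders, not names from the authors' code:

```python
import copy

best_val_loss = float("inf")
best_state = None

for epoch in range(200):                          # 200 epochs, as in Section 4
    train_epoch(model, train_loader, optimizer)   # placeholder training step
    val_loss = validate(model, val_loader)        # placeholder validation pass
    if val_loss < best_val_loss:                  # track the best checkpoint
        best_val_loss = val_loss
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)                 # these weights go to the test set
```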
Hardware Specification: Yes.
Evidence: "We used an external cluster with T4 and A40 GPUs. Each model was trained on a single GPU." (Appendix A.5)
Software Dependencies: No.
The paper mentions Python and Torchvision but does not provide specific version numbers for these or other software dependencies used in the experiments.
Experiment Setup: Yes.
Evidence: "Following CAE, we use a fixed learning rate of 0.001 with the Adam optimizer with moving-average coefficients β = (0.9, 0.999) and no weight decay, for all experiments and datasets. We train every model for 200 epochs, and select the weights corresponding to the best validation loss for test set evaluation. In all experiments, unless otherwise specified, we use an MLP with one hidden layer of 200 nodes for the decoder network. For the hidden activation, we use Leaky ReLU with a slope of 0.2. For all experiments, we perform 10 repetitions and report the mean quantity." (Section 4)
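These hyperparameters pin down the decoder and optimizer almost completely. A minimal PyTorch sketch under the stated settings; the feature count k and input dimension d are placeholders, since they vary per dataset:

```python
import torch
import torch.nn as nn

k, d = 50, 784  # placeholders: k selected features, d input dimension

# Decoder MLP: one hidden layer of 200 nodes, LeakyReLU with slope 0.2.
decoder = nn.Sequential(
    nn.Linear(k, 200),
    nn.LeakyReLU(negative_slope=0.2),
    nn.Linear(200, d),
)

# Adam with fixed lr 0.001, betas (0.9, 0.999), and no weight decay.
optimizer = torch.optim.Adam(
    decoder.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0.0
)
```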