Trading Information between Latents in Hierarchical Variational Autoencoders

Authors: Tim Z. Xiao, Robert Bamler

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the features of our hierarchical information trading framework, we run large-scale grid searches over a two-dimensional rate space using two different implementations of HVAEs and three different data sets. We trained 441 different HVAEs for each data set/model combination, scanning the rate-hyperparameters (β2, β1) over a 21 × 21 grid ranging from 0.1 to 10 on a log scale in both directions (see Figure 1 on page 2, right panels).
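A minimal sketch of how such a 21 × 21 log-scale grid over (β2, β1) could be enumerated is shown below. Only the grid range (0.1 to 10), the 21 points per axis, and the resulting 441 combinations come from the quoted passage; the training call is a hypothetical placeholder, not the authors' actual script.

```python
# Sketch of the 21 x 21 log-scale grid over (beta2, beta1).
# Grid range (0.1 to 10) and size (21 points per axis) are from the paper;
# the training call below is an assumed placeholder, not the authors' code.
import numpy as np

betas = np.logspace(np.log10(0.1), np.log10(10.0), num=21)

configs = [(beta2, beta1) for beta2 in betas for beta1 in betas]
assert len(configs) == 441  # 441 HVAEs per data set/model combination

for beta2, beta1 in configs:
    # train_hvae is a hypothetical helper standing in for one training run, e.g.
    # train_hvae(dataset="SVHN", beta1=beta1, beta2=beta2)
    pass
```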
Researcher Affiliation | Academia | Tim Z. Xiao, University of Tübingen & IMPRS-IS, zhenzhong.xiao@uni-tuebingen.de; Robert Bamler, University of Tübingen, robert.bamler@uni-tuebingen.de
Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | All code necessary to reproduce the results in this paper is available at https://github.com/timxzz/HIT/
Open Datasets | Yes | We used the SVHN (Netzer et al., 2011) and CIFAR-10 (Krizhevsky, 2009) data sets (both 32 × 32 pixel color images), and MNIST (Le Cun et al., 1998) (28 × 28 binary pixel images).
Dataset Splits | No | The paper mentions using the SVHN, CIFAR-10, and MNIST datasets and refers to a 'training set' and a 'labeled test set', but it does not give specific details on the dataset splits (e.g., percentages, sample counts for training, validation, and testing, or a reference to standard splits). For example, it does not specify whether a validation set was used for hyperparameter tuning, or what its size was.
Hardware Specification | Yes | Each model took about 2 hours to train on an RTX-2080Ti GPU (≈ 27 hours in total for each data set/model combination using 32 GPUs in parallel).
Software Dependencies | No | The paper mentions using 'scikit-learn' for its classifiers and the 'ResNet-18' and 'DenseNet-121' architectures, but it does not provide version numbers for these libraries or for any other software dependencies.
Experiment Setup | Yes | We trained 441 different HVAEs for each data set/model combination, scanning the rate-hyperparameters (β2, β1) over a 21 × 21 grid ranging from 0.1 to 10 on a log scale in both directions. Table 2: Model architecture details for generalized top-down HVAEs (GHVAEs) used in Section 5. Conv and Conv Transp denote convolutional and transposed convolutional layers, listed as (input channels, output channels, kernel size, stride, padding); FC denotes a fully connected layer. [...] z1 dims: 512; z2 dims: 32; σx = 0.71; total params: 475,811.
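To make the role of the rate hyperparameters concrete, the following is a minimal sketch of a two-level, β-weighted rate/distortion objective of the kind scanned in the grid search above. The latent dimensionalities (z1: 512, z2: 32) and σx = 0.71 come from the quoted architecture details; the function names, the diagonal-Gaussian choices, and the exact decomposition are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-level rate/distortion objective with weights
# (beta1, beta2). Dimensions z1=512, z2=32 and sigma_x=0.71 come from the
# quoted table; module interfaces and Gaussian choices are assumptions.
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL divergence between two diagonal Gaussians, summed over the last dim."""
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0,
        dim=-1,
    )

def hvae_objective(x, x_recon, q1, p1, q2, beta1, beta2, sigma_x=0.71):
    """Distortion plus beta-weighted rates for a two-level HVAE.

    q2: (mu, logvar) of q(z2 | x); its prior p(z2) is a standard normal.
    q1: (mu, logvar) of q(z1 | x, z2); p1: (mu, logvar) of p(z1 | z2).
    """
    # Distortion: Gaussian negative log-likelihood with fixed sigma_x
    # (additive constants dropped).
    distortion = torch.sum((x - x_recon) ** 2, dim=(1, 2, 3)) / (2 * sigma_x ** 2)

    # Rate of the top latent z2 against a standard-normal prior.
    mu2, logvar2 = q2
    rate2 = gaussian_kl(mu2, logvar2, torch.zeros_like(mu2), torch.zeros_like(logvar2))

    # Rate of the lower latent z1 against its conditional prior p(z1 | z2).
    (mu1, logvar1), (mu1_p, logvar1_p) = q1, p1
    rate1 = gaussian_kl(mu1, logvar1, mu1_p, logvar1_p)

    return (distortion + beta1 * rate1 + beta2 * rate2).mean()
```

In this sketch, setting β1 = β2 = 1 recovers a standard negative ELBO up to additive constants, and varying the two weights over the 21 × 21 grid trades the two rates off against distortion and against each other.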