Vector Quantization-Based Regularization for Autoencoders

Authors: Hanwei Wu, Markus Flierl

AAAI 2020, pp. 6380-6387 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures. ... We test our proposed model on datasets MNIST, SVHN and CIFAR-10.
Researcher Affiliation | Academia | 1. KTH Royal Institute of Technology, Stockholm, Sweden; 2. Research Institutes of Sweden, Stockholm, Sweden
Pseudocode | No | The paper includes 'Figure 2: Description of the soft VQ-VAE', which is a diagram, but it does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of the paper is publicly available: https://github.com/AlbertOh90/Soft-VQ-VAE/
Open Datasets | Yes | We test our proposed model on datasets MNIST, SVHN and CIFAR-10.
Dataset Splits | No | The paper mentions 'Early stopping at 10000 iterations is applied by soft VQ-VAE on SVHN and CIFAR-10 datasets,' which implies the use of a validation set, but it does not provide specific split percentages or sample counts for validation data. Only 'training set' and 'test set' are explicitly mentioned for the main splits, without quantitative details for a three-way split.
Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'Glorot uniform initializer' but does not specify any software libraries or frameworks with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0').
Experiment Setup | Yes | For the models tested on the CIFAR-10 and SVHN datasets, the encoder consists of 4 convolutional layers with stride 2 and filter size 3×3. The number of channels is doubled for each encoder layer, and the number of channels of the first layer is set to 64. The decoder follows a symmetric structure to the encoder. For the MNIST dataset, we use multilayer perceptron networks (MLP) to construct the autoencoder. The dimensions of the dense layers of the encoder and decoder are D-500-500-2000-d and d-2000-500-500-D respectively, where d is the dimension of the learned latents and D is the dimension of the input datapoints. All the layers use rectified linear units (ReLU) as activation functions. We use the Glorot uniform initializer (Glorot and Bengio 2010) for the weights of the encoder-decoder networks. The codebook is initialized by uniform unit scaling. All models are trained using the Adam optimizer (Kingma and Ba 2015) with learning rate 3e-4, and performance is evaluated after 40000 iterations with batch size 32. Early stopping at 10000 iterations is applied by soft VQ-VAE on the SVHN and CIFAR-10 datasets.
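
To make the reported configuration concrete, below is a minimal PyTorch sketch of the convolutional encoder/decoder, the MNIST MLP, and the optimizer settings described in the Experiment Setup row. This is not the authors' implementation (see the repository linked above): the 32×32×3 input resolution for SVHN/CIFAR-10, the padding choices, the default latent dimension, and all function names are assumptions made for illustration, and the soft VQ bottleneck and codebook update rules are omitted.

```python
# Sketch only: architecture and optimizer inferred from the paper's setup
# description; the soft VQ bottleneck/codebook is intentionally left out.
import torch
import torch.nn as nn


def conv_encoder(in_channels=3, base_channels=64, num_layers=4):
    """4 conv layers, stride 2, 3x3 filters, channels doubled per layer (64, 128, 256, 512)."""
    layers, c_in = [], in_channels
    for i in range(num_layers):
        c_out = base_channels * (2 ** i)
        layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                   nn.ReLU()]
        c_in = c_out
    return nn.Sequential(*layers)


def conv_decoder(out_channels=3, base_channels=64, num_layers=4):
    """Decoder symmetric to the encoder, built from transposed convolutions."""
    layers = []
    c_in = base_channels * (2 ** (num_layers - 1))
    for i in reversed(range(num_layers)):
        c_out = base_channels * (2 ** (i - 1)) if i > 0 else out_channels
        layers += [nn.ConvTranspose2d(c_in, c_out, kernel_size=3, stride=2,
                                      padding=1, output_padding=1)]
        layers += [nn.ReLU()] if i > 0 else []
        c_in = c_out
    return nn.Sequential(*layers)


def mlp_autoencoder(D=784, d=10):
    """MNIST MLP: encoder D-500-500-2000-d, decoder d-2000-500-500-D (d assumed here)."""
    enc = nn.Sequential(nn.Linear(D, 500), nn.ReLU(),
                        nn.Linear(500, 500), nn.ReLU(),
                        nn.Linear(500, 2000), nn.ReLU(),
                        nn.Linear(2000, d))
    dec = nn.Sequential(nn.Linear(d, 2000), nn.ReLU(),
                        nn.Linear(2000, 500), nn.ReLU(),
                        nn.Linear(500, 500), nn.ReLU(),
                        nn.Linear(500, D))
    return enc, dec


def glorot_init(module):
    """Glorot (Xavier) uniform initialization for the encoder-decoder weights."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)


encoder, decoder = conv_encoder(), conv_decoder()
encoder.apply(glorot_init)
decoder.apply(glorot_init)

# Training configuration stated in the setup: Adam, learning rate 3e-4, batch size 32.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=3e-4)
```

The transposed-convolution decoder mirrors the encoder so that each stage undoes one factor-of-two downsampling; training would then run for the reported 40000 iterations (with early stopping at 10000 iterations for soft VQ-VAE on SVHN and CIFAR-10).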