Vector Quantization-Based Regularization for Autoencoders
Authors: Hanwei Wu, Markus Flierl
AAAI 2020, pp. 6380-6387
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our proposed regularization method results in improved latent representations for both supervised learning and clustering downstream tasks when compared to autoencoders using other bottleneck structures. ... We test our proposed model on datasets MNIST, SVHN and CIFAR-10. |
| Researcher Affiliation | Academia | KTH Royal Institute of Technology, Stockholm, Sweden; Research Institutes of Sweden, Stockholm, Sweden |
| Pseudocode | No | The paper includes 'Figure 2: Description of the soft VQ-VAE' which is a diagram, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of the paper is publicly available: https://github.com/AlbertOh90/Soft-VQ-VAE/ |
| Open Datasets | Yes | We test our proposed model on datasets MNIST, SVHN and CIFAR-10. |
| Dataset Splits | No | The paper mentions 'Early stopping at 10000 iterations is applied by soft VQ-VAE on SVHN and CIFAR-10 datasets,' which implies the use of a validation set, but it does not provide specific split percentages or sample counts for validation data. Only 'training set' and 'test set' are explicitly mentioned for the main splits, without quantitative details for a three-way split. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'Glorot uniform initializer' but does not specify any software libraries or frameworks with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0'). |
| Experiment Setup | Yes | For the models tested on the CIFAR-10 and SVHN datasets, the encoder consists of 4 convolutional layers with stride 2 and filter size 3×3. The number of channels is doubled for each encoder layer, and the number of channels of the first layer is set to 64. The decoder follows a structure symmetric to the encoder. For the MNIST dataset, multilayer perceptron (MLP) networks are used to construct the autoencoder. The dimensions of the dense layers of the encoder and decoder are D-500-500-2000-d and d-2000-500-500-D respectively, where d is the dimension of the learned latents and D is the dimension of the input datapoints. All layers use rectified linear units (ReLU) as activation functions. The Glorot uniform initializer (Glorot and Bengio 2010) is used for the weights of the encoder-decoder networks, and the codebook is initialized by uniform unit scaling. All models are trained using the Adam optimizer (Kingma and Ba 2015) with learning rate 3e-4, and performance is evaluated after 40000 iterations with batch size 32. Early stopping at 10000 iterations is applied for soft VQ-VAE on the SVHN and CIFAR-10 datasets. A minimal code sketch of this setup is given below the table. |
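The Experiment Setup row amounts to a complete architectural and optimizer recipe. The following is a minimal sketch of how those pieces could be wired together, assuming PyTorch; the framework choice, the latent dimension `d`, the padding/output-padding values, and the names `ConvEncoder`, `ConvDecoder`, `mlp_autoencoder`, and `init_glorot` are illustrative assumptions, not details taken from the paper or its released code.

```python
# Hypothetical sketch of the architectures quoted above (not the authors' implementation).
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """CIFAR-10/SVHN encoder: 4 conv layers, stride 2, 3x3 filters, channels doubling from 64."""
    def __init__(self, in_channels=3, base_channels=64):
        super().__init__()
        layers, c_in = [], in_channels
        for i in range(4):
            c_out = base_channels * (2 ** i)  # 64, 128, 256, 512
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1), nn.ReLU()]
            c_in = c_out
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class ConvDecoder(nn.Module):
    """Mirror of the encoder using transposed convolutions (padding choices are assumptions)."""
    def __init__(self, out_channels=3, base_channels=64):
        super().__init__()
        layers, c_in = [], base_channels * 8  # 512
        for i in range(3, 0, -1):
            c_out = base_channels * (2 ** (i - 1))  # 256, 128, 64
            layers += [nn.ConvTranspose2d(c_in, c_out, 3, stride=2, padding=1, output_padding=1),
                       nn.ReLU()]
            c_in = c_out
        layers += [nn.ConvTranspose2d(c_in, out_channels, 3, stride=2, padding=1, output_padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

def mlp_autoencoder(D=784, d=10):
    """MNIST MLP autoencoder: D-500-500-2000-d encoder, d-2000-500-500-D decoder (d is an assumption)."""
    encoder = nn.Sequential(nn.Linear(D, 500), nn.ReLU(),
                            nn.Linear(500, 500), nn.ReLU(),
                            nn.Linear(500, 2000), nn.ReLU(),
                            nn.Linear(2000, d))
    decoder = nn.Sequential(nn.Linear(d, 2000), nn.ReLU(),
                            nn.Linear(2000, 500), nn.ReLU(),
                            nn.Linear(500, 500), nn.ReLU(),
                            nn.Linear(500, D))
    return encoder, decoder

def init_glorot(m):
    """Glorot (Xavier) uniform initialization for encoder/decoder weights, as stated in the setup."""
    if isinstance(m, (nn.Linear, nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

encoder, decoder = ConvEncoder(), ConvDecoder()
for module in (encoder, decoder):
    module.apply(init_glorot)

# Adam with learning rate 3e-4, batch size 32, as quoted in the Experiment Setup row.
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=3e-4)
```

With stride-2 3×3 convolutions and padding 1, a 32×32 input is downsampled to 16, 8, 4, and finally 2 across the four encoder layers, and the transposed-convolution decoder reverses that path; the codebook initialization and the soft VQ-VAE bottleneck itself are not sketched here, since the report only quotes the encoder/decoder and optimizer details.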