Finite Scalar Quantization: VQ-VAE Made Simple
Authors: Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations. We start with a study, where we train MaskGIT models on lower resolution 128x128 ImageNet images and for shorter time compared to the paper Chang et al. (2022). |
| Researcher Affiliation | Industry | Fabian Mentzer1, David Minnen1, Eirikur Agustsson1, Michael Tschannen2, 1Google Research 2Google DeepMind |
| Pseudocode | No | The paper refers to 'code in App. A.1' for a specific function, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks within the document. |
| Open Source Code | Yes | Colab on GitHub. We refer to Section A.1 for reference code. |
| Open Datasets | Yes | We start with a study, where we train MaskGIT models on lower resolution 128x128 ImageNet images and for shorter time compared to the paper Chang et al. (2022) (100 epochs for Stage I, 200 epochs for Stage II. Please see Appendix A.4.1 for more hyperparameters). We train MaskGIT models on ImageNet 256 based on the public GitHub code, training Stage I for 1M steps with batch size 512, and Stage II for 2.5M steps with batch size 256. We retrain the public UViM GitHub code for all three tasks (panoptic segmentation, depth estimation, colorization). |
| Dataset Splits | Yes | Reconstruction FID, the FID obtained by the GAN-trained autoencoder when the 50k validation images are fed through the quantized autoencoder. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or number of machines) used to run the experiments. |
| Software Dependencies | No | The paper mentions the use of the 'ADM TensorFlow Suite' and 'JAX' in its references, and 'public GitHub code' for MaskGIT and UViM, but it does not specify concrete version numbers for any key software components or libraries (e.g., TensorFlow version, PyTorch version, Python version). |
| Experiment Setup | Yes | We start with a study, where we train MaskGIT models on lower resolution 128x128 ImageNet images and for shorter time compared to the paper Chang et al. (2022) (100 epochs for Stage I, 200 epochs for Stage II. Please see Appendix A.4.1 for more hyperparameters). |
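For context on why FSQ needs none of the VQ machinery listed above: each latent channel is simply bounded and rounded to a small fixed grid of levels, so the "codebook" is implicit and can never collapse. Below is a minimal sketch of that quantization step, assuming NumPy; the function names are mine, not from the paper's reference code, and the bounding follows the tanh-based scheme described in the paper.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Finite Scalar Quantization: bound each channel, then round.

    z       : array whose last dimension has one entry per channel
    levels  : per-channel level counts, e.g. [8, 5, 5, 5]
    (Names and exact bounding are a sketch, not the paper's reference code.)
    """
    half = (np.asarray(levels, dtype=np.float64) - 1) / 2
    bounded = half * np.tanh(z)   # each channel now lies in (-half_i, half_i)
    return np.round(bounded)      # nearest grid point = the discrete code

def fsq_codebook_size(levels):
    """Implicit codebook size is just the product of the level counts."""
    return int(np.prod(levels))
```

During training the rounding would be made differentiable with a straight-through estimator, e.g. `bounded + stop_gradient(round(bounded) - bounded)`; since the grid is fixed, no commitment loss, reseeding, or entropy penalty is needed. With `levels = [8, 5, 5, 5]` the implicit codebook has 8 * 5 * 5 * 5 = 1000 entries.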