IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis

Authors: Huaibo Huang, Zhihang Li, Ran He, Zhenan Sun, Tieniu Tan

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our method produces high-resolution photo-realistic images (e.g., CelebA images at 1024²), which are comparable to or better than the state-of-the-art GANs.
Researcher Affiliation | Academia | 1) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 2) Center for Research on Intelligent Perception and Computing, CASIA, Beijing, China; 3) National Laboratory of Pattern Recognition, CASIA, Beijing, China; 4) Center for Excellence in Brain Science and Intelligence Technology, CAS, Beijing, China
Pseudocode | Yes | As illustrated in Algorithm 1, the inference and generator models are trained iteratively by updating E using L_E to distinguish the real data X from the generated samples X_r and X_p, and then updating G using L_G to generate samples that are increasingly similar to the real data; these steps are repeated until convergence. (Algorithm 1 is provided on page 5.)
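The alternating E/G updates described above rest on the loss terms of the paper's Eq. (11) and Eq. (12): a KL regularizer on real latents, a hinged margin term on latents of reconstructed and sampled images, and a reconstruction term. The NumPy sketch below is an illustrative reconstruction of those loss computations, not the authors' code; the function names are ours, and the defaults use the 256×256 setting (m = 120, α = 0.25, β = 0.05) quoted later in this report:

```python
import numpy as np

def kl_reg(mu, logvar):
    """KL(q(z|x) || N(0, I)) per sample, summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - logvar - 1.0, axis=1)

def encoder_loss(kl_real, kl_rec, kl_samp, recon, m=120.0, alpha=0.25, beta=0.05):
    # Sketch of Eq. (11): E keeps real KL low but pushes the KL of
    # reconstructed (X_r) and sampled (X_p) images above the margin m.
    hinge = np.maximum(0.0, m - kl_rec) + np.maximum(0.0, m - kl_samp)
    return np.mean(kl_real + alpha * hinge + beta * recon)

def generator_loss(kl_rec, kl_samp, recon, alpha=0.25, beta=0.05):
    # Sketch of Eq. (12): G instead tries to drive those same KL terms down,
    # so that its samples are encoded like real data.
    return np.mean(alpha * (kl_rec + kl_samp) + beta * recon)
```

In training, E and G are updated in turn with these two objectives until the margin-based competition converges, mirroring the GAN-style loop of Algorithm 1.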
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We consider three datasets, namely CelebA [26], CelebA-HQ [18] and LSUN BEDROOM [40].
Dataset Splits | Yes | The CelebA dataset consists of 202,599 celebrity images with large variations in facial attributes. Following the standard protocol of CelebA, we use 162,770 images for training, 19,867 for validation and 19,962 for testing. The CelebA-HQ dataset is a high-quality version of CelebA that consists of 30,000 images at 1024×1024 resolution. The dataset is split into two sets: the first 29,000 images as the training set and the remaining 1,000 images as the testing set.
Hardware Specification | No | The paper mentions that 'the hardware limits the minibatch size for high-resolutions' but does not provide specific hardware details such as GPU/CPU models or processor types.
Software Dependencies | No | The paper mentions using the 'Adam algorithm' but does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | For the images at 1024×1024, the dimension of the latent code is set to 512 and the hyperparameters in Eq. (11) and Eq. (12) are set empirically to hold the training balance of the inference and generator models: m = 90, α = 0.25 and β = 0.0025. For the images at 256×256, the latent dimension is 512, m = 120, α = 0.25 and β = 0.05. For the images at 128×128, the latent dimension is 256, m = 110, α = 0.25 and β = 0.5. ... the inference and generator models are trained iteratively using the Adam algorithm [19] (β1 = 0.9, β2 = 0.999) with a batch size of 8 and a fixed learning rate of 0.0002.
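For reference, the per-resolution settings quoted above can be collected in one place. This is a plain Python summary of the reported values; the dictionary and key names are ours, not from the paper:

```python
# Hyperparameters reported for each output resolution (Eq. 11 / Eq. 12 terms).
CONFIGS = {
    1024: dict(latent_dim=512, m=90.0,  alpha=0.25, beta=0.0025),
    256:  dict(latent_dim=512, m=120.0, alpha=0.25, beta=0.05),
    128:  dict(latent_dim=256, m=110.0, alpha=0.25, beta=0.5),
}

# Optimizer settings shared across resolutions (Adam, fixed learning rate).
ADAM = dict(lr=0.0002, beta1=0.9, beta2=0.999)
BATCH_SIZE = 8
```

Note the trend: β (the reconstruction weight) shrinks as the resolution grows, which the paper attributes to keeping the inference/generator balance stable at higher resolutions.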