Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CFASL: Composite Factor-Aligned Symmetry Learning for Disentanglement in Variational AutoEncoder
Authors: Hee-Jun Jung, Jaehyoung Jeong, Kangil Kim
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In quantitative and in-depth qualitative analysis, CFASL demonstrates a significant improvement of disentanglement in single-factor change, and multi-factor change conditions compared to state-of-the-art methods. |
| Researcher Affiliation | Academia | Hee-Jun Jung EMAIL AI Graduate School Gwangju Institute of Science and Technology (GIST); Jaehyoung Jeong EMAIL AI Graduate School Gwangju Institute of Science and Technology (GIST); Kangil Kim EMAIL AI Graduate School Gwangju Institute of Science and Technology (GIST) |
| Pseudocode | No | The paper describes methods and algorithms textually but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | More details are in README.md file. |
| Open Datasets | Yes | 1) The dSprites dataset Matthey et al. (2017) consists of 737,280 binary 64 × 64 images with five independent ground truth factors (number of values), i.e. x-position (32), y-position (32), orientation (40), shape (3), and scale (6). 2) The 3D Cars dataset Reed et al. (2015) consists of 17,568 RGB 64 × 64 × 3 images with three independent ground truth factors: elevations (4), azimuth directions (24), and car models (183). 3) The smallNORB dataset LeCun et al. (2004) consists of a total of 24,300 grayscale 96 × 96 images with four factors, which are category (10), elevation (9), azimuth (18), and light (6), and we resize the input to 64 × 64. 4) The 3D Shapes dataset Burgess & Kim (2018) consists of 480,000 RGB 64 × 64 × 3 images with six independent ground truth factors: orientation (15), shape (4), floor color (10), scale (8), object color (10), and wall color (10). 5) The CelebA dataset Liu et al. (2015) consists of 202,599 images, and we crop the center 128 × 128 area and then resize to 64 × 64 images. |
| Dataset Splits | No | The paper mentions training for a certain number of iterations and evaluating metrics using specific sample counts for evaluation (e.g., "100 samples to evaluate global empirical variance... and run it a total of 800 times to estimate the FVM score"), but it does not provide explicit training, validation, or test dataset splits (e.g., percentages, specific sample counts for each split, or references to predefined splits) for reproducibility. |
| Hardware Specification | Yes | We set the below settings for all experiments in a single Galaxy 2080Ti GPU for 3D Cars and smallNORB, and a single Galaxy 3090 for dSprites, 3D Shapes, and CelebA. |
| Software Dependencies | No | The paper mentions various VAE frameworks and metrics (e.g., β-VAE, β-TCVAE, FVM) and refers to related works (e.g., Michlo (2021) for default metric values), but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For common settings to baselines, we set batch size 64, learning rate 1e-4, and random seed from {1, 2, ..., 10} without weight decay. We train for 3 × 10^5 iterations on dSprites, smallNORB, and 3D Cars, 6 × 10^5 iterations on 3D Shapes, and 10^6 iterations on CelebA. We set hyper-parameter β ∈ {1.0, 2.0, 4.0, 6.0} for β-VAE and β-TCVAE, and fix the α, γ for β-TCVAE as 1 Chen et al. (2018). We follow the Control VAE settings Shao et al. (2020), the desired value C ∈ {10.0, 12.0, 14.0, 16.0}, and fix K_p = 0.01, K_i = 0.001. For CLG-VAE, we also follow the settings Zhu et al. (2021) as λ_hessian = 40.0, λ_decomp = 20.0, p = 0.2, and balancing parameter of loss_rec group ∈ {0.1, 0.2, 0.5, 0.7}. We set the same hyper-parameters of baselines with ϵ ∈ {0.1, 0.01}, threshold ∈ {0.2, 0.5}, \|S\| = \|SS\| = \|D\|, where \|D\| is a latent vector dimension. |
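
The quoted dataset and setup details can be written down as a small configuration sketch. The Python below is illustrative only and is not taken from the authors' repository: the dictionary keys, the helper names (`factor_grid_size`, `expand`, `count_runs`), and the assumption that every baseline configuration is run with all ten seeds are hypothetical. It checks that the quoted factor sizes multiply to the reported image counts for the three fully factorized datasets, and it enumerates the baseline hyper-parameter grids.

```python
from itertools import product
from math import prod

# Ground-truth factor sizes as quoted in the "Open Datasets" row.
FACTORS = {
    "dSprites": {"x-position": 32, "y-position": 32, "orientation": 40,
                 "shape": 3, "scale": 6},
    "3D Cars": {"elevation": 4, "azimuth": 24, "car model": 183},
    "3D Shapes": {"orientation": 15, "shape": 4, "floor color": 10,
                  "scale": 8, "object color": 10, "wall color": 10},
}
REPORTED_SIZES = {"dSprites": 737_280, "3D Cars": 17_568, "3D Shapes": 480_000}

# Common training settings quoted in the "Experiment Setup" row.
COMMON = {
    "batch_size": 64,
    "learning_rate": 1e-4,
    "weight_decay": 0.0,          # "without weight decay"
    "seeds": list(range(1, 11)),  # {1, 2, ..., 10}
}
ITERATIONS = {
    "dSprites": 300_000, "smallNORB": 300_000, "3D Cars": 300_000,
    "3D Shapes": 600_000, "CelebA": 1_000_000,
}

# Baseline grids; key names are hypothetical labels for the quoted symbols.
BASELINE_GRID = {
    "beta-VAE": {"beta": [1.0, 2.0, 4.0, 6.0]},
    "beta-TCVAE": {"beta": [1.0, 2.0, 4.0, 6.0], "alpha": [1.0], "gamma": [1.0]},
    "Control-VAE": {"C": [10.0, 12.0, 14.0, 16.0], "K_p": [0.01], "K_i": [0.001]},
    "CLG-VAE": {"lambda_hessian": [40.0], "lambda_decomp": [20.0], "p": [0.2],
                "rec_group_weight": [0.1, 0.2, 0.5, 0.7]},
}


def factor_grid_size(name):
    """Number of images implied by a full Cartesian product of the factors."""
    return prod(FACTORS[name].values())


def expand(grid):
    """Turn a dict of value lists into a list of concrete configurations."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def count_runs(method):
    """Runs per dataset, ASSUMING every configuration is trained with every seed."""
    return len(expand(BASELINE_GRID[method])) * len(COMMON["seeds"])


if __name__ == "__main__":
    for name, reported in REPORTED_SIZES.items():
        print(name, factor_grid_size(name), reported)
    for method in BASELINE_GRID:
        print(method, count_runs(method), "runs per dataset (assumed)")
```

smallNORB and CelebA are left out of the factor-product check because the quoted text does not describe them as full Cartesian grids of the listed factors.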