Disentanglement Learning via Topology

Authors: Nikita Balabin, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, Serguei Barannikov

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments have shown that the proposed TopDis loss improves disentanglement scores such as MIG, FactorVAE score, SAP score, and DCI disentanglement score with respect to state-of-the-art results while preserving the reconstruction quality. Our method works in an unsupervised manner, permitting us to apply it to problems without labeled factors of variation. The TopDis loss works even when factors of variation are correlated. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.
Researcher Affiliation | Collaboration | 1 Skolkovo Institute of Science and Technology, Moscow, Russia; 2 AIRI, Moscow, Russia; 3 CNRS, IMJ, Paris Cité University
Pseudocode | Yes | Algorithm 1 (Latent traversal with a shift in the latent space) and Algorithm 2 (The TopDis loss). A minimal sketch of the latent-shift idea appears after the table.
Open Source Code | Yes | We release our code: https://github.com/nikitabalabin/TopDis
Open Datasets | Yes | We used popular benchmarks: dSprites (Matthey et al., 2017), 3D Shapes (Burgess & Kim, 2018), 3D Faces (Paysan et al., 2009), MPI3D (Gondal et al., 2019), CelebA (Liu et al., 2015).
Dataset Splits | No | The paper mentions training details such as 'trained 1M iterations with batch size of 64' but does not provide specific training/validation/test splits (e.g., percentages or sample counts) for its experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions the Adam (Kingma & Ba, 2015) optimizer but does not specify version numbers for other key software components or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We normalized the data to the [0, 1] interval and trained for 1M iterations with a batch size of 64 and the Adam (Kingma & Ba, 2015) optimizer. The learning rate for VAE updates was 10^-4 for the dSprites and MPI3D datasets, 10^-3 for the 3D Shapes dataset, and 2×10^-4 for the 3D Faces and CelebA datasets, with β1 = 0.9, β2 = 0.999; the learning rate for discriminator updates was 10^-4 for the dSprites, 3D Faces, MPI3D, and CelebA datasets and 10^-3 for the 3D Shapes dataset, with β1 = 0.5, β2 = 0.9. These settings are sketched in code after the table.
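
The paper's pseudocode (Algorithm 1 and Algorithm 2) is not reproduced here, but the following minimal PyTorch sketch shows how a latent traversal with a shift could be wired into an auxiliary loss term. The `encoder`, `decoder`, and `dissimilarity` callables are placeholders introduced for illustration; the paper's actual TopDis loss compares the decoded point clouds with a topology-based dissimilarity, which is not implemented below.

```python
# Minimal sketch: a latent traversal with a shift used as an auxiliary loss.
# `encoder`, `decoder`, and `dissimilarity` are hypothetical placeholders;
# the paper's TopDis loss compares the two decoded point clouds with a
# topology-based term that is NOT reproduced here.
import torch


def latent_shift_loss(encoder, decoder, dissimilarity, x, dim, shift):
    """Decode a batch twice: once from the inferred latents and once from the
    same latents shifted along a single coordinate, then compare the results."""
    mu, logvar = encoder(x)                                   # posterior parameters
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterisation
    z_shifted = z.clone()
    z_shifted[:, dim] = z_shifted[:, dim] + shift             # traverse one latent axis
    decoded = decoder(z)
    decoded_shifted = decoder(z_shifted)
    # Placeholder for the topological dissimilarity between the two point clouds.
    return dissimilarity(decoded.flatten(1), decoded_shifted.flatten(1))
```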
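
As a concrete reading of the quoted experiment setup, the snippet below builds the two Adam optimizers with the per-dataset learning rates and betas listed above, assuming a PyTorch implementation. The `vae` and `discriminator` modules and the dataset keys are hypothetical; only the numeric hyperparameters come from the quoted text.

```python
# Hedged sketch of the quoted optimizer settings (PyTorch assumed).
# `vae` and `discriminator` are hypothetical nn.Modules; the learning rates,
# betas, batch size, and iteration count are the ones quoted in the table.
import torch

BATCH_SIZE = 64               # quoted batch size
NUM_ITERATIONS = 1_000_000    # "trained 1M iterations"

VAE_LR = {"dsprites": 1e-4, "mpi3d": 1e-4,     # 10^-4
          "3dshapes": 1e-3,                    # 10^-3
          "3dfaces": 2e-4, "celeba": 2e-4}     # 2x10^-4
DISC_LR = {"dsprites": 1e-4, "3dfaces": 1e-4,
           "mpi3d": 1e-4, "celeba": 1e-4,
           "3dshapes": 1e-3}


def build_optimizers(vae, discriminator, dataset):
    opt_vae = torch.optim.Adam(vae.parameters(), lr=VAE_LR[dataset],
                               betas=(0.9, 0.999))
    opt_disc = torch.optim.Adam(discriminator.parameters(), lr=DISC_LR[dataset],
                                betas=(0.5, 0.9))
    return opt_vae, opt_disc
```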