Transferring disentangled representations: bridging the gap between synthetic and real images
Authors: Jacopo Dapueto, Nicoletta Noceti, Francesca Odone
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an extensive empirical study to address these issues. In addition, we propose a new interpretable intervention-based metric to measure the quality of factor encoding in the representation. Our results indicate that transferring a representation from synthetic to real data while preserving some level of disentanglement is possible and effective. The paper presents three main contributions: (1) a novel metric to assess the quality of disentanglement, which is interpretable, classifier-free and informative on the structure of the latent representation; (2) a methodology to transfer a disentangled representation (DR) to Target datasets without factor-of-variation (FoV) annotation; (3) an extensive experimental analysis that considers different (Source, Target) pairs and quantitatively assesses the expressiveness of the learnt DR on Target datasets of different nature (including the case where the gap between Source and Target is large), taking into account the main expected properties of disentangled representations. |
| Researcher Affiliation | Academia | Jacopo Dapueto, Nicoletta Noceti, Francesca Odone (jacopo.dapueto@edu.unige.it, {nicoletta.noceti,francesca.odone}@unige.it), MaLGa-DIBRIS, Università degli Studi di Genova, Genova, Italy |
| Pseudocode | Yes | Algorithm 1: Compute association matrix S between dimensions and FoVs; Algorithm 2: Overlap score of FoV j (OS); Algorithm 3: Encoding score of FoV j (MES) |
| Open Source Code | Yes | We provide the code as supplementary material, while the datasets we used are all publicly available. |
| Open Datasets | Yes | dSprites [44] is a dataset of 2D shapes generated from 5 ground-truth FoVs: Shape, Scale, Rotation, and x and y Position. Variants of the dataset have been proposed: in Noisy-dSprites the background is filled with uniform noise; Color-dSprites adds Color as an additional FoV; Noisy-Color-dSprites adds uniform background noise to the latter. We refer to them as N-dSprites, C-dSprites and N-C-dSprites. Shapes3D [4] is a dataset of 3D shapes generated from 6 ground-truth FoVs: Floor colour, Wall colour, Object colour, Scale, Shape and Orientation. It is characterized by the presence of occlusions. Isaac3D [47] is a synthetic dataset of a 3D kitchen scene in which a robot arm holds objects in a variety of configurations. It is characterized by 9 complex real-world FoVs, including robot movements, camera height and environmental conditions (e.g., lighting). Coil is derived from Coil100 [46]. RGBD-Objects [33] is a dataset of 300 common household objects acquired with an RGB-D camera. |
| Dataset Splits | No | The GBT and MLP classifiers are each trained on the latent representation r extracted from the encoder of the Ada-GVAE models, with a train split of 10000 samples and a test split of 5000 samples. |
| Hardware Specification | Yes | All the experiments have been executed with an NVIDIA Quadro RTX 6000. |
| Software Dependencies | No | The paper mentions the Adam optimizer but does not list the key software libraries or their versions (e.g., Python, PyTorch, or CUDA) required for reproduction. |
| Experiment Setup | Yes | We adopted the Adam optimizer [29] with default parameters, a batch size of 64 and 400k training steps. We used a linear deterministic warm-up [11, 61, 3] over the first 50k training steps. We kept the latent dimension fixed at 10 for all experiments. |
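The Pseudocode row names Algorithm 1, which computes an association matrix S between latent dimensions and FoVs, but the table does not reproduce the algorithm itself. The sketch below is a generic, classifier-free, intervention-based recipe. It is an assumption for illustration only and is not the paper's Algorithm 1. For each FoV j, it encodes pairs of samples that differ only in FoV j, averages the absolute change of every latent dimension, and normalises each column to sum to 1. The function `association_matrix` and the toy encoder and sampler are hypothetical.

```python
import numpy as np


def association_matrix(encode, sample_pair, n_factors, n_pairs=1000, seed=0):
    """Hypothetical intervention-based association matrix S (latent dims x FoVs).

    S[i, j] is the share of the average latent change, under interventions on
    FoV j only, that falls on latent dimension i. Generic recipe, NOT the
    paper's Algorithm 1.
    """
    rng = np.random.default_rng(seed)
    columns = []
    for j in range(n_factors):
        diffs = []
        for _ in range(n_pairs):
            # Two samples that differ only in the value of FoV j.
            x_a, x_b = sample_pair(j, rng)
            diffs.append(np.abs(encode(x_a) - encode(x_b)))
        mean_change = np.mean(diffs, axis=0)
        # Normalise the column to sum to 1 (guard against an all-zero column).
        columns.append(mean_change / max(mean_change.sum(), 1e-12))
    return np.stack(columns, axis=1)


# Toy check: "images" are the factor vectors themselves and the encoder permutes
# them, so latent 0 <- FoV 1, latent 1 <- FoV 2, latent 2 <- FoV 0.
perm = np.array([1, 2, 0])


def toy_encode(x):
    return x[perm]


def toy_sample_pair(j, rng):
    x_a = rng.uniform(size=3)
    x_b = x_a.copy()
    x_b[j] = rng.uniform()  # intervene on FoV j only
    return x_a, x_b


S = association_matrix(toy_encode, toy_sample_pair, n_factors=3, n_pairs=200)
print(np.argmax(S, axis=0))  # latent dimension that best encodes each FoV
```

In this toy case each column of S is one-hot, so the argmax recovers the permutation. With a real encoder, the spread of each column is what a score on top of S would summarise.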
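The Open Datasets row describes dSprites and Shapes3D as generated from a fixed set of FoVs. Both are full Cartesian products of their FoV values, so each image corresponds to exactly one combination of factor values. The following minimal sketch is not code from the paper. It maps a flat image index to per-FoV value indices and back, assuming row-major storage in which the last FoV varies fastest. The cardinalities are those of the public releases, and dSprites' constant, single-valued Color factor is omitted. The helper names are hypothetical.

```python
from math import prod

# FoV cardinalities of the public releases (dSprites' constant Color factor omitted).
DSPRITES_FOVS = {"shape": 3, "scale": 6, "orientation": 40, "pos_x": 32, "pos_y": 32}
SHAPES3D_FOVS = {
    "floor_hue": 10, "wall_hue": 10, "object_hue": 10,
    "scale": 8, "shape": 4, "orientation": 15,
}


def index_to_factors(index, fovs):
    """Decode a flat image index into per-FoV value indices (row-major assumed)."""
    sizes = list(fovs.values())
    if not 0 <= index < prod(sizes):
        raise IndexError(f"index {index} out of range for {prod(sizes)} images")
    values = []
    for size in reversed(sizes):  # last FoV varies fastest
        index, value = divmod(index, size)
        values.append(value)
    return dict(zip(fovs, reversed(values)))


def factors_to_index(factors, fovs):
    """Inverse mapping: per-FoV value indices back to the flat image index."""
    index = 0
    for name, size in fovs.items():
        index = index * size + factors[name]
    return index


print(prod(DSPRITES_FOVS.values()), prod(SHAPES3D_FOVS.values()))  # 737280 480000
print(index_to_factors(32, DSPRITES_FOVS))
# {'shape': 0, 'scale': 0, 'orientation': 0, 'pos_x': 1, 'pos_y': 0}
```

This mapping is what makes controlled interventions easy on these synthetic datasets. Changing one entry of the factor dictionary and re-encoding the index yields the image that differs from the original only in that FoV.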
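The Dataset Splits row states that the GBT and MLP classifiers are trained on 10000 latent codes and tested on 5000. The sketch below illustrates how such disjoint splits can be drawn. It is not the paper's code. `r` stands for the latent codes produced by the Ada-GVAE encoder and `y` for discrete FoV labels; random placeholders are used here. `make_probe_splits` is a hypothetical helper. The resulting arrays would then be used to fit and score the classifiers, for instance scikit-learn's `GradientBoostingClassifier` and `MLPClassifier`. That library choice is an assumption, since the row does not name one.

```python
import numpy as np


def make_probe_splits(r, y, n_train=10_000, n_test=5_000, seed=0):
    """Draw disjoint train/test splits of latent codes r and FoV labels y.

    r: array of shape (n_samples, latent_dim), e.g. latent_dim = 10.
    y: array of shape (n_samples, n_fovs) with discrete FoV values.
    """
    if len(r) != len(y):
        raise ValueError("r and y must have the same number of samples")
    if n_train + n_test > len(r):
        raise ValueError("not enough samples for the requested split sizes")
    rng = np.random.default_rng(seed)
    # Slicing a single permutation guarantees train and test never overlap.
    perm = rng.permutation(len(r))
    train_idx, test_idx = perm[:n_train], perm[n_train:n_train + n_test]
    return (r[train_idx], y[train_idx]), (r[test_idx], y[test_idx])


# Toy usage with random placeholders standing in for encoder outputs and labels.
rng = np.random.default_rng(1)
r = rng.normal(size=(20_000, 10))
y = rng.integers(0, 3, size=(20_000, 5))
(r_train, y_train), (r_test, y_test) = make_probe_splits(r, y)
print(r_train.shape, r_test.shape)  # (10000, 10) (5000, 10)
```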
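The Experiment Setup row can be summarised as a small configuration plus a warm-up schedule. This sketch rests on two assumptions the row does not state. First, the linear deterministic warm-up is assumed to scale the weight of the KL (regularisation) term from 0 to its final value, as is usual in the cited warm-up literature. Second, "default parameters" for Adam is taken to mean PyTorch's defaults (learning rate 1e-3, betas 0.9 and 0.999, epsilon 1e-8), although the framework is not specified. `TrainConfig` and `kl_weight` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values stated in the Experiment Setup row.
    batch_size: int = 64
    total_steps: int = 400_000
    warmup_steps: int = 50_000
    latent_dim: int = 10
    # Assumed Adam defaults (PyTorch's; the framework is not specified).
    learning_rate: float = 1e-3
    betas: tuple = (0.9, 0.999)
    eps: float = 1e-8


def kl_weight(step: int, cfg: TrainConfig, final_weight: float = 1.0) -> float:
    """Linear deterministic warm-up of the (assumed) KL weight.

    Rises linearly from 0 at step 0 to final_weight at cfg.warmup_steps,
    then stays constant for the rest of training.
    """
    if step <= 0:
        return 0.0
    return final_weight * min(1.0, step / cfg.warmup_steps)


cfg = TrainConfig()
print([kl_weight(s, cfg) for s in (0, 25_000, 50_000, 400_000)])  # [0.0, 0.5, 1.0, 1.0]
```

Under this schedule the weight is ramped over the first 50k of the 400k steps, so the last 350k steps train with the full regularisation weight.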