Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models
Authors: Baao Xie, Qiuyu Chen, Yunnan Wang, Zequn Zhang, Xin Jin, Wenjun Zeng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate our method's superior performance in disentanglement and reconstruction. Furthermore, the model inherits enhanced interpretability and generalizability from MLLMs. |
| Researcher Affiliation | Academia | Baao Xie¹, Qiuyu Chen¹,², Yunnan Wang¹,², Zequn Zhang¹,³, Xin Jin¹,*, Wenjun Zeng¹. ¹Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China; ²Shanghai Jiao Tong University, Shanghai, China; ³University of Science and Technology of China, Hefei, China |
| Pseudocode | Yes | Figure 5: Overall training algorithm of GEM. |
| Open Source Code | No | Code is available here. (in footnote) and The code will be released in the camera-ready version, with detailed instructions for users to reproduce. (in checklist) |
| Open Datasets | Yes | We evaluate GEM on two datasets: 1) CelebA [75] contains over 200,000 high-quality facial images. ... 2) LSUN [76] consists of about one million images across various object categories such as cars, buildings, animals, etc. |
| Dataset Splits | Yes | Commonly, we employ the entirety of the CelebA dataset, which includes 162,770 images for training, 19,867 for validation, and 19,962 for testing. (See the data-loading sketch after the table.) |
| Hardware Specification | Yes | All the experiments are processed using the Adam optimizer with a learning rate of 1e-4, and conducted on the Nvidia Tesla A100 GPUs, with a batch size of 32. |
| Software Dependencies | No | The paper mentions 'PyTorch [77]' and 'GPT-4o [62]' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For every experiment, the latent dimension size is set to 6. Concentrating on the disentanglement capacity of the framework, all experimental images are resized to a resolution of 64 × 64 to minimize computational resources. ... All the experiments are processed using the Adam optimizer with a learning rate of 1e-4, and conducted on the Nvidia Tesla A100 GPUs, with a batch size of 32. ... λadv, λdis and λgem serve as hyperparameters to balance the disentanglement capability and reconstruction quality, with default values set to 0.8, 0.6 and 0.6, respectively. (A hedged sketch of this configuration follows the table.) |
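
The CelebA split sizes quoted in the Dataset Splits row (162,770 / 19,867 / 19,962) match torchvision's built-in CelebA splits, so the reported data pipeline can be approximated directly. A minimal sketch, assuming torchvision is used (the paper only cites PyTorch); the 64 × 64 resize follows the stated experiment setup, while the crop strategy is an assumption:

```python
# Minimal sketch of the reported CelebA pipeline. torchvision's built-in
# CelebA dataset uses the official splits, which match the sizes quoted
# in the table above. The crop is an assumption; the paper only states
# that images are resized to 64 x 64.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(64),       # shorter side -> 64, per the reported resolution
    transforms.CenterCrop(64),   # assumed crop to obtain square 64 x 64 inputs
    transforms.ToTensor(),
])

splits = {name: datasets.CelebA(root="data", split=name, transform=transform,
                                download=True)  # download can hit Google Drive quotas
          for name in ("train", "valid", "test")}

train_loader = DataLoader(splits["train"], batch_size=32,  # batch size 32 (paper)
                          shuffle=True, num_workers=4)
```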
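
The Experiment Setup row pins down the training loop's scaffolding (Adam at lr 1e-4, batch size 32, latent dimension 6, loss weights λadv = 0.8, λdis = 0.6, λgem = 0.6), even though the GEM architecture and loss terms are not reproducible from this report alone. A minimal sketch under those assumptions; the stand-in autoencoder and the zero-valued loss stubs are hypothetical placeholders, and only the hyperparameter values come from the paper:

```python
import torch
from torch import nn

# Hyperparameter values quoted in the paper.
LATENT_DIM = 6
LR = 1e-4
LAMBDA_ADV, LAMBDA_DIS, LAMBDA_GEM = 0.8, 0.6, 0.6

# Minimal stand-in autoencoder; the paper's actual GEM model is graph-based
# and MLLM-guided, which this sketch does not attempt to reproduce.
class StandInAE(nn.Module):
    def __init__(self, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.encode = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))
        self.decode = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64),
                                    nn.Unflatten(1, (3, 64, 64)))

    def forward(self, x):
        z = self.encode(x)
        return self.decode(z), z

model = StandInAE()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)  # Adam, lr 1e-4 (paper)

# Zero-valued stubs for the paper's adversarial, disentanglement, and GEM
# losses, so the sketch runs end to end; only the weighting is from the paper.
adv_loss = dis_loss = gem_loss = lambda *t: torch.zeros(())

for images, _ in train_loader:  # train_loader from the data-loading sketch above
    recon, z = model(images)
    loss = (nn.functional.mse_loss(recon, images)
            + LAMBDA_ADV * adv_loss(recon)
            + LAMBDA_DIS * dis_loss(z)
            + LAMBDA_GEM * gem_loss(z))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Per the paper, the λ weights trade off disentanglement capability against reconstruction quality, which is why the three auxiliary terms are weighted separately from the reconstruction loss.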