Unsupervised Image Representation Learning with Deep Latent Particles
Authors: Tal Daniel, Aviv Tamar
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our DLP representations are useful for downstream tasks such as unsupervised keypoint (KP) detection, image manipulation, and video prediction for scenes composed of multiple dynamic objects. In addition, we show that our probabilistic interpretation of the problem naturally provides uncertainty estimates for particle locations, which can be used for model selection, among other tasks. Videos and code are available: https://taldatech.github.io/deep-latent-particles-web/. ... 5. Experiments: Our method produces latent particles to represent an input image. We design our set of experiments to answer the following questions: |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Technion Israel Institute of Technology, Haifa, Israel. Correspondence to: Tal Daniel <taldanielm@campus.technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1 Stitching Algorithm |
| Open Source Code | Yes | Videos and code are available: https://taldatech.github.io/deep-latent-particles-web/. ... Our code is available publicly5. 5https://github.com/taldatech/deep-latent-particles-pytorch |
| Open Datasets | Yes | The common benchmark for this task uses the CelebA train set while excluding the MAFL (Zhang et al., 2014b) subset, which includes annotations for 5 facial landmarks (eyes, nose, and mouth corners). ... The CLEVRER dataset is composed of 5-second (128 frames) videos of rigid objects colliding, where each frame can contain up to 8 objects of various shapes and colors. ... CLEVRER (Yi et al., 2019): this dataset contains 20,000 synthetic videos of moving and colliding objects, separated into 10,000 train videos, 5,000 validation videos, and 5,000 test videos... |
| Dataset Splits | Yes | CLEVRER (Yi et al., 2019): this dataset contains 20,000 synthetic videos of moving and colliding objects, separated into 10,000 train videos, 5,000 validation videos, and 5,000 test videos... |
| Hardware Specification | No | The paper mentions running experiments using GPUs: 'a batch size of 32 per GPU (we used 1 to 4 GPUs).' However, it does not specify any particular GPU models, CPU models, memory, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions that the model 'is implemented in PyTorch (Paszke et al., 2017)' and that 'The GCN layers are implemented efficiently with the torch_geometric library (Fey & Lenssen, 2019)'. While it names software, it does not provide specific version numbers for PyTorch or torch_geometric, which are required for reproducibility. |
| Experiment Setup | Yes | The training procedure is based on maximizing the ELBO (1). Since all of our distributions are Gaussians, the KL divergence has a closed-form solution. Similar to the β-VAE (Higgins et al., 2017), we multiply the KL and Chamfer-KL terms in the loss by hyperparameters βKL and βCKL, respectively. ... The model is optimized end-to-end by Adam (Kingma & Ba, 2014), using the reparametrization trick... Appendix D. Detailed hyperparameters used for the various experiments in the paper. ... all models were trained with an initial learning rate of 2e-4 and a batch size of 32 per GPU (we used 1 to 4 GPUs). For all datasets, we used a multi-step learning rate scheduler with the following milestones (in epochs): [30, 60], with the learning rate decreasing by 0.5 at each milestone. The warm-up stage described in Appendix B was only used for the Object model, where we used 1 warm-up epoch for CLEVRER and 2 for Traffic, and the number of noisy-mask epochs was 5 times the warm-up epochs: 5 and 10 for CLEVRER and Traffic, respectively. |
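The multi-step schedule quoted above (initial learning rate 2e-4, milestones [30, 60], decay factor 0.5) can be sketched in a few lines. This is an illustrative pure-Python helper, not the authors' code; in PyTorch the equivalent rule is provided by `torch.optim.lr_scheduler.MultiStepLR`.

```python
def multistep_lr(initial_lr, milestones, gamma, epoch):
    """Learning rate at a given epoch under a multi-step schedule:
    the rate is multiplied by `gamma` once for every milestone
    that has been reached."""
    num_decays = sum(1 for m in milestones if epoch >= m)
    return initial_lr * gamma ** num_decays

# Hyperparameters reported in the paper's Appendix D.
for epoch in (0, 30, 60):
    print(epoch, multistep_lr(2e-4, [30, 60], 0.5, epoch))
# epochs 0, 30, 60 -> 2e-4, 1e-4, 5e-5
```

With these settings the learning rate halves at epoch 30 and again at epoch 60, matching the "decreasing by 0.5 on each milestone" description.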