Unsupervised Image Representation Learning with Deep Latent Particles

Authors: Tal Daniel, Aviv Tamar

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our DLP representations are useful for downstream tasks such as unsupervised keypoint (KP) detection, image manipulation, and video prediction for scenes composed of multiple dynamic objects. In addition, we show that our probabilistic interpretation of the problem naturally provides uncertainty estimates for particle locations, which can be used for model selection, among other tasks. Videos and code are available: https://taldatech.github.io/deep-latent-particles-web/. ... 5. Experiments: Our method produces latent particles to represent an input image. We design our set of experiments to answer the following questions:
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel. Correspondence to: Tal Daniel <taldanielm@campus.technion.ac.il>.
Pseudocode | Yes | Algorithm 1: Stitching Algorithm
Open Source Code | Yes | Videos and code are available: https://taldatech.github.io/deep-latent-particles-web/. ... Our code is available publicly: https://github.com/taldatech/deep-latent-particles-pytorch
Open Datasets | Yes | The common benchmark for this task uses the CelebA train set while excluding the MAFL (Zhang et al., 2014b) subset, which includes annotations for 5 facial landmarks: eyes, nose, and mouth corners. ... The CLEVRER dataset is composed of 5-second (128-frame) videos of rigid objects colliding, where each frame can contain up to 8 objects of various shapes and colors. ... CLEVRER (Yi et al., 2019): this dataset contains 20,000 synthetic videos of moving and colliding objects, split into 10,000 training videos, 5,000 validation videos, and 5,000 test videos...
Dataset Splits | Yes | CLEVRER (Yi et al., 2019): this dataset contains 20,000 synthetic videos of moving and colliding objects, split into 10,000 training videos, 5,000 validation videos, and 5,000 test videos...
Hardware Specification | No | The paper mentions running experiments on GPUs ('a batch size of 32 per GPU (we used 1 to 4 GPUs)'), but it does not specify any particular GPU models, CPU models, memory, or other detailed hardware specifications.
Software Dependencies | No | The paper mentions that the model 'is implemented in PyTorch (Paszke et al., 2017)' and that 'the GCN layers are implemented efficiently with the torch_geometric library (Fey & Lenssen, 2019)'. While it names software, it does not provide specific version numbers for PyTorch or torch_geometric, which are required for reproducibility.
Experiment Setup | Yes | The training procedure is based on maximizing the ELBO (1). Since all of our distributions are Gaussians, the KL divergence has a closed-form solution. Similar to the β-VAE (Higgins et al., 2017), we multiply the KL and Chamfer-KL terms in the loss by hyperparameters βKL and βCKL, respectively. ... The model is optimized end-to-end by Adam (Kingma & Ba, 2014), using the reparametrization trick... Appendix D: Detailed hyperparameters used for the various experiments in the paper. ... all models were trained with an initial learning rate of 2e-4 and a batch size of 32 per GPU (we used 1 to 4 GPUs). For all datasets, we used a multi-step learning rate scheduler with the following milestones (in epochs): [30, 60], with the learning rate decreasing by 0.5 at each milestone. The warm-up stage described in Appendix B was only used for the Object model, where we used 1 warm-up epoch for CLEVRER and 2 for Traffic; the number of noisy-masks epochs was 5 times the warm-up epochs: 5 and 10 for CLEVRER and Traffic, respectively.
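The optimization schedule quoted above (Adam at an initial rate of 2e-4, halved at epoch milestones 30 and 60, with β-weighted KL terms) can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the model is a placeholder, and the β values are hypothetical; the paper's Appendix D lists the actual per-dataset hyperparameters.

```python
import torch

# Placeholder stand-in for the DLP model; the real architecture is in the paper's repo.
model = torch.nn.Linear(8, 8)

# Adam with the reported initial learning rate of 2e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)

# Multi-step schedule: halve the learning rate at epoch milestones 30 and 60.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.5
)

# Hypothetical weights for the beta-VAE-style objective; the paper reports
# dataset-specific values of beta_KL and beta_CKL in Appendix D.
beta_kl, beta_ckl = 1.0, 1.0

def weighted_elbo(recon_loss, kl, chamfer_kl):
    """ELBO-style objective with the KL and Chamfer-KL terms scaled by betas."""
    return recon_loss + beta_kl * kl + beta_ckl * chamfer_kl

for epoch in range(90):
    # ... per-batch forward/backward passes (batch size 32 per GPU) would go here ...
    optimizer.step()   # no-op without gradients; shown only to keep the step order valid
    scheduler.step()   # lr: 2e-4 -> 1e-4 after epoch 30 -> 5e-5 after epoch 60
```

After 90 epochs the learning rate has passed both milestones, ending at 5e-5, matching a 0.5 decay applied twice to 2e-4.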