Unsupervised Discovery of Parts, Structure, and Dynamics

Authors: Zhenjia Xu*, Zhijian Liu*, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions."
Researcher Affiliation | Collaboration | Zhenjia Xu (MIT CSAIL, Shanghai Jiao Tong University); Zhijian Liu (MIT CSAIL); Chen Sun (Google Research); Kevin Murphy (Google Research); William T. Freeman (MIT CSAIL, Google Research); Joshua B. Tenenbaum (MIT CSAIL); Jiajun Wu (MIT CSAIL)
Pseudocode | Yes | Algorithm 1 (Training PSD) and Algorithm 2 (Evaluating PSD) provide structured pseudocode blocks.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "For each dataset, we rendered totally 100,000 pairs for training and 10,000 for testing, with random visual appearance (i.e., sizes, positions, and colors). As for the digits dataset, we use six types of hand-written digits from MNIST (LeCun et al., 1998)." (See the data-generation sketch after this table.)
Dataset Splits | No | The paper explicitly states training and testing set sizes but does not describe a validation split, so no percentages or counts are available for reproducing one.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models or memory.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2017) and "an off-the-shelf package (Liu, 2009)" for optical flow, but it does not give version numbers for these or any other key software dependencies required for reproducibility.
Experiment Setup | Yes | "Optimization is carried out using ADAM (Kingma & Ba, 2015) with β1 = 0.9 and β2 = 0.999. We use a fixed learning rate of 10⁻³ and mini-batch size of 32. Our motion encoder takes the flow field M̂ between two consecutive frames as input, with resolution of 128 × 128. It applies seven convolutional layers with number of channels {16, 16, 32, 32, 64, 64, 64}, kernel sizes 5 × 5, and stride sizes 2 × 2." (See the PyTorch sketch after this table.)
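
To ground the Open Datasets row, here is a minimal, hypothetical sketch of how such frame pairs could be rendered: MNIST digits pasted at random sizes, positions, and colors, then shifted by a small random motion between the two frames. The `render_pair` helper, the size and motion ranges, and the use of the full MNIST set (the paper restricts itself to six digit classes) are all assumptions for illustration, not the authors' generation code.

```python
# Hypothetical data-generation sketch: render a pair of consecutive frames
# containing MNIST digits with random size, position, and color, where each
# digit moves by a small random offset between frames. All ranges are assumed.
import numpy as np
from torchvision import datasets

mnist = datasets.MNIST(root="data", train=True, download=True)

def render_pair(canvas=128, n_digits=2, rng=np.random):
    """Return two canvas x canvas RGB frames (values in [0, 1])."""
    frames = [np.zeros((canvas, canvas, 3), dtype=np.float32) for _ in range(2)]
    for _ in range(n_digits):
        img, _ = mnist[rng.randint(len(mnist))]           # (PIL image, label) pair
        size = rng.randint(20, 41)                        # random digit size
        digit = np.asarray(img.resize((size, size)), dtype=np.float32) / 255.0
        color = rng.rand(3).astype(np.float32)            # random RGB color
        x, y = rng.randint(0, canvas - size, size=2)      # random position
        dx, dy = rng.randint(-4, 5, size=2)               # small per-digit motion
        for t, (px, py) in enumerate([(x, y), (x + dx, y + dy)]):
            px = int(np.clip(px, 0, canvas - size))       # keep digit inside frame
            py = int(np.clip(py, 0, canvas - size))
            frames[t][py:py + size, px:px + size] += digit[..., None] * color
    return [np.clip(f, 0.0, 1.0) for f in frames]
```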
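
The Experiment Setup row pins down enough detail for a rough reconstruction of the motion encoder and optimizer. Below is a minimal PyTorch sketch under stated assumptions: the flow field M̂ is treated as a 2-channel input, each convolution uses padding 2 so that stride 2 exactly halves the resolution, and ReLU activations sit between layers; none of these three choices is given in the quoted text.

```python
# Sketch of the quoted setup: seven 5x5, stride-2 convolutions with channels
# {16, 16, 32, 32, 64, 64, 64}, trained with Adam (lr 1e-3, betas 0.9/0.999).
# The 2-channel flow input, padding of 2, and ReLU activations are assumptions.
import torch
import torch.nn as nn

def motion_encoder():
    channels = [2, 16, 16, 32, 32, 64, 64, 64]  # 2-channel optical flow in (assumed)
    layers = []
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        layers += [nn.Conv2d(c_in, c_out, kernel_size=5, stride=2, padding=2),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

encoder = motion_encoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3, betas=(0.9, 0.999))

flow = torch.randn(32, 2, 128, 128)  # one mini-batch (size 32) of flow fields
features = encoder(flow)             # shape (32, 64, 1, 1)
```

With padding 2, each stride-2 layer halves the spatial resolution, so seven layers map the 128 × 128 flow field down to a 1 × 1 feature map with 64 channels.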