Learning Implicit Representation for Reconstructing Articulated Objects

Authors: Hao Zhang, Fang Li, Samyak Rawlekar, Narendra Ahuja

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our approach with a variety of experiments. The experimental evaluation is divided into two scenarios: 3D reconstruction leveraging (1) single monocular short videos, and (2) videos spanning multiple perspectives.
Researcher Affiliation | Academia | Hao Zhang, Fang Li, Samyak Rawlekar & Narendra Ahuja, Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign. {haoz19, fangli3, samyakr2, n-ahuja}@illinois.edu
Pseudocode | Yes | Algorithm 1: Synergistic Iterative Optimization of Shape and Skeleton (SIOS2). A structural sketch of this loop follows the table.
Open Source Code | Yes | The code is available on GitHub at: https://github.com/haoz19/LIMR.
Open Datasets | Yes | Firstly, we tested our approach on the well-established BADJA benchmark (Biggs et al., 2019), derived from the DAVIS dataset (Perazzi et al., 2016). Additionally, we broadened our experimental scope by introducing the Planet Zoo dataset, manually collected from YouTube. The AMA human & casual videos dataset (Vlasic et al., 2008) records multi-view videos with 8 synchronized cameras.
Dataset Splits | No | The paper mentions training, but does not explicitly provide training/test/validation dataset splits (e.g., percentages or sample counts) needed to reproduce the experiment.
Hardware Specification | Yes | We train the model on one A100 40GB GPU and empirically find the optimizations stabilize at around the 5th training step, with a 1-hour time cost and 15 epochs for each step.
Software Dependencies | No | The paper mentions several software components, such as AdamW, the Segment Anything Model, optical flow estimators, Grounding DINO, CSL, and Labelme, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | AdamW is implemented as the optimizer with 256 image pairs in each batch and 6144 sampled pixels. The learning rates are set by a 1-cycle learning rate scheduler, starting from the lowest lr_init = 2×10⁻⁵, rising to the highest value lr_max = 5×10⁻⁴, and then falling to the final learning rate lr_final = 1×10⁻⁴. For our experiments, we set the batch size to 4, and epochs are kept the same as in LASR. See the PyTorch sketch after the table.
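The SIOS2 row above describes an alternating scheme: optimize the implicit shape under the current skeleton, then re-estimate the skeleton from the updated shape. Below is a minimal structural sketch, assuming the 5-step / 15-epoch schedule quoted in the hardware row; `optimize_shape` and `update_skeleton` are hypothetical placeholders, not the paper's or the repository's API.

```python
# Minimal structural sketch of the alternating loop implied by Algorithm 1
# (SIOS2). The function bodies are hypothetical stand-ins for the paper's
# actual shape optimization and skeleton extraction.

NUM_STEPS = 5         # optimization reportedly stabilizes around the 5th step
EPOCHS_PER_STEP = 15  # 15 epochs per step, per the hardware row


def optimize_shape(videos, shape, skeleton):
    """Placeholder: one epoch of shape optimization given the current skeleton."""
    return shape


def update_skeleton(shape, skeleton):
    """Placeholder: re-estimate the skeleton from the updated implicit shape."""
    return skeleton


def sios2(videos, shape, skeleton):
    """Alternate shape optimization and skeleton refinement until stable."""
    for step in range(NUM_STEPS):
        for epoch in range(EPOCHS_PER_STEP):
            # (1) Optimize the implicit shape under the current skeleton.
            shape = optimize_shape(videos, shape, skeleton)
        # (2) Re-estimate the skeleton from the updated shape.
        skeleton = update_skeleton(shape, skeleton)
    return shape, skeleton
```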
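The experiment-setup row maps naturally onto PyTorch's AdamW optimizer and OneCycleLR scheduler. The sketch below wires up the quoted rates (lr_init = 2×10⁻⁵, lr_max = 5×10⁻⁴, lr_final = 1×10⁻⁴); the toy model, dummy loss, and `total_steps` are assumptions, since the excerpt does not state the total number of training iterations.

```python
import torch

# Toy model as a stand-in; the paper's implicit representation is not shown here.
model = torch.nn.Linear(10, 10)

# AdamW with the quoted 1-cycle schedule:
# lr_init = 2e-5, lr_max = 5e-4, lr_final = 1e-4.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

total_steps = 1000  # assumption: total iteration count is not given in the excerpt
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=5e-4,
    total_steps=total_steps,
    div_factor=25.0,       # lr_init = max_lr / 25 = 2e-5
    final_div_factor=0.2,  # lr_final = lr_init / 0.2 = 1e-4
)

for _ in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()  # dummy loss for illustration only
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the 1-cycle schedule once per iteration
```

Note that OneCycleLR's `div_factor` and `final_div_factor` are derived here so that the schedule reproduces exactly the three quoted learning rates.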