Analogy-Forming Transformers for Few-Shot 3D Parsing

Authors: Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki

ICLR 2023

For each reproducibility variable below, the assessed result is given first, followed by the LLM response supporting it (quoted from the paper where applicable).
Research Type: Experimental. "We test our model on the PartNet benchmark of Mo et al. (2019) for 3D object segmentation. We compare against state-of-the-art (SOTA) 3D object segmentors, as well as meta-learning and few-shot learning (Snell et al., 2017) baselines adapted for the task of 3D parsing. Our experiments show that Analogical Networks perform similarly to the parametric-alone baselines in the standard many-shot train/test split and particularly shine over the parametric baselines in the few-shot setting."
Researcher Affiliation: Academia. "Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani & Katerina Fragkiadaki, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. {ngkanats,mayanks2,zhaoyuaf,stulsian,katef}@cs.cmu.edu"
Pseudocode: Yes. "In the Appendix, we include pseudo-code for within-scene training in Algorithm 1 and cross-scene training in Algorithm 2."
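The appendix algorithms themselves are not reproduced on this page. As a rough orientation only, a minimal skeleton of what the two training regimes could look like is sketched below; `model`, `augment`, and `retrieve_exemplar` are hypothetical placeholders guessed from the paper's framing (within-scene pairs a scene with itself, cross-scene pairs it with a retrieved labelled exemplar), not the authors' Algorithms 1 and 2.

```python
# Hypothetical skeleton of the two training regimes; every function here is a
# stub/assumption for illustration, not the authors' released code.
import random

def augment(scene):
    # Placeholder: an augmented copy of the same scene (assumption).
    return scene

def retrieve_exemplar(scene, memory):
    # Placeholder: pick a labelled scene from memory to serve as the analogy.
    return random.choice(memory)

def train_step(model, scene, exemplar):
    # The model segments `scene` conditioned on the labelled `exemplar`.
    prediction = model(scene, exemplar)
    return prediction  # loss computation and backprop omitted in this sketch

def training_epoch(model, dataset, mode, memory):
    for scene in dataset:
        if mode == "within-scene":
            exemplar = augment(scene)              # scene paired with itself
        else:                                      # "cross-scene"
            exemplar = retrieve_exemplar(scene, memory)
        train_step(model, scene, exemplar)
```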
Open Source Code: Yes. "Our code and models are publicly available in the project webpage: http://analogicalnets.github.io/. We have made our code publicly available on GitHub."
Open Datasets: Yes. "We test our model on the PartNet benchmark of Mo et al. (2019) for 3D object segmentation. PartNet contains 3D object instances from multiple object categories, annotated with parts in three different levels of granularity."
Dataset Splits: Yes. "We split PartNet object categories into base and novel categories... For the exemplars of the base categories we consider the standard PartNet train/test splits. Our model and baselines are trained on the base-category training sets, and tested on segmenting instances of the base categories in the test set. For the few-shot performance, we report mean and standard deviation over 10 tasks where we vary the K-shot support set."
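For concreteness, a K-shot evaluation episode of the kind quoted above can be sampled as in the sketch below. The grouping of samples by category label and the fixed seed are illustrative assumptions, not the authors' episode-construction code.

```python
import random
from collections import defaultdict

def sample_episodes(samples, k, num_episodes=10, seed=0):
    """Build `num_episodes` K-shot support sets per novel category.

    `samples` is an assumed list of (category, instance) pairs; the paper
    reports mean and standard deviation over 10 such episodes.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for category, instance in samples:
        by_category[category].append(instance)

    episodes = []
    for _ in range(num_episodes):
        support = {
            category: rng.sample(instances, k)
            for category, instances in by_category.items()
            if len(instances) >= k  # skip categories with too few exemplars
        }
        episodes.append(support)
    return episodes
```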
Hardware Specification: Yes. "Training takes approximately 15 and 20 minutes per epoch on a single NVIDIA A100 GPU for DETR3D and Analogical Networks, respectively."
Software Dependencies: No. The paper mentions various algorithms and components used (e.g., the AdamW optimizer, a PointNet++ backbone, the Hungarian matching algorithm, a binary cross-entropy loss) but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, scikit-learn).
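Although versions are unspecified, the Hungarian matching step the paper mentions is standard and is commonly implemented with SciPy's `linear_sum_assignment`. The sketch below matches predicted part masks to ground-truth masks under a binary cross-entropy cost; the exact cost definition is an assumption (matching costs in DETR-style models vary), not the paper's implementation.

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_masks, gt_masks):
    """Match predicted to ground-truth part masks with a BCE-style cost.

    pred_masks: (Q, N) logits over N points for Q queries.
    gt_masks:   (G, N) binary ground-truth part masks.
    """
    prob = pred_masks.sigmoid()                    # (Q, N)
    gt = gt_masks.float()
    # Pairwise BCE cost between every prediction and every ground-truth mask.
    pos = -torch.log(prob.clamp_min(1e-6))         # cost if the point is in the part
    neg = -torch.log((1 - prob).clamp_min(1e-6))   # cost if it is not
    cost = pos @ gt.T + neg @ (1 - gt).T           # (Q, G)
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    return rows, cols  # matched (prediction, ground-truth) index pairs

# Usage on random data:
# rows, cols = hungarian_match(torch.randn(8, 1024),
#                              torch.randint(0, 2, (5, 1024)))
```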
Experiment Setup: Yes. "For both stages of training (i.e., within-scene correspondence pre-training and cross-scene training), we use the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 2e-4 and batch size of 16. We train the model for 100 epochs within-scene and 60 cross-scene. For few-shot fine-tuning/evaluation, we use the AdamW optimizer with an initial learning rate of 3e-5 and batch size of 8. We fine-tune for 90 epochs and we report the performance across 10 different episodes, where each episode has a different set of K support samples."
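Those hyperparameters translate directly into an optimizer setup. The sketch below wires the quoted values into PyTorch's `AdamW`; the placeholder model and the stage dictionaries are assumptions for illustration, not the authors' training script.

```python
import torch

# Hyperparameters quoted in the paper's experiment setup.
WITHIN_SCENE = dict(lr=2e-4, batch_size=16, epochs=100)  # correspondence pre-training
CROSS_SCENE = dict(lr=2e-4, batch_size=16, epochs=60)
FEW_SHOT = dict(lr=3e-5, batch_size=8, epochs=90)        # fine-tuning/evaluation

model = torch.nn.Linear(3, 3)  # placeholder for the actual network
optimizer = torch.optim.AdamW(model.parameters(), lr=WITHIN_SCENE["lr"])
```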