Analogy-Forming Transformers for Few-Shot 3D Parsing
Authors: Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani, Katerina Fragkiadaki
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our model on the PartNet benchmark of Mo et al. (2019) for 3D object segmentation. We compare against state-of-the-art (SOTA) 3D object segmentors, as well as meta-learning and few-shot learning (Snell et al., 2017) baselines adapted for the task of 3D parsing. Our experiments show that Analogical Networks perform similarly to the parametric alone baselines in the standard many-shot train-test split and particularly shine over the parametric baselines in the few-shot setting. |
| Researcher Affiliation | Academia | Nikolaos Gkanatsios, Mayank Singh, Zhaoyuan Fang, Shubham Tulsiani & Katerina Fragkiadaki, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA {ngkanats,mayanks2,zhaoyuaf,stulsian,katef}@cs.cmu.edu |
| Pseudocode | Yes | In the Appendix, we include pseudo-code for within-scene training in Algorithm 1 and cross-scene training in Algorithm 2. |
| Open Source Code | Yes | Our code and models are publicly available in the project webpage: http://analogicalnets.github.io/. We have made our code publicly available on GitHub. |
| Open Datasets | Yes | We test our model on the PartNet benchmark of Mo et al. (2019) for 3D object segmentation. PartNet contains 3D object instances from multiple object categories, annotated with parts in three different levels of granularity. |
| Dataset Splits | Yes | We split PartNet object categories into base and novel categories... For the exemplars of the base categories we consider the standard PartNet train/test splits. Our model and baselines are trained in the base category training sets, and tested on segmenting instances of the base categories in the test set. For the few-shot performance, we report mean and standard deviation over 10 tasks where we vary the K-shot support set. |
| Hardware Specification | Yes | Training takes approximately 15 and 20 minutes per epoch on a single NVIDIA A100 GPU for DETR3D and Analogical Networks respectively. |
| Software Dependencies | No | The paper mentions various algorithms and components used (e.g., AdamW optimizer, PointNet++ backbone, Hungarian matching algorithm, binary cross-entropy loss) but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | Yes | For both stages of training (i.e. within-scene correspondence pre-training and cross-scene training), we use AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 2e-4 and batch size of 16. We train the model for 100 epochs within-scene and 60 cross-scene. For few-shot fine-tuning/evaluation, we use AdamW optimizer with an initial learning rate of 3e-5 and batch size of 8. We fine-tune for 90 epochs and we report the performance across 10 different episodes, where each episode has a different set of K support samples. |
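For reproduction attempts, the hyperparameters quoted above can be collected into a single configuration. The sketch below is illustrative only: the dictionary keys and structure are our own naming, not from the paper's released code; the values are those reported in the Experiment Setup row.

```python
# Hedged sketch: hyperparameters as reported in the paper's experiment setup.
# The dict layout and key names are assumptions for illustration; only the
# numeric values (optimizer, learning rates, batch sizes, epochs, episodes)
# come from the quoted text.
TRAIN_CONFIG = {
    "optimizer": "AdamW",          # Loshchilov & Hutter, 2017
    "lr": 2e-4,                    # initial learning rate, both training stages
    "batch_size": 16,
    "epochs_within_scene": 100,    # within-scene correspondence pre-training
    "epochs_cross_scene": 60,      # cross-scene training
}

FINETUNE_CONFIG = {
    "optimizer": "AdamW",
    "lr": 3e-5,                    # initial learning rate for few-shot fine-tuning
    "batch_size": 8,
    "epochs": 90,
    "num_episodes": 10,            # each episode uses a different K-shot support set
}
```

Keeping the two stages in separate configs mirrors the paper's distinction between pre-training/cross-scene training and few-shot fine-tuning, which use different learning rates and batch sizes.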