Image to Sphere: Learning Equivariant Features for Efficient Pose Prediction

Authors: David Klee, Ondrej Biza, Robert Platt, Robin Walters

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS
Researcher Affiliation | Academia | Northeastern University
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at https://dmklee.github.io/image2sphere.
Open Datasets | Yes | The first dataset, ModelNet10-SO(3) (Liao et al., 2019), is composed of rendered images of synthetic, untextured objects from ModelNet10 (Wu et al., 2015). ... PASCAL3D+, (Xiang et al., 2014), is a popular benchmark for pose estimation ... Lastly, the SYMSOL dataset was recently introduced by Murphy et al. (2021)...
Dataset Splits | Yes | The dataset has a standardized train and test split. It provides two training sets: one with 100 views per object instance (we call this the Full Training Set), and one with 20 views per object instance (we call this the Limited Training Set). The test set has 4 views per instance. ... For each shape, there are 50k renderings in the training set and 5K renderings in the test set. ... The training data is found in the ImageNet train, ImageNet val, and PASCALVOC train folders, and the test data is in PASCALVOC val.
Hardware Specification | No | The paper mentions "I2S is instantiated with PyTorch and the e3nn library" but does not specify any hardware details like GPU/CPU models, processors, or memory used for running the experiments.
Software Dependencies | No | The paper mentions "PyTorch" and the "e3nn library" but does not provide specific version numbers for these software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | I2S uses a residual network (He et al., 2016) with weights pretrained on ImageNet (Deng et al., 2009) to extract dense feature maps from 2D images. We use a ResNet50 backbone for ModelNet10-SO(3) and SYMSOL, and ResNet101 for PASCAL3D+. The orthographic projection uses a HEALPix grid with recursion level of 2, out of which 20 points are randomly selected during each forward pass. ... It is trained using SGD with Nesterov momentum of 0.9 for 40 epochs using a batch size of 64. The learning rate starts at 0.001 and decays by factor of 0.1 every 15 epochs.
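
The numbers quoted under Experiment Setup can be checked with a short sketch. The Python below is illustrative only and is not taken from the authors' code. It assumes zero-indexed epochs with the decay applied at epochs 15 and 30, and it assumes the 20 points are drawn uniformly without replacement from the full-sphere HEALPix grid (the paper may restrict the grid, for example to the visible hemisphere). The function names are hypothetical.

```python
import random

# Values quoted in the Experiment Setup row.
BASE_LR = 0.001
DECAY_FACTOR = 0.1
DECAY_EVERY = 15      # epochs between decays
NUM_EPOCHS = 40
HEALPIX_LEVEL = 2     # "recursion level of 2"
POINTS_PER_PASS = 20


def learning_rate(epoch):
    """Step decay: scale the base rate by DECAY_FACTOR once per DECAY_EVERY epochs.

    Assumes epochs are zero-indexed (an assumption, not stated in the source).
    """
    return BASE_LR * DECAY_FACTOR ** (epoch // DECAY_EVERY)


def healpix_pixel_count(level):
    """Pixels in a full-sphere HEALPix grid: 12 * nside**2, with nside = 2**level."""
    nside = 2 ** level
    return 12 * nside ** 2


def sample_grid_indices(level, k, rng):
    """Pick k distinct pixel indices uniformly at random (hypothetical helper)."""
    return rng.sample(range(healpix_pixel_count(level)), k)


if __name__ == "__main__":
    # Learning rate at the first and last epoch of each decay stage.
    for epoch in (0, 14, 15, 29, 30, NUM_EPOCHS - 1):
        print(epoch, learning_rate(epoch))
    # Size of the grid and one random subset, as drawn on a single forward pass.
    print(healpix_pixel_count(HEALPIX_LEVEL))
    print(sample_grid_indices(HEALPIX_LEVEL, POINTS_PER_PASS, random.Random(0)))
```

Under these assumptions the schedule has three stages (epochs 0-14, 15-29, and 30-39), and a level-2 grid has 192 pixels, so each forward pass would use roughly a tenth of the grid. In PyTorch, a comparable setup would likely use `torch.optim.SGD(..., momentum=0.9, nesterov=True)` with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)`, though the source does not confirm which scheduler the authors used.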