Fast Encoder-Based 3D from Casual Videos via Point Track Processing

Authors: Yoni Kasten, Wuyue Lu, Haggai Maron

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct experiments to verify our proposed network's performance on real-world casual videos. We began by training the network on specific domains and then evaluated its accuracy and running time on unseen videos from both training and unseen domains."
Researcher Affiliation | Collaboration | Yoni Kasten (NVIDIA Research), Wuyue Lu (Simon Fraser University), Haggai Maron (NVIDIA Research, Technion)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The NeurIPS Paper Checklist states 'No' for open access to data and code, with the justification: 'We will release the code once the paper is published.'
Open Datasets | Yes | "Our network is trained in an unsupervised way on a dataset of extracted point track matrices [20] from raw videos without any 3D supervision by simply minimizing the reprojection errors... In our experiments, TRACKSTO4D is trained on the Common Pets dataset [44]..." The data is available at https://github.com/facebookresearch/cop3d and is released under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
Dataset Splits | No | The paper mentions training on cat/dog partitions and evaluating on test data, but no explicit mention of a separate validation split or its size/percentage was found for the main experiments. It states: 'We trained our networks for 7000 and 3500 epochs for the single-class and multi-class setups respectively.'
Hardware Specification | Yes | "Training our method lasts about one week on a single Tesla V100 GPU with 32GB memory. ... All inference running times were computed on a machine with an NVIDIA RTX A6000 GPU and an Intel Core i7-9800X 3.80GHz CPU."
Software Dependencies | No | The paper mentions using the 'Adam optimizer [23]' and the 'Ceres package [2]' but does not provide version numbers for these components, nor for other key software such as Python or PyTorch.
Experiment Setup | Yes | "In all training setups, we used: λReprojection = 50.0, λStatic = 1.0, λNegative = 1.0, λSparse = 0.001. ... We used the Adam optimizer [23] with a learning rate of 10^-4. ... During training, at each iteration, we randomly sample 20-50 frames from the training videos and 100 point tracks, i.e. 20 ≤ N ≤ 50 and P = 100."
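The "Open Datasets" row quotes the paper's core training signal: minimizing reprojection errors over extracted point track matrices, with no 3D supervision. Since the code was not released at assessment time, the following is only a minimal sketch of such a masked reprojection objective; the function name, tensor shapes, and visibility masking are assumptions, not the authors' implementation.

```python
import numpy as np

def reprojection_loss(points_3d, cameras, tracks_2d, visible):
    """Hypothetical sketch of an unsupervised reprojection objective.
    points_3d: (N, P, 3) predicted per-frame 3D points
    cameras:   (N, 3, 4) per-frame projection matrices
    tracks_2d: (N, P, 2) observed 2D point tracks
    visible:   (N, P)    1.0 where a track is visible in a frame, else 0.0"""
    n_frames, n_points, _ = points_3d.shape
    # Homogenize and project each 3D point with its frame's camera.
    homog = np.concatenate([points_3d, np.ones((n_frames, n_points, 1))], axis=-1)
    proj = np.einsum('nij,npj->npi', cameras, homog)          # (N, P, 3)
    uv = proj[..., :2] / np.clip(proj[..., 2:3], 1e-8, None)  # perspective divide
    # Per-track pixel error, averaged over visible observations only.
    err = np.linalg.norm(uv - tracks_2d, axis=-1)             # (N, P)
    return (err * visible).sum() / max(visible.sum(), 1.0)
```

If the predicted points reproject exactly onto the observed tracks, the loss is zero; occluded observations are excluded by the mask rather than penalized.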
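The "Experiment Setup" row pins down the loss weights and the per-iteration sampling scheme (20 ≤ N ≤ 50 frames, P = 100 tracks). Those numbers are quoted from the paper; the helper names and array layout below are hypothetical, intended only to make the reported setup concrete (the Adam learning rate of 10^-4 would be configured in the optimizer, not shown here).

```python
import numpy as np

# Weights quoted from the paper; the dict keys are illustrative names.
LAMBDA = {"reprojection": 50.0, "static": 1.0, "negative": 1.0, "sparse": 0.001}

def sample_batch(track_matrix, rng, min_frames=20, max_frames=50, num_tracks=100):
    """Sample a random sub-matrix of 20-50 frames and 100 point tracks
    from a (num_frames, num_points, 2) point track matrix."""
    num_frames, num_points = track_matrix.shape[:2]
    n = rng.integers(min_frames, max_frames + 1)          # 20 <= N <= 50
    frame_idx = rng.choice(num_frames, size=n, replace=False)
    point_idx = rng.choice(num_points, size=num_tracks, replace=False)
    return track_matrix[frame_idx][:, point_idx]

def total_loss(terms):
    """Weighted sum of the four loss terms; terms maps name -> scalar."""
    return sum(LAMBDA[name] * value for name, value in terms.items())
```

Sampling a fixed number of tracks per iteration keeps batch shapes uniform even though source videos differ in length and in how many points were tracked.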