Fast Encoder-Based 3D from Casual Videos via Point Track Processing
Authors: Yoni Kasten, Wuyue Lu, Haggai Maron
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to verify our proposed network's performance on real-world casual videos. We began by training the network on specific domains and then evaluated its accuracy and running time on unseen videos from both training and unseen domains. |
| Researcher Affiliation | Collaboration | Yoni Kasten1 Wuyue Lu2 Haggai Maron1,3 1NVIDIA Research 2Simon Fraser University 3Technion |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The NeurIPS Paper Checklist states 'No' for open access to data and code, with the justification: 'We will release the code once the paper is published.' |
| Open Datasets | Yes | Our network is trained in an unsupervised way on a dataset of extracted point track matrices [20] from raw videos without any 3D supervision by simply minimizing the reprojection errors... In our experiments, TRACKSTO4D is trained on the Common Pets dataset [44]... Common pets dataset [44] The data is available here: https://github.com/facebookresearch/cop3d. It is released under Creative Commons Attribution-Non Commercial 4.0 International Public License. |
| Dataset Splits | No | The paper mentions training on cat/dog partitions and evaluating on test data, but no explicit mention of a separate 'validation' split or its size/percentage was found for the main experiments. It states: 'We trained our networks for 7000 and 3500 epochs for the single-class and multi-class setups respectively.' |
| Hardware Specification | Yes | Training our method lasts about one week on a single Tesla V100 GPU with 32GB memory. ... All inference running times were computed on a machine with NVIDIA RTX A6000 GPU and Intel(R) Core(TM) i7-9800X 3.80GHz CPU. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer [23]' and 'Ceres package [2]' but does not provide specific version numbers for these software components or other key libraries like Python or PyTorch versions. |
| Experiment Setup | Yes | In all training setups, we used: λReprojection = 50.0, λStatic = 1.0, λNegative = 1.0, λSparse = 0.001. ... We used the Adam optimizer [23] with a learning rate of 10^-4. ... During training, at each iteration, we randomly sample 20-50 frames from the training videos and 100 point tracks, i.e. 20 ≤ N ≤ 50 and P = 100. |
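The reported setup (four weighted loss terms and random sampling of 20-50 frames and 100 point tracks per iteration) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names (`total_loss`, `sample_batch`) and the assumed point-track matrix layout `(num_frames, num_tracks, 2)` are hypothetical, and the actual computation of each loss term by the network is not shown.

```python
import numpy as np

# Loss weights as reported in the paper's experiment setup.
LAMBDA_REPROJECTION = 50.0
LAMBDA_STATIC = 1.0
LAMBDA_NEGATIVE = 1.0
LAMBDA_SPARSE = 0.001

def total_loss(l_reproj, l_static, l_negative, l_sparse):
    """Weighted sum of the four reported loss terms.

    Each argument is assumed to be a scalar loss value already
    computed by the network (not shown here)."""
    return (LAMBDA_REPROJECTION * l_reproj
            + LAMBDA_STATIC * l_static
            + LAMBDA_NEGATIVE * l_negative
            + LAMBDA_SPARSE * l_sparse)

def sample_batch(track_matrix, rng, p=100, n_min=20, n_max=50):
    """Sample a training batch: N frames with 20 <= N <= 50 and
    P = 100 point tracks, drawn without replacement from a
    (num_frames, num_tracks, 2) matrix of 2D track coordinates."""
    num_frames, num_tracks, _ = track_matrix.shape
    n = int(rng.integers(n_min, n_max + 1))
    frame_idx = rng.choice(num_frames, size=min(n, num_frames), replace=False)
    track_idx = rng.choice(num_tracks, size=min(p, num_tracks), replace=False)
    return track_matrix[np.ix_(frame_idx, track_idx)]

# Example: sample one batch from a dummy 60-frame, 200-track video.
rng = np.random.default_rng(0)
tracks = rng.standard_normal((60, 200, 2))
batch = sample_batch(tracks, rng)
```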