CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion

Authors: Philippe Weinzaepfel, Vincent Leroy, Thomas Lucas, Romain Brégier, Yohann Cabon, Vaibhav Arora, Leonid Antsfeld, Boris Chidlovskii, Gabriela Csurka, Jérôme Revaud

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that our pretext task leads to significantly improved performance for monocular 3D vision downstream tasks such as depth estimation. In addition, our model can be directly applied to binocular downstream tasks like optical flow or relative camera pose estimation, for which we obtain competitive results without bells and whistles, i.e., using a generic architecture without any task-specific design.
Researcher Affiliation | Industry | NAVER LABS Europe https://europe.naverlabs.com/research/computer-vision/croco/
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., labeled "Pseudocode" or "Algorithm").
Open Source Code | No | Not with the submission but we are planning to release code if accepted.
Open Datasets | Yes | We train our model on a set of synthetic image pairs of 3D indoor scenes derived from the HM3D [52], ScanNet [18], Replica [58] and ReplicaCAD [61] datasets
Dataset Splits | Yes | Additionally, we report results on semantic segmentation on ADE20k [78] with 150 classes and 20,210 training images. We follow the protocol of [4] and report mean Intersection-over-Union (mIoU) on the validation set, using the same ConvNeXt [42] prediction head on top of the encoder. (A minimal mIoU sketch follows this table.)
Hardware Specification | No | The main paper states 'Yes, in the supplementary material.' regarding compute resources, but does not provide specific hardware details (such as GPU/CPU models or memory) in the main text.
Software Dependencies | No | The paper mentions 'PyTorch [48]' but does not specify version numbers for any software dependencies such as PyTorch, Python, or CUDA.
Experiment Setup | Yes | We implement our CroCo model in PyTorch [48] and train the network for 400 epochs using the AdamW optimizer [43]. We use a cosine learning rate schedule with a base learning rate of 1.5 × 10^-4 for an effective batch size of 256, with a linear warmup in the first 40 epochs. (A schedule sketch follows this table.)
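The Dataset Splits row quotes the ADE20k evaluation protocol: 150 classes, mIoU reported on the validation set. Below is a minimal per-image mIoU sketch in PyTorch; the random tensors and the simplified per-image averaging are illustrative assumptions (standard benchmarks accumulate intersections and unions over the whole validation set before dividing), not the authors' evaluation code.

```python
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int = 150) -> float:
    """Mean Intersection-over-Union over classes present in prediction or ground truth.

    Simplification: computed per image; benchmark evaluation accumulates
    intersections and unions over the full validation set first.
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = (pred_c | target_c).sum().item()
        if union == 0:
            continue  # class absent from both maps: skip rather than count as 0 or 1
        intersection = (pred_c & target_c).sum().item()
        ious.append(intersection / union)
    return sum(ious) / len(ious)

# Toy usage with random 150-class segmentation maps.
pred = torch.randint(0, 150, (512, 512))
target = torch.randint(0, 150, (512, 512))
print(f"mIoU: {mean_iou(pred, target):.4f}")
```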
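The Experiment Setup row fully specifies the optimization schedule. As a concrete reading of those numbers, here is a minimal PyTorch sketch of AdamW with a 40-epoch linear warmup into a cosine decay over the remaining epochs; the placeholder model and the weight-decay value are assumptions, not taken from the paper.

```python
import math

import torch
from torch.optim import AdamW

EPOCHS = 400        # total pre-training epochs (from the paper)
WARMUP_EPOCHS = 40  # linear warmup phase (from the paper)
BASE_LR = 1.5e-4    # base learning rate for an effective batch size of 256 (from the paper)

model = torch.nn.Linear(768, 768)  # placeholder; the paper uses a ViT encoder-decoder
optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)  # weight decay assumed

def lr_at_epoch(epoch: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for epoch in range(EPOCHS):
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... one epoch over the pre-training pairs with the masked
    # cross-view completion loss would run here ...
```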