Win-Win: Training High-Resolution Vision Transformers from Two Windows

Authors: Vincent Leroy, Jerome Revaud, Thomas Lucas, Philippe Weinzaepfel

ICLR 2024

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"In this section, we first validate our Win-Win training strategy on a monocular task (semantic segmentation) in Section 4.1 and then present results for the binocular task of optical flow (Section 4.2). Please refer to Appendix D for more results on the monocular depth estimation task."

Researcher Affiliation | Industry
"Vincent Leroy, Jerome Revaud, Thomas Lucas & Philippe Weinzaepfel, Naver Labs Europe, firstname.lastname@naverlabs.com"

Pseudocode | No
The paper does not contain any explicit pseudocode or algorithm blocks.

Open Source Code | No
The paper does not explicitly state that open-source code for the methodology is provided, nor does it include a link to a repository.

Open Datasets | Yes
"Experiments are performed on the BDD-100k dataset (Yu et al., 2020) that comprise 7,000 training images and 1,000 validation images in a driving scenario with 19 semantic classes. All images have a relatively high resolution of 1280×720 pixels."

Dataset Splits | Yes
"Experiments are performed on the BDD-100k dataset (Yu et al., 2020) that comprise 7,000 training images and 1,000 validation images in a driving scenario with 19 semantic classes. Models are trained on Flying Chairs (Dosovitskiy et al., 2015), Flying Things (Mayer et al., 2016), and MPI-Sintel from which we keep two sequences apart for validation." (A hedged data-loading sketch for this split follows the table.)

Hardware Specification | No
The paper does not provide specific hardware details (e.g., GPU/CPU models or memory amounts) used for running its experiments.

Software Dependencies | No
The paper mentions optimizers and their parameters but does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow).

Experiment Setup | Yes
"We use the AdamW (Loshchilov & Hutter, 2019) optimizer, with betas of 0.9 and 0.999, a cosine learning rate schedule with a base learning rate of 0.0001, with two warmup epochs, a weight decay of 0.05 and a learning rate layer decay of 0.75. We train our models for 200 epochs on the 7,000 training images from the BDD10k dataset..." (A hedged code sketch of this configuration follows the table.)
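The dataset rows above specify only split sizes (7,000 train / 1,000 val) and image properties (1280×720, 19 classes), not a loading pipeline. Below is a minimal PyTorch sketch of such a split, assuming a hypothetical images/<split> and labels/<split> directory layout; since the paper releases no code, every path and name here is an assumption.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class BDDSegmentation(Dataset):
    """Hypothetical loader for the BDD semantic segmentation splits quoted
    above: 7,000 training and 1,000 validation images at 1280x720 with
    19 classes. The directory layout is assumed, not taken from the paper."""

    def __init__(self, root, split="train", transform=None):
        assert split in {"train", "val"}
        # Assumed layout: <root>/images/<split>/*.jpg with matching
        # <root>/labels/<split>/*.png segmentation masks.
        self.images = sorted(Path(root, "images", split).glob("*.jpg"))
        self.masks = [Path(root, "labels", split, p.stem + ".png")
                      for p in self.images]
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = Image.open(self.images[idx]).convert("RGB")
        mask = Image.open(self.masks[idx])
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask
```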
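The experiment-setup row quotes concrete hyperparameters. As a rough illustration, here is a minimal PyTorch sketch of that configuration; the placeholder model, the number of layers, and the parameter-naming heuristic used for the layer-wise decay are assumptions, since the paper provides no code.

```python
import math

import torch


def layerwise_param_groups(model, base_lr=1e-4, layer_decay=0.75, num_layers=12):
    """Scale each transformer block's learning rate by layer_decay per layer,
    so earlier layers take smaller steps (the quoted 0.75 layer decay).
    The 'blocks.<i>.' naming convention is an assumption."""
    groups = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        layer_id = num_layers  # default: head/final params keep the base LR
        if name.startswith("blocks."):
            layer_id = int(name.split(".")[1])
        groups.append({"params": [param],
                       "lr": base_lr * layer_decay ** (num_layers - layer_id)})
    return groups


model = torch.nn.Linear(8, 8)  # placeholder for the actual ViT
optimizer = torch.optim.AdamW(
    layerwise_param_groups(model),
    betas=(0.9, 0.999),   # as quoted
    weight_decay=0.05,    # as quoted
)

epochs, warmup_epochs = 200, 2  # as quoted


def lr_lambda(epoch):
    """Linear warmup for two epochs, then cosine decay over the remainder."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# Training loop (not shown): optimizer.step() per batch, scheduler.step() per epoch.
```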