Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DINO-Foresight: Looking into the Future with DINO

Authors: Efstathios Karypidis, Ioannis Kakogeorgiou, Spyridon Gidaris, Nikos Komodakis

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments show the very strong performance, robustness and scalability of our framework. Experimental results demonstrate a unique advantage of our approach: our single model successfully handles multiple future-frame understanding tasks (semantic segmentation, instance segmentation, depth prediction, and surface normal prediction) where previous approaches required multiple specialized models. |
| Researcher Affiliation | Collaboration | 1 Archimedes, Athena Research Center, Greece; 2 valeo.ai; 3 National Technical University of Athens; 4 University of Crete; 5 IACM-Forth |
| Pseudocode | No | The paper describes the methodology in prose, without explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page and code at https://dino-foresight.github.io/ |
| Open Datasets | Yes | Data. We assess our approach using the Cityscapes (Cordts et al., 2016) and nuScenes (Caesar et al., 2020) datasets, both offering video sequences from urban driving environments. |
| Dataset Splits | Yes | The Cityscapes dataset includes 2,975 training sequences, 500 for validation, each with 30 frames... The nuScenes dataset comprises of 700 training scenes and 150 validation scenes... For short-term prediction, the model uses frames 8, 11, 14, and 17 as context to predict frame 20 (with context length Nc = 4 and Np = 1). |
| Hardware Specification | Yes | Training is conducted on 8 A100 40Gb GPUs with an effective batch size of 64. |
| Software Dependencies | No | The paper mentions software like DINOv2, DPT, Mask2Former, Detectron2, and Adam optimizer but does not specify their version numbers or the Python/PyTorch versions used. |
| Experiment Setup | Yes | We use 12 layers with a hidden dimension of d = 1152 and sequence length N = 5 (with Nc = 4 context frames and Np = 1 future frame). For end-to-end training, we use the Adam optimizer (Kingma and Ba, 2015) with momentum parameters β1 = 0.9, β2 = 0.99, and a learning rate of 6.4 × 10⁻⁴ with cosine annealing. Training is conducted on 8 A100 40Gb GPUs with an effective batch size of 64. |
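
The values quoted under Dataset Splits, Hardware Specification, and Experiment Setup can be collected into a small sketch for anyone attempting to reproduce the short-term setting. This is an illustration under stated assumptions, not the authors' code: the class and function names are hypothetical, the stride of 3 is inferred from the quoted frame indices (8, 11, 14, 17, 20), the per-GPU batch size assumes no gradient accumulation, and the cosine schedule assumes no warmup and a hypothetical total step count, none of which the quoted text specifies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; the field names are hypothetical."""
    num_layers: int = 12
    hidden_dim: int = 1152
    num_context: int = 4            # Nc
    num_future: int = 1             # Np
    adam_betas: tuple = (0.9, 0.99)
    base_lr: float = 6.4e-4
    num_gpus: int = 8
    effective_batch_size: int = 64


def context_and_target(target_frame: int, num_context: int, stride: int):
    """Return evenly strided context frame indices that precede the target frame."""
    context = [target_frame - stride * k for k in range(num_context, 0, -1)]
    return context, target_frame


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Plain cosine annealing from base_lr down to min_lr (no warmup assumed)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


setup = ReportedSetup()

# Stride 3 is inferred from the quoted indices, not stated as a parameter.
context, target = context_and_target(
    target_frame=20, num_context=setup.num_context, stride=3
)

# Assumes the effective batch is split evenly with no gradient accumulation.
per_gpu_batch = setup.effective_batch_size // setup.num_gpus

# Sequence length N = Nc + Np, matching the quoted N = 5.
sequence_length = setup.num_context + setup.num_future
```

In an actual PyTorch run, `adam_betas` and `base_lr` would be passed to the optimizer and the schedule handed to a learning-rate scheduler; the quoted text does not give the number of training steps, so `total_steps` has to come from the paper or the released code.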