Test-time Training for Matching-based Video Object Segmentation

Authors: Juliette Bertrand, Giorgos Kordopatis-Zilos, Yannis Kalantidis, Giorgos Tolias

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on common benchmarks demonstrate that the proposed test-time training yields significant improvements in performance. Our results illustrate that test-time training enhances performance even in these challenging cases.
Researcher Affiliation | Collaboration | Juliette Bertrand (1,2), Giorgos Kordopatis-Zilos (1), Yannis Kalantidis (2), Giorgos Tolias (1). (1) VRG, FEE, Czech Technical University in Prague; (2) NAVER LABS Europe.
Pseudocode | No | The paper does not include a dedicated section or figure explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | Yes | Project page: https://jbertrand89.github.io/test-time-training-vos/
Open Datasets | Yes | DAVIS-2017 validation set [38] and the YouTube-VOS 2018 validation set [51]. We further report results on the validation set of the recent MOSE [10] dataset... Additionally, we introduce DAVIS-C... ImageNet-C [14]... BL-30K [8]
Dataset Splits | Yes | We report results on the two most commonly used benchmarks for video object segmentation, the DAVIS-2017 validation set [38] and the YouTube-VOS 2018 validation set [51]. The validation split of the DAVIS-2017 [38] dataset contains 30 videos... The validation split of the YouTube-VOS 2018 [51] dataset contains 474 high-quality videos...
Hardware Specification | Yes | ...which typically requires approximately 12.5 hours on 2 A100 GPUs to train STCN.
Software Dependencies | No | The paper mentions the use of the "Adam [23] optimizer" and builds on the STCN and XMem implementations, but does not provide specific version numbers for software libraries or dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We use learning rates 10^-5 and 10^-6 for models STCN-BL30K/XMem-BL30K and STCN-DY/XMem-DY, respectively, since their training data differ significantly. Jump step s for sampling training frames is set to 10. For each test example, we train the models with tt-MCC and tt-Ent for 100 iterations and with tt-AE for 20, using the Adam [23] optimizer and a batch size of 4 sequences for STCN and 1 for XMem.
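The reported setup (Adam, learning rate 10^-5, 100 test-time iterations, batch size 4) can be sketched as a minimal per-example test-time training loop. This is an illustrative sketch only: the model and the loss are hypothetical stand-ins, since the actual STCN/XMem architectures and the paper's tt-MCC/tt-Ent/tt-AE objectives live in the authors' released code.

```python
import torch

# Hypothetical stand-in for the matching-based VOS model (the real
# models are STCN/XMem from the authors' code bases).
model = torch.nn.Linear(16, 2)

# Learning rate 1e-5 as reported for the *-BL30K models; the *-DY
# models use 1e-6 instead.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

num_iterations = 100  # 100 iterations for tt-MCC / tt-Ent; tt-AE uses 20
batch_size = 4        # 4 sequences per batch for STCN; 1 for XMem

for it in range(num_iterations):
    # Placeholder batch: in the paper, training frames are sampled from
    # the test video itself with a jump step s = 10 between frames.
    batch = torch.randn(batch_size, 16)

    optimizer.zero_grad()
    out = model(batch)
    # Placeholder unsupervised objective standing in for the paper's
    # tt-MCC / tt-Ent / tt-AE losses (no ground-truth masks at test time).
    loss = out.pow(2).mean()
    loss.backward()
    optimizer.step()
```

The key point the sketch captures is that the model is fine-tuned independently for each test example, using only frames of that video, before the segmentation masks are predicted.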