Associating Objects with Transformers for Video Object Segmentation

Authors: Zongxin Yang, Yunchao Wei, Yi Yang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on both multi-object and single-object benchmarks to examine AOT variant networks with different complexities.
Researcher Affiliation | Collaboration | Zongxin Yang (1,2), Yunchao Wei (3,4), Yi Yang (1). (1) CCAI, College of Computer Science and Technology, Zhejiang University; (2) Baidu Research; (3) Institute of Information Science, Beijing Jiaotong University; (4) Beijing Key Laboratory of Advanced Information Science and Network
Pseudocode | No | The paper describes the proposed methods and their components (e.g., LSTT block structure in Fig. 2c) but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link to open-source code or an explicit statement about its release.
Open Datasets | Yes | We evaluate AOT on popular multi-object benchmarks, YouTube-VOS [48] and DAVIS 2017 [31], and single-object benchmark, DAVIS 2016 [30].
Dataset Splits | Yes | YouTube-VOS contains 3471 videos in the training split with 65 categories and 474/507 videos in the validation 2018/2019 split with additional 26 unseen categories.
Hardware Specification | No | The paper discusses performance metrics like FPS but does not provide specific details on the hardware (e.g., GPU models, CPU types) used for experiments.
Software Dependencies | No | AOT performs well with PaddlePaddle [1] and PyTorch [28]. (No version numbers provided for reproducibility.)
Experiment Setup | Yes | The spatial neighborhood size λ is set to 15, and the number of identification vectors, M, is set to 10, which is consistent with the maximum object number in the benchmarks [48, 31]. The hyper-parameters of these variants are: (1) AOT-Tiny: L = 1, m = {1}; (2) AOT-Small: L = 2, m = {1}; (3) AOT-Base: L = 3, m = {1}; (4) AOT-Large: L = 3, m = {1, 1 + δ, 1 + 2δ, 1 + 3δ, ...}.
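
The variant settings quoted in the Experiment Setup row can be written down as a small configuration table. The sketch below is illustrative only and is not the authors' code. It assumes that L counts stacked LSTT blocks and that m lists the 1-based indices of frames kept as memory. The names `AOTVariant`, `VARIANTS`, and `memory_frames` are hypothetical, the value of δ is not given in the text above and is therefore left as a required argument, and truncating the series to frames before the current one is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

# Values quoted in the Experiment Setup row.
LAMBDA_NEIGHBORHOOD = 15  # spatial neighborhood size (lambda)
NUM_ID_VECTORS = 10       # number of identification vectors (M)


@dataclass(frozen=True)
class AOTVariant:
    """Hypothetical container for one variant's hyper-parameters."""
    name: str
    num_layers: int      # L (assumed here to count LSTT blocks)
    dense_memory: bool   # False: m = {1}; True: m = {1, 1+d, 1+2d, ...}


VARIANTS: Dict[str, AOTVariant] = {
    "AOT-Tiny": AOTVariant("AOT-Tiny", 1, False),
    "AOT-Small": AOTVariant("AOT-Small", 2, False),
    "AOT-Base": AOTVariant("AOT-Base", 3, False),
    "AOT-Large": AOTVariant("AOT-Large", 3, True),
}


def memory_frames(variant: AOTVariant, current_frame: int, delta: int) -> List[int]:
    """Return the indices in m that precede `current_frame` (1-based).

    Assumption: only frames strictly before the current frame can serve
    as memory, so the series 1, 1+delta, 1+2*delta, ... is cut off there.
    """
    if delta < 1:
        raise ValueError("delta must be a positive integer")
    if current_frame <= 1:
        return []
    if not variant.dense_memory:
        return [1]
    return list(range(1, current_frame, delta))
```

For example, with a hypothetical δ = 3, `memory_frames(VARIANTS["AOT-Large"], 10, 3)` yields frames 1, 4, and 7, while the same call for AOT-Base yields only frame 1.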