Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline
Authors: Simar Kareer, Vivek Vijaykumar, Harsh Maheshwari, Judy Hoffman, Prithvijit Chattopadhyay, Viraj Uday Prabhu
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Concretely, we benchmark the performance of HRDA (Hoyer et al., 2022b) and HRDA+MIC (Hoyer et al., 2022c), state-of-the-art methods for Image-DAS, on Video-DAS benchmarks. We find that even after carefully controlling for model architecture and training data, HRDA+MIC outperforms state-of-the-art Video-DAS methods, e.g. by 14.5 mIoU on Viper→Cityscapes-Seq, and 19.0 mIoU on Synthia-Seq→Cityscapes-Seq (Figure 1), the two established benchmarks for this task. We perform an ablation study to identify the source of this improvement and find multi-resolution fusion (Hoyer et al., 2022b) to be the most significant factor. |
| Researcher Affiliation | Academia | Simar Kareer, Vivek Vijaykumar, Harsh Maheshwari, Prithvijit Chattopadhyay, Judy Hoffman, Viraj Prabhu Georgia Institute of Technology Correspondence: EMAIL |
| Pseudocode | No | The paper describes methodologies using natural language and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks with structured steps formatted like code. |
| Open Source Code | Yes | To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image DAS methods on a common benchmark. Code available at this link. ... We open-source our codebase Unified Video DA, built on top of MMSegmentation, a commonly used library for Image-DAS. |
| Open Datasets | Yes | Viper (Richter et al., 2016) to Cityscapes-Seq (Cordts et al., 2016) and Synthia Seq (Ros et al., 2016) to Cityscapes-Seq. ... BDD10k (Yu et al., 2020), which to our knowledge has not been studied previously in the context of Video-DAS. |
| Dataset Splits | Yes | Viper contains 13367 training clips and 4959 validation clips, each of length 10. Cityscapes-Seq contains 2975 training clips and 500 validation clips, each of length 30. ...Synthia-Seq is a considerably smaller synthetic dataset, with a total of 850 sequential frames. To mitigate the small size of the dataset, we train on every frame, consistent with prior work. ... Next, we split these 3,429 images into 2,999 train samples and 430 evaluation samples. |
| Hardware Specification | No | The paper mentions 'manageable memory footprint' and 'GPU footprint' in relation to methods like MRFusion, but it does not provide specific details about the CPU, GPU models, memory, or other hardware used for the experiments. |
| Software Dependencies | No | To run these experiments, we make a number of contributions building off of the MMSegmentation (MMSeg) codebase (Contributors, 2020). ... The optical flows are generated by Flowformer (Huang et al., 2022)... Adam W optimizer (Loshchilov & Hutter, 2019)... The paper mentions software by name and provides citations, but it does not specify exact version numbers for these software dependencies (e.g., MMSegmentation vX.Y.Z). |
| Experiment Setup | Yes | After running a learning rate sweep we find the parameters used in HRDA+MIC (Hoyer et al., 2022c) to work best and so retain the Adam W optimizer (Loshchilov & Hutter, 2019) with a learning rate of 6e-5 (encoder), 6e-4 (decoder), batch size of 2, 40k iterations of training, and a linear decay schedule with a warmup of 1500 iterations. For all experiments, we fix the random seed to 1. |
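The schedule quoted in the Experiment Setup row (linear warmup over 1,500 iterations, then linear decay across 40k total iterations) can be sketched as a simple learning-rate multiplier. This is a minimal illustration, not the authors' code: the assumption that the decay goes linearly to zero at iteration 40k, and the function name `lr_multiplier`, are ours.

```python
def lr_multiplier(step: int, warmup: int = 1500, total: int = 40000) -> float:
    """Linear warmup followed by linear decay, as a multiplier on the base LR.

    The base LRs reported in the paper are 6e-5 (encoder) and 6e-4 (decoder);
    the per-step LR would be base_lr * lr_multiplier(step).
    """
    if step < warmup:
        # Ramp linearly from 0 to 1 over the warmup period.
        return step / warmup
    # Decay linearly from 1 at the end of warmup to 0 at the final iteration
    # (assumed endpoint; the paper only states "linear decay").
    return max(0.0, (total - step) / (total - warmup))


# Example: encoder LR at a few points in training (base 6e-5).
encoder_base_lr = 6e-5
for step in (0, 750, 1500, 20000, 40000):
    print(step, encoder_base_lr * lr_multiplier(step))
```

At step 750 (mid-warmup) the multiplier is 0.5; at step 1,500 it peaks at 1.0; it then falls linearly to 0 at step 40,000 under the assumed endpoint.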