Learning Segmentation from Point Trajectories
Authors: Laurynas Karazija, Iro Laina, Christian Rupprecht, Andrea Vedaldi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our approach for unsupervised motion segmentation and compare it with simple baselines and prior subspace clustering methods. Next, we compare our method with state-of-the-art methods for unsupervised video object segmentation across several datasets in a binary segmentation setting. We finish with ablation experiments of our approach. Our method outperforms the prior art on motion-based segmentation, which shows the utility of long-term motion and the effectiveness of our formulation. |
| Researcher Affiliation | Academia | Laurynas Karazija, Iro Laina, Christian Rupprecht, Andrea Vedaldi Visual Geometry Group University of Oxford Oxford, UK {laurynas,iro,chrisr,vedaldi}@robots.ox.ac.uk |
| Pseudocode | No | The paper does not contain a structured pseudocode or algorithm block. |
| Open Source Code | No | The code and models will be released upon acceptance. |
| Open Datasets | Yes | Datasets. We consider four primary datasets in this study. We use the synthetic MOVi-F variant of the Kubric [20] dataset... We also evaluate our approach on real-world datasets: DAVIS 2016 [53], SegTrackv2 (STv2) [35], and FBMS [51], which are popular benchmarks for video object segmentation. |
| Dataset Splits | No | The paper states it trains on benchmark datasets (DAVIS, SegTrackv2, FBMS) and evaluates on them, implying standard splits are used, but does not explicitly state the train/validation/test split percentages or sample counts within the text. |
| Hardware Specification | Yes | We estimate about 3 hours to train a model using A6000 GPU (peak GPU memory 25GB). |
| Software Dependencies | Yes | We use CoTracker v2. |
| Experiment Setup | Yes | We feed images at 192×352 resolution. We also use random horizontal flipping augmentation. The network is trained to predict k = 4 components... We train using the AdamW optimiser, with a learning rate of 1.5e-4, weight decay of 0.01, a batch size of 8, and a linear learning-rate warmup schedule for 1500 iterations. We train for 5000 iterations. We use an Exponential Moving Average (EMA) with a decay power of 2/3, a warmup of 1500 iterations, and an update every 10 steps... We set λf = 0.03, λt = 5×10⁻⁵, and λτ = 0.1 in all experiments, which yields loss values in a similar numerical range. For the temporal smoothing loss, we use t = 5. |
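As a reading aid for the setup row above, the linear learning-rate warmup it quotes (ramp to 1.5e-4 over 1500 of 5000 iterations) can be sketched as a small schedule function. This is a hypothetical sketch, not the authors' code: the function name and the post-warmup behaviour (holding the rate constant, since the paper excerpt specifies no decay schedule) are assumptions.

```python
# Hypothetical sketch of the warmup schedule described in the setup row.
# Assumed: lr stays constant after warmup (the paper does not state a decay).
BASE_LR = 1.5e-4       # peak learning rate from the paper
WARMUP_ITERS = 1500    # linear warmup length from the paper
TOTAL_ITERS = 5000     # total training iterations from the paper

def lr_at_step(step: int) -> float:
    """Linearly ramp the learning rate from ~0 to BASE_LR over
    WARMUP_ITERS iterations, then hold it constant."""
    if step < WARMUP_ITERS:
        return BASE_LR * (step + 1) / WARMUP_ITERS
    return BASE_LR
```

In a typical PyTorch training loop, such a function would be applied each step by writing its value into the optimiser's parameter groups; the paper's EMA (decay power 2/3, updated every 10 steps) would run alongside it.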