Discrete Latent Perspective Learning for Segmentation and Detection

Authors: Deyi Ji, Feng Zhao, Lanyun Zhu, Wenwei Jin, Hongtao Lu, Jieping Ye

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that DLPL significantly enhances the network's capacity to depict images across diverse scenarios (daily photos, UAV, auto-driving) and tasks (detection, segmentation).
Researcher Affiliation | Collaboration | 1. University of Science and Technology of China; 2. Alibaba Group; 3. Singapore University of Technology and Design; 4. Dept. of CSE, MOE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University.
Pseudocode | No | The paper describes the architecture and modules with figures but does not provide pseudocode or algorithm blocks.
Open Source Code | No | "We release the proposed DroneSeg dataset at https://github.com/jankyee/DroneSeg." This statement explicitly refers to the dataset, not to the source code for the proposed DLPL method.
Open Datasets | Yes | "Specifically, given the current lack of a large-scale UAV segmentation benchmark dataset within the field, we propose DroneSeg, the largest-scale and semantically richest fine-grained annotated UAV segmentation dataset to date," released at https://github.com/jankyee/DroneSeg. The paper also references multiple publicly available datasets with citations, such as Cityscapes (Cordts et al., 2016) and ADE20K (Zhou et al., 2017).
Dataset Splits | Yes | Table 4 ("Auto-Driving and Daily Photos Segmentation: Comparison with state-of-the-arts on Cityscapes and ADE20K datasets...") reports mIoU (%) on the Cityscapes val and ADE20K val splits, indicating the use of validation sets from standard benchmarks.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for the experiments are mentioned in the paper.
Software Dependencies | No | In all experiments, the authors adopt the MMSegmentation (MMSegmentation, 2020) and MMDetection (Chen et al., 2019) toolboxes as codebases and follow the default basic configurations. No specific version numbers for these toolboxes or other software dependencies are provided.
Experiment Setup | Yes | In training, Visual Reconstruction and Perspective Space Construction accompany the entire training process. In contrast, Perspective Transformation training begins after the first two processes stabilize, at 30% of the total training epochs. Perspective-Invariant Learning also occurs throughout the entire training process. During the initial 30% of training epochs, since P is not ready, PIA degenerates to Self-Attention learning solely based on P. [...] where α is the moving weight, and α = 0.9 is found to work well in practice. [...] where λ is the loss weight and is set to 0.4.
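The experiment-setup row above quotes a staged schedule: the Perspective Transformation branch is trained only after 30% of the total epochs, a moving weight of α = 0.9 is used, and an auxiliary loss weight of λ = 0.4 is applied. Since the training code is not released, the following is only a minimal PyTorch-style sketch of how such a schedule could be wired up; the model interface, the `perspective_loss` / `codebook_target` outputs, and the `codebook` attribute are hypothetical, and only the 30% fraction, α = 0.9, and λ = 0.4 come from the paper.

```python
import torch

# Values quoted from the paper's experiment setup.
ALPHA = 0.9            # moving weight for the exponential-moving-average update
LAMBDA = 0.4           # weight of the auxiliary (perspective) loss
WARMUP_FRACTION = 0.3  # Perspective Transformation starts after 30% of epochs


def train(model, loader, optimizer, num_epochs, device="cuda"):
    """Hypothetical training loop; the model interface is assumed, not the paper's."""
    for epoch in range(num_epochs):
        # The Perspective Transformation branch is optimised only once the Visual
        # Reconstruction and Perspective Space Construction stages have stabilised.
        use_perspective = epoch >= int(WARMUP_FRACTION * num_epochs)

        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            outputs = model(images, targets, use_perspective=use_perspective)

            task_loss = outputs["task_loss"]        # segmentation / detection loss
            aux_loss = outputs["perspective_loss"]  # perspective-invariant term
            loss = task_loss + LAMBDA * aux_loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Moving-average update of the discrete perspective codebook,
            # mirroring the alpha = 0.9 moving weight mentioned in the quote.
            with torch.no_grad():
                model.codebook.mul_(ALPHA).add_(outputs["codebook_target"],
                                                alpha=1 - ALPHA)
```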