Learning 1D Causal Visual Representation with De-focus Attention Networks

Authors: Chenxin Tao, Xizhou Zhu, Shiqian Su, Lewei Lu, Changyao Tian, Xuan Luo, Gao Huang, Hongsheng Li, Yu Qiao, Jie Zhou, Jifeng Dai

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate the efficacy of our approach, demonstrating that 1D causal visual representation can perform comparably to 2D non-causal representation in tasks such as global perception, dense prediction, and multi-modal understanding.
Researcher Affiliation | Collaboration | Chenxin Tao (1,3), Xizhou Zhu (1,2)*, Shiqian Su (1,3)*, Lewei Lu (3), Changyao Tian (4), Xuan Luo (1), Gao Huang (1), Hongsheng Li (4), Yu Qiao (2), Jie Zhou (1), Jifeng Dai (1,2); 1: Tsinghua University, 2: Shanghai Artificial Intelligence Laboratory, 3: SenseTime Research, 4: The Chinese University of Hong Kong
Pseudocode | No | The paper does not include a pseudocode block or an explicitly labeled algorithm section.
Open Source Code | No | The paper states only that "Code shall be released."
Open Datasets | Yes | ImageNet-1k [13] is used, which contains 1.28M images for training and 50K images for validation. (See the loading sketch below the table.)
Dataset Splits | Yes | ImageNet-1k [13] is used, which contains 1.28M images for training and 50K images for validation.
Hardware Specification | Yes | These models are trained on 32 Nvidia 80G A100 GPUs for 30 hours.
Software Dependencies | No | The paper mentions optimizers like AdamW but does not specify software versions for libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | The AdamW optimizer [40] with a peak learning rate of 5e-4, a total batch size of 1024, a momentum of 0.9, and a weight decay of 0.05 are used. These models are trained on 32 Nvidia 80G A100 GPUs for 30 hours. (See the configuration sketch below.)
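
To make the dataset rows concrete, here is a minimal sketch of loading the ImageNet-1k train/val splits referenced above, assuming torchvision and a local copy of the dataset archives; the root path and the transform pipeline are illustrative assumptions, not details from the paper.

    import torch
    from torchvision import datasets, transforms

    # Illustrative preprocessing; the paper's augmentation recipe is not given here.
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    # Standard splits: ~1.28M training images and 50K validation images,
    # matching the counts quoted in the table. "/path/to/imagenet" is a
    # hypothetical root that must already contain the ImageNet archives.
    train_set = datasets.ImageNet("/path/to/imagenet", split="train", transform=transform)
    val_set = datasets.ImageNet("/path/to/imagenet", split="val", transform=transform)

    # Total batch size of 1024, as reported (in practice sharded over 32 GPUs).
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=1024, shuffle=True)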
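
And a minimal sketch of the reported optimizer settings, assuming PyTorch. Mapping "momentum 0.9" onto AdamW's beta1 is an assumption; beta2, warmup, and the learning-rate schedule behind the "peak" value are not specified in the quoted setup.

    import torch

    model = torch.nn.Linear(768, 1000)  # hypothetical placeholder for the actual network

    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=5e-4,             # peak learning rate, as reported
        betas=(0.9, 0.999),  # beta1 = reported momentum of 0.9; beta2 is PyTorch's default
        weight_decay=0.05,   # as reported
    )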