Causality Compensated Attention for Contextual Biased Visual Recognition

Authors: Ruyang Liu, Jingjia Huang, Thomas H. Li, Ge Li

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show our model obtains significant improvements in classification and detection with lower computation."
Researcher Affiliation | Collaboration | Ruyang Liu (1), Jingjia Huang (2), Thomas H. Li (1), Ge Li (1, corresponding author). (1) School of Electronic and Computer Engineering, Peking University; (2) ByteDance Inc. Emails: {ruyang@stu,geli@ece,thomas@}.pku.edu.cn, huangjingjia@bytedance.com
Pseudocode | No | The paper describes computational steps and equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | "All our codes were implemented in Pytorch (Paszke et al., 2017)." This statement identifies the implementation framework but does not state that the code for the described methodology is publicly available, nor does it provide a link.
Open Datasets | Yes | "We conduct extensive experiments to evaluate the superiority of our method. Rich quantitative and ablation results show that our method can bring about significant improvements on datasets MS-COCO (Lin et al., 2014) and PASCAL-VOC (Everingham et al., 2010) for various computer vision tasks with both CNN-based and transformer-based backbones."
Dataset Splits | Yes | "Multi-label classification on MS-COCO. MS-COCO (Lin et al., 2014) is the most popular benchmark for object detection, segmentation and caption, and has also become widely used in multi-label recognition recently."
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as GPU/CPU models or cloud computing specifications.
Software Dependencies | No | The paper states "All our codes were implemented in Pytorch (Paszke et al., 2017)" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "Unless otherwise stated, we use ResNet101 (He et al., 2016) pre-trained on ImageNet-1k (Deng et al., 2009) as our backbone. For the multiple-sampling module, we adopt channel-shuffle with start point = 0, intervals = 1 & 2, and sampling dimension = 512 (1/4 of the full dimension), and thus we have N = 8. γ is set as 1/32. For the heavy version of IDA, we only have 2 layers of transformer and do not implement the multi-head operation on dot-product attention. There is no extra data preprocessing besides the standard data augmentation (Ridnik et al., 2021; Chen et al., 2019c). The multi-label classification model and the detection model are both optimized by the Binary Cross Entropy Loss with sigmoid. We choose Adam as our optimizer with weight decay of 1e-4 and (β1, β2) = (0.9, 0.9999). The learning rate is 1e-4 for the batch size of 128 with a 1-cycle policy."