Active Visual Exploration Based on Attention-Map Entropy

Authors: Adam Pardyl, Grzegorz Rypeść, Grzegorz Kurzejamski, Bartosz Zieliński, Tomasz Trzciński

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments, which also mimic retina-like sensors, we show that such simplified training significantly improves the performance of reconstruction, segmentation and classification on publicly available datasets.
Researcher Affiliation | Collaboration | ¹IDEAS NCBR; ²Jagiellonian University, Faculty of Mathematics and Computer Science; ³Jagiellonian University, Doctoral School of Exact and Natural Sciences; ⁴Warsaw University of Technology; ⁵Tooploox; ⁶Ardigen
Pseudocode | No | The paper describes the method using textual explanations and diagrams (e.g., Figure 2), but it does not provide any formally labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Supplementary material, including the source code, can be found at: https://github.com/apardyl/AME
Open Datasets | Yes | The largest one, MS COCO dataset (Common Objects in Context; 2014 split version) [Lin et al., 2014], consists of 83K train images and 41K validation images. The second one, ADE20K [Zhou et al., 2017] dataset, consists of 26K training and 2K validation images, containing 150 classes for the semantic segmentation task. The smallest one is SUN360 [Song et al., 2015] dataset with approximately 8K 360° panoramic images, unevenly split between 26 classes for the multi-class classification task.
Dataset Splits | Yes | The largest one, MS COCO dataset... consists of 83K train images and 41K validation images. The second one, ADE20K... consists of 26K training and 2K validation images. As the last dataset does not have a predetermined train-test split, we use a 9:1 train-test split based on an index provided by the authors of [Seifi et al., 2021].
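The 9:1 split quoted above relies on a fixed index provided by [Seifi et al., 2021]. As an illustration only, a deterministic index-based split might look like the sketch below; the helper name and the plain slicing rule are assumptions, not taken from the paper or its code:

```python
def train_test_split_by_index(indices, train_fraction=0.9):
    """Deterministically split a fixed index list into train/test parts.

    The paper uses a predefined index from Seifi et al. (2021); this
    stand-in simply slices the list, so the split is reproducible as
    long as the index order is fixed.
    """
    cut = int(len(indices) * train_fraction)
    return indices[:cut], indices[cut:]

# SUN360 has roughly 8K panoramas; a 9:1 split of 8,000 items
# yields 7,200 train and 800 test indices.
train_idx, test_idx = train_test_split_by_index(list(range(8000)))
```

The key point for reproducibility is that the split is a pure function of a published index, not of a random seed.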
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications. It only includes a general acknowledgement of the 'Polish high-performance computing infrastructure PLGrid (HPC Centers: ACK Cyfronet AGH)'.
Software Dependencies | No | The paper mentions software components like the 'AdamW optimization algorithm' and the 'transformer-based Masked Autoencoder (MAE) network', but it does not specify version numbers or other software dependencies required for reproducibility.
Experiment Setup | Yes | In all our experiments, we use 24 transformer blocks in the encoder, with an embedding size of 1024... and 8 decoder blocks with an embedding size of 512. We use the AdamW optimization algorithm... with a weight decay value of 0 for reconstruction and 10⁻⁴ for other tasks. The learning rate is first linearly brought up to 10⁻⁴ for the first 10 training epochs, then decayed with the half-cycle cosine rate to 10⁻⁸ for the rest of the training. We train the model for 75 epochs with early stopping.
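The quoted schedule (linear warmup to 10⁻⁴ over the first 10 epochs, then half-cycle cosine decay to 10⁻⁸ over the remaining epochs of a 75-epoch run) can be sketched as below. The function name and per-epoch granularity are assumptions for illustration; the paper does not publish this exact formula:

```python
import math

def lr_at_epoch(epoch, total_epochs=75, warmup_epochs=10,
                peak_lr=1e-4, floor_lr=1e-8):
    """Learning rate for a given 0-indexed epoch.

    Linear warmup from 0 to peak_lr over warmup_epochs, then a
    half-cycle cosine decay from peak_lr down to floor_lr.
    """
    if epoch < warmup_epochs:
        # warmup: reach peak_lr exactly at the last warmup epoch
        return peak_lr * (epoch + 1) / warmup_epochs
    # cosine phase: progress goes from 0 at the end of warmup
    # toward 1 at the final epoch
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return floor_lr + 0.5 * (peak_lr - floor_lr) * (1 + math.cos(math.pi * progress))
```

In PyTorch this behavior is typically obtained by chaining a linear warmup with `CosineAnnealingLR`, but the closed-form version above makes the two phases of the quoted schedule explicit.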