MAS-SAM: Segment Any Marine Animal with Aggregated Features

Authors: Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu

IJCAI 2024

Reproducibility Variable Result LLM Response
Research Type | Experimental | Extensive experiments on four public MAS datasets demonstrate that our MAS-SAM can obtain better results than other typical segmentation methods. The source code is available at https://github.com/Drchip61/MAS-SAM.
Researcher Affiliation | Academia | Tianyu Yan1, Zifu Wan2, Xinhao Deng1, Pingping Zhang1, Yang Liu1, Huchuan Lu1. 1School of Future Technology, School of Artificial Intelligence, Dalian University of Technology; 2Robotics Institute, Carnegie Mellon University.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/Drchip61/MAS-SAM.
Open Datasets | Yes | In this work, we adopt four public MAS benchmarks to evaluate the model performance. The MAS3K dataset [Li et al., 2020] comprises 3,103 marine images... The RMAS dataset [Fu et al., 2023] consists of 3,014 marine animal images... The UFO120 dataset [Islam et al., 2020] comprises 1,620 underwater images... The RUWI dataset [Drews-Jr et al., 2021] is a real underwater image dataset...
Dataset Splits | No | For the MAS3K dataset, the paper states: 'we use 1,769 images for training and 1,141 images for testing.' Similar train/test splits are reported for the other datasets, but no explicit validation set is mentioned for any of them.
Hardware Specification | Yes | Our model is implemented with the PyTorch toolbox and one RTX 3090 GPU.
Software Dependencies | No | The paper mentions the 'PyTorch toolbox' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The initial learning rate and weight decay are set to 0.001 and 0.1, respectively. We reduce the learning rate by a factor of 10 every 20 epochs. The total number of training epochs is set to 50. The mini-batch size is set to 8. The input images are uniformly resized to 512 × 512 × 3.
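For reference, the reported schedule maps onto a standard PyTorch training configuration. The sketch below is a minimal illustration only: the optimizer choice (AdamW), the stand-in model, and the dataset/loader names are assumptions not given in the excerpt, while the learning rate, weight decay, step decay, epoch count, batch size, and input resolution follow the reported values.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR
from torchvision import transforms

# Preprocessing: inputs uniformly resized to 512 x 512 (3-channel RGB).
transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

# Stand-in module for illustration; the actual MAS-SAM network and the
# MAS3K/RMAS/UFO120/RUWI datasets are not defined here.
model = torch.nn.Conv2d(3, 1, kernel_size=1)

# Reported hyperparameters: lr = 0.001, weight decay = 0.1,
# lr decayed by 10x every 20 epochs, 50 epochs, mini-batch size 8.
# The optimizer type is an assumption; the excerpt does not name it.
optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)

num_epochs = 50
# train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)

for epoch in range(num_epochs):
    # for images, masks in train_loader:
    #     optimizer.zero_grad()
    #     loss = criterion(model(images), masks)
    #     loss.backward()
    #     optimizer.step()
    scheduler.step()  # lr drops by 10x at epochs 20 and 40
```

With this schedule the learning rate is 1e-3 for epochs 0-19, 1e-4 for epochs 20-39, and 1e-5 for the final 10 epochs, consistent with the "factor of 10 every 20 epochs" description.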