Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MESSI: A Multi-Elevation Semantic Segmentation Image Dataset of an Urban Environment

Authors: Barak Pinkovich, Boaz Matalon, Ehud Rivlin, Hector Rotstein

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental This paper presents a Multi-Elevation Semantic Segmentation Image (MESSI) dataset comprising 2525 images taken by a drone flying over dense urban environments. MESSI is unique in two main features. First, it contains images from various altitudes, allowing us to investigate the effect of depth on semantic segmentation... This paper describes the dataset and provides annotation details. It also explains how semantic segmentation was performed using several neural network models and shows several relevant statistics. MESSI will be published in the public domain to serve as an evaluation benchmark for semantic segmentation using images captured by a drone or similar vehicle flying over a dense urban environment.
Researcher Affiliation Academia Barak Pinkovich EMAIL Department of Computer Science, Technion - Israel Institute of Technology
Pseudocode No The paper describes methodologies and experimental procedures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No The paper states: "The open-source semantic segmentation toolbox MMsegmentation Contributors (2020) was used to train and test various semantic segmentation models." This refers to a third-party tool used by the authors, not their own source code for the specific methodology described in the paper.
Open Datasets Yes The dataset has been published at https://isl.cs.technion.ac.il/research/messi-dataset/ MESSI will be published in the public domain to serve as an evaluation benchmark for semantic segmentation using images captured by a drone or similar vehicle flying over a dense urban environment.
Dataset Splits Yes A dataset can be partitioned into training, validation, and test sets in many different ways. Since the trajectories of Ir Yamim and Ha-Medinah Square do not overlap with those of Agamim, they were combined to form the test set to evaluate the models' out-of-distribution properties. Test images will be released as an online benchmark with undisclosed ground truth annotations. The descent scenarios of Agamim were selected as the training set... Additionally, three descent scenarios were excluded from the training set (i.e., 1, 35, and 36) due to their significant similarity with other scenarios. Paths A to C were defined as the validation set. Table 3 summarizes how the dataset was divided.
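The scenario-based split described above can be sketched as a small helper. This is a hypothetical illustration, not the authors' actual file layout: the scenario identifiers and directory names are assumptions made for the example.

```python
# Hedged sketch of the MESSI split logic as described in the paper:
# Agamim descent scenarios form the training set (excluding 1, 35, and
# 36 due to similarity with other scenarios), Agamim paths A-C form the
# validation set, and the non-overlapping Ir Yamim / Ha-Medinah Square
# trajectories form the out-of-distribution test set.
# All path strings here are illustrative placeholders.

def build_splits(descent_ids, excluded=(1, 35, 36)):
    """Partition scenario identifiers into train/val/test lists."""
    train = [f"Agamim/Descend/{i}" for i in descent_ids if i not in excluded]
    val = [f"Agamim/Path/{p}" for p in ("A", "B", "C")]
    test = ["IrYamim", "HaMedinahSquare"]
    return {"train": train, "val": val, "test": test}

splits = build_splits(range(1, 41))
assert "Agamim/Descend/35" not in splits["train"]
```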
Hardware Specification Yes The main experiments were performed on an RTX 3090 graphics card with 24 GB memory. Unfortunately, running inference of most medium-sized models (e.g., A.1) on a single 5472 × 3684 image was found unfeasible, let alone training with such large images... Finally, Mask2Former Cheng et al. (2022)... Its training hyper-parameters were set to the same values as all other models here, e.g. each mini-batch (of size two) is a random crop of 1024 by 1024.
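Because full-resolution inference on 5472 × 3684 images was infeasible on a single 24 GB GPU, training used random 1024 × 1024 crops. A minimal numpy sketch of such an aligned image/label crop (a simplified stand-in; MMSegmentation's actual `RandomCrop` transform also handles padding and ignore labels):

```python
import numpy as np

def random_crop(image, label, size=1024, rng=None):
    """Take an aligned random square crop from an image/label pair."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return (image[top:top + size, left:left + size],
            label[top:top + size, left:left + size])

# Example at the paper's full sensor resolution (5472 x 3684):
img = np.zeros((3684, 5472, 3), dtype=np.uint8)
lbl = np.zeros((3684, 5472), dtype=np.uint8)
crop_img, crop_lbl = random_crop(img, lbl)
assert crop_img.shape == (1024, 1024, 3)
assert crop_lbl.shape == (1024, 1024)
```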
Software Dependencies Yes The open-source semantic segmentation toolbox MMsegmentation Contributors (2020) was used to train and test various semantic segmentation models.
Experiment Setup Yes One method of dealing with imbalanced datasets is to increase the weight of samples from rare categories... Three different schemes were tested: 1) uniform weight for all classes (referenced as Equal), 2) a weight inversely proportional to the representation of each class in the training set (Prop), and 3) a weight inversely proportional to the square root of the representation of each class in the training set (Sqrt)... Image augmentations during training are another standard way to facilitate generalization... random resizing by up to 15% was employed during training. In addition, a photometric distortion augmentation was performed... Finally, random horizontal flipping was also employed... during training... each image in the mini-batch (of size two) is a random crop of 1024 by 1024. The number of training epochs for all models was set at 320, the maximal default value suggested by MMsegmentation. Other than those already mentioned, all hyper-parameters were set to the default values used by MMsegmentation on Cityscapes.
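The three class-weighting schemes (Equal, Prop, Sqrt) can be written as simple functions of per-class pixel frequency. A minimal sketch; the mean-1 normalization is a common convention assumed here, since the report does not quote the paper's exact normalization:

```python
import numpy as np

def class_weights(pixel_counts, scheme="equal"):
    """Per-class loss weights from training-set pixel counts.

    equal - uniform weight for every class,
    prop  - inversely proportional to class frequency,
    sqrt  - inversely proportional to the square root of class frequency.
    Weights are normalized to have mean 1 (an assumed convention).
    """
    counts = np.asarray(pixel_counts, dtype=np.float64)
    freq = counts / counts.sum()
    if scheme == "equal":
        w = np.ones_like(freq)
    elif scheme == "prop":
        w = 1.0 / freq
    elif scheme == "sqrt":
        w = 1.0 / np.sqrt(freq)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w / w.mean()

# A rare class (10 pixels) receives a larger weight than a common one
# (990 pixels) under the Prop and Sqrt schemes:
print(class_weights([990, 10], scheme="sqrt"))
```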