Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization

Authors: Michael Green, Matan Levy, Issar Tzachor, Dvir Samuel, Nir Darshan, Rami Ben-Ari

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively evaluate our approach through comprehensive experiments, demonstrating its superior performance against existing methods and strong baselines. The results highlight our method's ability to effectively retrieve images containing small instances of a target object embedded in cluttered images.
Researcher Affiliation	Collaboration	Michael Green¹, Matan Levy², Issar Tzachor¹, Dvir Samuel¹,³, Nir Darshan¹, Rami Ben-Ari¹; ¹Origin AI, Israel; ²The Hebrew University of Jerusalem, Israel; ³Bar-Ilan University, Israel
Pseudocode	No	The paper describes the method using textual explanations and a schematic overview in Figure 2, but does not include structured pseudocode or algorithm blocks.
Open Source Code	No	While all data used is public, we intend to release the code upon acceptance.
Open Datasets	Yes	We introduce new benchmarks specifically designed for SoIR, enabling more rigorous evaluation and fostering future research in this domain. ... We evaluate MaO using the widely adopted mean average precision (mAP) metric. For all models we use the public pre-trained weights and keep the original setting. ... The INSTRE dataset [47] ... The PerMiR dataset [35] ... Recently, Li et al. [19] introduced VoxDet ... Additionally, we establish a new benchmark using the VoxDet dataset [19], dubbed VoxDet-SoIR...
Dataset Splits	Yes	INSTRE-XS (Extra Small): Containing 2,428 queries and a gallery of 2,065 images... INSTRE-XXS (Extra-Extra Small): A more challenging subset with 106 queries and a gallery of 120 images... PerMiR: comprises 150 queries and a gallery of 450 images... For evaluation, we converted VoxDet into a large-scale instance-based retrieval dataset, name it as VoxDet-SoIR (VoxDet for Small Object Image Retrieval). The training set consists of distinct objects that do not overlap with those in the test set...
Hardware Specification	Yes	We train with a batch size of 128 for 1 epoch, across four NVIDIA-A100 nodes.
Software Dependencies	No	The paper mentions using AdamW optimizer and LoRA adapter, and applying OWLv2 for object detection, but does not provide specific version numbers for software libraries or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup	Yes	Implementation Details: We use the AdamW optimizer, initializing the learning rate at 5 × 10⁻⁵ with an exponential decay rate of 0.93 down to 1 × 10⁻⁶. We fine-tune all the transformer-based models on the VoxDet training set using a LoRA [10] adapter of rank 256. We train with a batch size of 128 for 1 epoch, across four NVIDIA-A100 nodes. For inference, we apply OWLv2 [22] as OVD, applied in object-proposal mode, to detect any object in gallery images, considering bounding boxes with a confidence threshold above 0.2. ... The refinement process is done for 80 iterations with α = 0.03 and a learning rate of 1 × 10⁻¹...
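The Open Datasets row names mean average precision (mAP) as the evaluation metric but does not say which variant is used. The sketch below assumes the common non-interpolated form over the full ranked gallery: for each query, precision is averaged at the rank of every relevant image and divided by the total number of relevant images, then averaged over queries. The function names are illustrative, not taken from the paper's code.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool], num_relevant: int) -> float:
    """Non-interpolated AP for one query (assumed variant).

    `relevance` holds one flag per gallery image in ranked order (True if the
    image contains the query object). `num_relevant` is the total number of
    relevant gallery images; any that were never ranked count as misses.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / num_relevant


def mean_average_precision(
    rankings: Sequence[Sequence[bool]], num_relevant_per_query: Sequence[int]
) -> float:
    """mAP: the unweighted mean of per-query AP values."""
    aps = [average_precision(r, n) for r, n in zip(rankings, num_relevant_per_query)]
    if not aps:
        raise ValueError("need at least one query")
    return sum(aps) / len(aps)
```

For example, a query whose two relevant images land at ranks 1 and 3 gets AP = (1/1 + 2/3) / 2 = 5/6; a second query with its single relevant image at rank 2 gets AP = 1/2, giving mAP = 2/3.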
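The Dataset Splits row states that the VoxDet training objects do not overlap with the test objects. VoxDet ships its own split, so nothing needs to be recomputed; the sketch below only illustrates what an object-disjoint split means if one were built from scratch. The `images_by_object` layout, the `test_fraction` value, and the function name are hypothetical assumptions. The key point is that whole object IDs, not individual images, are assigned to one side.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical layout: object ID -> list of image paths showing that object.
ImagesByObject = Dict[str, List[str]]


def object_disjoint_split(
    images_by_object: ImagesByObject,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[ImagesByObject, ImagesByObject]:
    """Assign whole objects to train or test so no object ID is shared."""
    object_ids = sorted(images_by_object)  # fixed order before the seeded shuffle
    random.Random(seed).shuffle(object_ids)
    n_test = max(1, round(len(object_ids) * test_fraction))
    test_ids = set(object_ids[:n_test])
    train = {o: v for o, v in images_by_object.items() if o not in test_ids}
    test = {o: v for o, v in images_by_object.items() if o in test_ids}
    # Sanity check mirroring the stated property: the object sets are disjoint.
    assert not set(train) & set(test)
    return train, test
```

Splitting by image instead would let views of the same object leak into both sets, which would inflate retrieval scores on the test split.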
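The Experiment Setup row gives an exponentially decaying learning rate (start 5 × 10⁻⁵, rate 0.93, floor 1 × 10⁻⁶) and keeps OWLv2 proposals with confidence above 0.2. A minimal sketch of both follows. It assumes the decay is applied once per scheduler step, which the paper does not state, and that the detector's boxes and scores are available as plain Python lists; `lr_at` and `keep_confident` are hypothetical helper names, not OWLv2 or paper APIs.

```python
from typing import List, Tuple

BASE_LR = 5e-5        # initial learning rate (from the paper)
DECAY_RATE = 0.93     # exponential decay rate (from the paper)
MIN_LR = 1e-6         # learning-rate floor (from the paper)
CONF_THRESHOLD = 0.2  # OVD confidence threshold (from the paper)

Box = Tuple[float, float, float, float]  # assumed (x0, y0, x1, y1) layout


def lr_at(step: int) -> float:
    """Exponentially decayed learning rate, clipped from below at MIN_LR.

    Assumption: `step` counts scheduler updates; the decay interval is not
    specified in the paper.
    """
    return max(BASE_LR * DECAY_RATE ** step, MIN_LR)


def keep_confident(
    boxes: List[Box], scores: List[float], threshold: float = CONF_THRESHOLD
) -> List[Box]:
    """Keep proposals whose confidence is strictly above the threshold."""
    return [box for box, score in zip(boxes, scores) if score > threshold]
```

With these values the floor is reached after 54 decay steps, since 0.93⁵⁴ × 5 × 10⁻⁵ first drops below 1 × 10⁻⁶ there. To drive a PyTorch optimizer with this schedule, one could pass `lambda step: lr_at(step) / BASE_LR` as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by the returned factor.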