Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Authors: Kemal Oksuz, Selim Kuzucu, Tom Joy, Puneet K. Dokania
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We term this approach the Mixture of Calibrated Experts (MoCaE) and demonstrate its effectiveness through extensive experiments on 5 different detection tasks, showing that it: (i) improves object detectors on COCO and instance segmentation methods on LVIS by up to 2.5 AP; (ii) reaches state-of-the-art on COCO test-dev with 65.1 AP and on DOTA with 82.62 AP50; (iii) outperforms single models consistently on recent detection tasks such as Open Vocabulary Object Detection. |
| Researcher Affiliation | Industry | Kemal Oksuz, Five AI Ltd., United Kingdom; Selim Kuzucu, Five AI Ltd., United Kingdom; Tom Joy, Five AI Ltd., United Kingdom; Puneet K. Dokania, Five AI Ltd., United Kingdom |
| Pseudocode | No | The paper describes methods like NMS, Soft NMS, and Score Voting in detail, including mathematical formulations and discussions of their mechanisms. However, it does not present these or any other procedures in a structured pseudocode or algorithm block format. |
| Open Source Code | Yes | Code is available at: https://github.com/fiveai/MoCaE. |
| Open Datasets | Yes | To demonstrate the effect of calibration, we use the common COCO dataset (Lin et al., 2014). Specifically, we evaluate on four different tasks: object detection (COCO (Lin et al., 2014)), rotated object detection (DOTA (Xia et al., 2018)), open vocabulary object detection (COCO and ODinW35 (Li et al., 2022a)) and instance segmentation (LVIS (Gupta et al., 2019)). Secondly, we evaluate the performance of MoCaE on Objects45K (Oksuz et al., 2023). |
| Dataset Splits | Yes | Similar to (Kuppers et al., 2022), we randomly split COCO val set with 5K images into two, and use 2.5K images for testing as COCO minitest and the remaining 2.5K images as COCO minival to analyse calibration. Specifically, we find it sufficient to use 500 images for calibrating the detectors. Similar to COCO, we reserve 500 images from val set to calibrate the detectors, and test our models on the remaining 19.5K images of the val set. |
| Hardware Specification | Yes | NMS time is measured in terms of ms using a single Nvidia 1080ti GPU. |
| Software Dependencies | No | The paper mentions using 'mmdetection (Chen et al., 2019)' as a source for some models and official repositories for others. However, it does not specify version numbers for MMDetection or any other software libraries (e.g., Python, PyTorch versions) used in their own implementation. |
| Experiment Setup | Yes | All these detectors are trained for 36 epochs using multi-scale training data augmentation in which the shorter side of the image is resized within the range of [480, 800] for RS R-CNN and ATSS and [640, 800] for PAA. In order to focus only on calibration, here we use the standard NMS with an IoU threshold of 0.65 as in Wang et al. (2022a). Following the literature, we use NMS with an IoU threshold of 0.35, as Soft NMS and Score Voting are not straightforward to use in this task. Specifically, the refined box $\hat{b}_i$ is obtained by considering $B$ as the set of detections from all experts before Soft NMS (i.e., including the ones removed by Soft NMS): $\hat{b}_i = \frac{\sum_{j \in B} \hat{p}_j \, \widehat{\mathrm{IoU}}_j \, \hat{b}_j}{\sum_{j \in B} \hat{p}_j \, \widehat{\mathrm{IoU}}_j}$, where $\widehat{\mathrm{IoU}}_j = e^{-(1 - \mathrm{IoU}(\hat{b}_i, \hat{b}_j))^2 / \sigma_{SV}}$ and $\sigma_{SV}$ is the hyper-parameter, which we set to 0.04. |
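The Score Voting update quoted in the Experiment Setup row can be sketched as follows. This is a minimal NumPy illustration of the formula, not the authors' implementation; it assumes axis-aligned `[x1, y1, x2, y2]` boxes, and the function names (`iou`, `score_vote`) are hypothetical:

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def score_vote(b_i, boxes, scores, sigma_sv=0.04):
    """Refine box b_i as a weighted average over all candidate boxes,
    weighting each box j by its score p_j times exp(-(1 - IoU)^2 / sigma_sv),
    with sigma_sv = 0.04 as in the quoted setup."""
    weights = np.array([
        p_j * np.exp(-(1.0 - iou(b_i, b_j)) ** 2 / sigma_sv)
        for b_j, p_j in zip(boxes, scores)
    ])
    boxes = np.asarray(boxes, dtype=float)
    return (weights[:, None] * boxes).sum(axis=0) / weights.sum()
```

With `sigma_sv = 0.04` the exponential decays quickly, so only boxes with high IoU to `b_i` contribute meaningfully to the refined coordinates; in the paper's mixture setting, `boxes` would contain the pre-Soft-NMS detections pooled from all calibrated experts.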