Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56,800 conference papers," under review, 2026).

Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation

Authors: Moru Liu, Hao Dong, Jessica Kelly, Olga Fink, Mario Trapp

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive evaluations across eight datasets and four modalities to validate the effectiveness of Feature Mixing. For multimodal OOD detection, we use five datasets from the MultiOOD benchmark [17] with video and optical flow modalities. For multimodal OOD segmentation, we evaluate on large-scale real-world datasets, including SemanticKITTI [3] and nuScenes [6], with image and point cloud modalities. To address the lack of multimodal OOD segmentation datasets, we introduce CARLA-OOD, a synthetic dataset generated using the CARLA simulator [18]... Extensive experiments on SemanticKITTI, nuScenes, CARLA-OOD datasets, and the MultiOOD benchmark demonstrate that Feature Mixing achieves state-of-the-art performance with a 10× to 370× speedup.
Researcher Affiliation	Academia	Moru Liu¹, Hao Dong², Jessica Kelly³, Olga Fink⁴, Mario Trapp¹,³ (¹Technical University of Munich, ²ETH Zürich, ³Fraunhofer IKS, ⁴EPFL)
Pseudocode	Yes	Algorithm 1 (Feature Mixing). Input: ID feature F = [Fc; Fl], where Fc comes from modality 1 with Nc channels and Fl comes from modality 2 with Nl channels; N, the number of feature dimensions selected for mixing. Python-like code: select_c = random.sample(range(Nc), N); select_l = random.sample(range(Nl), N); Fc_tilde = Fc.clone(); Fl_tilde = Fl.clone(); Fc_tilde[select_c, :, :] = Fl[select_l, :, :]; Fl_tilde[select_l, :, :] = Fc[select_c, :, :]; Fo = torch.cat([Fc_tilde, Fl_tilde], dim=0). Output: multimodal outlier feature Fo.
Open Source Code	Yes	Our source code and dataset will be available at https://github.com/mona4399/FeatureMixing.
Open Datasets	Yes	Additionally, we introduce CARLA-OOD, a novel multimodal dataset for OOD segmentation, featuring synthetic OOD objects across diverse scenes and weather conditions. [...] Our source code and dataset will be available at https://github.com/mona4399/FeatureMixing. Extensive experiments on SemanticKITTI [3], nuScenes [6], CARLA-OOD datasets, and the MultiOOD benchmark demonstrate that Feature Mixing achieves state-of-the-art performance with a 10× to 370× speedup.
Dataset Splits	Yes	The nuScenes dataset contains 28,130 training frames and 6,019 validation frames... SemanticKITTI consists of 21,000 frames from sequences 00-10 for training and validation, annotated with 19 semantic classes. Following [53, 7], we use sequence 08 for validation. The remaining sequences (00-07 and 09-10) are used for training. [...] The KITTI-CARLA dataset consists of 7 sequences, each containing 5,000 frames captured from distinct CARLA maps and annotated with 22 classes for the LiDAR point cloud. We select 1,000 evenly sampled frames from each sequence, resulting in a total of 7,000 frames for training and validation. The dataset is split into a training set (Town01, Town03–Town07) and a validation set (Town02), with testing performed on our CARLA-OOD dataset.
Hardware Specification	No	The paper mentions camera and LiDAR streams and backbones such as ResNet-34 and SalsaNext, but it does not specify the hardware used for the experiments (GPU or CPU models, or memory amounts). Training is described, but without any details of the compute resources involved.
Software Dependencies	No	The paper mentions software components such as SGD with Nesterov momentum, the Adam optimizer, ResNet-34, SalsaNext, and the SlowFast network, but it does not name the underlying libraries or frameworks or give their version numbers, which reproducibility requires.
Experiment Setup	Yes	The networks are trained for 50 epochs with a batch size of 4, an initial learning rate of 0.0005, and a cosine schedule. To prevent overfitting, we apply various data augmentation techniques, including random horizontal flipping, random scaling, color jitter, 2D random rotation, and random cropping. For hyperparameters, we set N in Feature Mixing to 10 and γ1 in the loss to 3.0. For A2D, we set γ2 to 1.0. For xMUDA, we set γ2 to 0.5. For the Multimodal OOD Detection task... The models are pre-trained on each dataset's training set using standard cross-entropy loss. The Adam optimizer [29] is employed with a learning rate of 0.0001 and a batch size of 16. For hyperparameters, we set N in Feature Mixing to 512.
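For readers who want to try Algorithm 1 from the Pseudocode row, the following is a minimal, runnable sketch of its Python-like code, re-expressed with NumPy (`.copy()` in place of `.clone()`, `np.concatenate` in place of `torch.cat`). It assumes both modality features have shape (channels, H, W) with matching H and W, which concatenation along the channel axis requires. The function name `feature_mixing` and the input checks are hypothetical additions, not taken from the authors' repository.

```python
import random
from typing import Optional

import numpy as np


def feature_mixing(f_c: np.ndarray, f_l: np.ndarray, n: int,
                   rng: Optional[random.Random] = None) -> np.ndarray:
    """Hypothetical NumPy sketch of Feature Mixing (Algorithm 1).

    f_c: (Nc, H, W) feature from modality 1; f_l: (Nl, H, W) from modality 2.
    Swaps n randomly chosen channels between the two modalities and returns
    the concatenated outlier feature of shape (Nc + Nl, H, W).
    """
    rng = rng if rng is not None else random.Random()
    n_c, n_l = f_c.shape[0], f_l.shape[0]
    if f_c.shape[1:] != f_l.shape[1:]:
        raise ValueError("spatial dimensions of the two modalities must match")
    if n > min(n_c, n_l):
        raise ValueError("n cannot exceed the channel count of either modality")
    # Choose n distinct channel indices per modality
    select_c = rng.sample(range(n_c), n)
    select_l = rng.sample(range(n_l), n)
    # Work on copies so the original ID features stay untouched
    mixed_c = f_c.copy()
    mixed_l = f_l.copy()
    # Read from the unmodified originals so the swap is symmetric
    mixed_c[select_c, :, :] = f_l[select_l, :, :]
    mixed_l[select_l, :, :] = f_c[select_c, :, :]
    return np.concatenate([mixed_c, mixed_l], axis=0)


if __name__ == "__main__":
    f_c = np.zeros((4, 2, 2))
    f_l = np.ones((6, 2, 2))
    print(feature_mixing(f_c, f_l, 3, random.Random(0)).shape)
```

Reading from the original features rather than from the partially mixed copies mirrors the pseudocode; in the setup quoted above, N would be 10 for segmentation and 512 for detection, and it must not exceed either modality's channel count.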
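The split arithmetic in the Dataset Splits row can be checked with a small sketch. It assumes that "evenly sampled" means taking every fifth frame of each 5,000-frame KITTI-CARLA sequence (the exact indices are not stated) and that each sequence corresponds to one CARLA town, as "distinct CARLA maps" suggests; all names below are hypothetical. Under these assumptions, the 7,000 sampled frames divide into 6,000 training frames (six towns) and 1,000 validation frames (Town02).

```python
from typing import Dict, List


def evenly_spaced_indices(num_frames: int, k: int) -> List[int]:
    """Pick k frame indices spread uniformly over [0, num_frames) (assumed scheme)."""
    if not 0 < k <= num_frames:
        raise ValueError("k must lie in (0, num_frames]")
    return [i * num_frames // k for i in range(k)]


# SemanticKITTI: labeled sequences 00-10, with 08 held out for validation.
SEMANTIC_KITTI_SPLIT = {
    "train": [f"{s:02d}" for s in range(11) if s != 8],
    "val": ["08"],
}

# KITTI-CARLA: assumed one sequence per town, Town01-Town07.
KITTI_CARLA_SPLIT = {
    "train": ["Town01"] + [f"Town{t:02d}" for t in range(3, 8)],
    "val": ["Town02"],
}


def kitti_carla_frames(frames_per_seq: int = 5000,
                       keep: int = 1000) -> Dict[str, Dict[str, List[int]]]:
    """Map split -> town -> sampled frame indices (hypothetical helper)."""
    idx = evenly_spaced_indices(frames_per_seq, keep)
    return {split: {town: list(idx) for town in towns}
            for split, towns in KITTI_CARLA_SPLIT.items()}


if __name__ == "__main__":
    frames = kitti_carla_frames()
    for split, towns in frames.items():
        print(split, sum(len(v) for v in towns.values()))
```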
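The segmentation schedule in the Experiment Setup row (50 epochs, initial learning rate 0.0005, cosine schedule) can be sketched as below. The row does not say whether the schedule decays to zero or is stepped per epoch or per iteration, so this sketch assumes standard cosine annealing to a minimum of 0, evaluated per epoch; the config dictionaries only collect the quoted values under hypothetical key names. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)` follows the same closed form when stepped once per epoch.

```python
import math

# Values quoted in the Experiment Setup row; key names are hypothetical.
SEGMENTATION_CONFIG = {
    "epochs": 50,
    "batch_size": 4,
    "base_lr": 5e-4,
    "feature_mixing_n": 10,
    "gamma1": 3.0,
    "gamma2": {"A2D": 1.0, "xMUDA": 0.5},
}
DETECTION_CONFIG = {
    "optimizer": "Adam",
    "lr": 1e-4,
    "batch_size": 16,
    "feature_mixing_n": 512,
}


def cosine_lr(epoch: int, base_lr: float, total_epochs: int,
              min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at `epoch` (assumes decay to min_lr)."""
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = SEGMENTATION_CONFIG
    for e in (0, 25, 49):
        print(e, cosine_lr(e, cfg["base_lr"], cfg["epochs"]))
```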