Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Forte: Finding Outliers with Representation Typicality Estimation

Authors: Debargha Ganguly, Warren Morningstar, Andrew Yu, Vipin Chaudhary

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrating Forte's superior performance compared to state-of-the-art supervised and unsupervised baselines on various OOD detection tasks, and synthetic image detection, including photorealistic images generated by advanced techniques like Stable Diffusion.
Researcher Affiliation	Collaboration	Debargha Ganguly¹, Warren Morningstar², Andrew Yu¹, Vipin Chaudhary¹. ¹Case Western Reserve University, Cleveland, OH, USA; ²Google Research, Mountain View, CA, USA
Pseudocode	Yes	Algorithm 1: OOD Detection Using Per-Point PRDC Metrics in Forte
Open Source Code	Yes	Our code is available at github.com/DebarghaG/forte.
Open Datasets	Yes	Using public datasets like FastMRI (Zbontar et al., 2018; Knoll et al., 2020) and the Osteoarthritis Initiative (OAI) (Nevitt et al., 2006), we simulate realistic scenarios where models trained on one dataset (treated as in-distribution) are confronted with another (considered OOD).
Dataset Splits	Yes	We split the data into three parts: one-third for held-out testing, one-third as the reference distribution, and one-third as a test distribution that is drawn from the reference distribution.
Hardware Specification	No	No specific hardware details (like GPU/CPU models, memory amounts, or cloud instance types) are provided in the paper.
Software Dependencies	No	The image generation pipeline is implemented using the Hugging Face Transformers (Wolf et al., 2020) and Diffusers (von Platen et al., 2022) libraries, which provide high-level APIs for working with pre-trained models. While libraries are mentioned, no specific version numbers are provided for reproducibility.
Experiment Setup	Yes	We train One-Class SVM (Schölkopf et al., 2001), Gaussian Kernel Density Estimation (Parzen, 1962), and Gaussian Mixture Model (Reynolds et al., 2009) on the reference summary statistics. For reliable measurements, Forte is run with 10 random seeds. We use the Stable Diffusion 2.0 base model and generate images with varying strength parameters (0.3, 0.5, 0.7, 0.9, 1.0) to control the influence of the input image on the generated output.
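
The Dataset Splits and Experiment Setup rows describe a three-way split repeated over 10 random seeds. The following is a minimal sketch of how such a split could look, not the authors' implementation: the function name `three_way_split`, the seed values 0 to 9, the placeholder data of 300 integer indices, and the choice to put any remainder in the last part are all assumptions made for illustration.

```python
import random


def three_way_split(items, seed):
    """Shuffle items with a fixed seed and cut them into three near-equal parts.

    Hypothetical helper: returns (held_out, reference, in_dist_test), mirroring
    the held-out test set, the reference distribution, and the test
    distribution drawn from the reference distribution.
    """
    rng = random.Random(seed)  # private generator, so each seed is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)

    third = len(shuffled) // 3
    held_out = shuffled[:third]
    reference = shuffled[third:2 * third]
    in_dist_test = shuffled[2 * third:]  # assumption: any remainder lands here
    return held_out, reference, in_dist_test


# Assumed seed values; the source states only that 10 random seeds are used.
SEEDS = range(10)

# Placeholder data: 300 sample indices standing in for real images or features.
splits = {seed: three_way_split(range(300), seed) for seed in SEEDS}

for seed, (held_out, reference, in_dist_test) in splits.items():
    print(seed, len(held_out), len(reference), len(in_dist_test))
```

Using a separate `random.Random(seed)` instance per run keeps each split independent of global random state, so a given seed always reproduces the same partition.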