Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning

Authors: Xiaomeng Fan, Yuchuan Mao, Zhi Gao, Yuwei Wu, Jin Chen, Yunde Jia

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on 11 datasets demonstrate that our method outperforms baseline approaches by up to 14%, highlighting its effectiveness and superiority. We evaluate the proposed method in open environments across two settings: base-to-base/base-to-new, and cross-dataset, using 11 image recognition datasets.
Researcher Affiliation	Academia	1Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology 2Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
Pseudocode	Yes	Algorithm 1 Distribution Alignment Algorithm Input: Parameters Φ, Data Ds, Gu, Gs, Epoch E, batch-size B Output: Optimized Prompts vmax_iter Initialize: e = 0, v ← v0, S ← {}
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We do not provide open access to the data and code at this time, but can publish part of them at the rebuttal stage if the reviewers need it. The complete data and code will be published after the paper is accepted.
Open Datasets	Yes	We evaluate the proposed method on 11 image recognition datasets: ImageNet [10], Caltech101 (Caltech) [13], Oxford Pets (Pets) [45], Stanford Cars (Cars) [29], Flowers102 (Flowers) [43], Food101 (Food) [5], FGVCAircraft (Aircraft) [41], SUN397 (SUN) [69], UCF101 (UCF) [57], DTD [9] and EuroSAT [19].
Dataset Splits	Yes	Base-to-base/Base-to-new generalization. We equally split each dataset into base and new classes. The model is trained on base classes and evaluated on both base classes (base-to-base) and new classes (base-to-new) across all 11 datasets. Implementation details. Following the setting of PromptSRC [27], we use a few-shot training strategy in all experiments at 16 shots which are randomly sampled for each class.
Hardware Specification	Yes	Experiments are performed on an NVIDIA A40 GPU, with at most 18 hours and 20 GB of GPU memory required to complete training across 11 datasets.
Software Dependencies	Yes	For LLMs and VLMs, we use Doubao-pro-128k to identify the potential unseen classes, use LLaVA-v1.6-Vicuna-13B [39] to generate class-specific captions for each training class, use Llamav3.1-Instruct-8B [61] to summarize captions into class-specific domain information, and use Stable Diffusion v2.1 [50] as the text-to-image model to generate unseen-class data.
Experiment Setup	Yes	For base-to-base/base-to-new generalization, we train each model for 20 epochs using 4 token prompts in the first 9 transformer layers on both visual and text branches. For cross-dataset evaluation, we train the source model for 4 epochs using 4 prompts in the first 3 transformer layers on both visual and text branches. Prompts are randomly initialized with a normal distribution except the text prompts of the first layer, which are initialized with the word embeddings of "a photo of a". The SGD optimizer is adopted, and the learning rate is set as 0.0025. Hyperparameters for the class-domain-wise data generation pipeline and distribution alignment are determined empirically. Specifically, we set α = 1, β = 1, K0 as 1, K1 as 8, K2 as 3 and K3 as 1. The corresponding hyperparameters are fixed across all datasets and benchmarks.
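
Since the paper's code is not released, the following is only a minimal Python sketch of the data protocol and settings quoted in the Dataset Splits and Experiment Setup rows, not the authors' implementation. The numeric values come from the table above; every name (BASE_TO_NEW, CROSS_DATASET, split_base_new, sample_few_shot) is hypothetical, and the choices to order classes alphabetically and to give the base half the extra class when the count is odd are assumptions.

```python
import random
from collections import defaultdict

# Values quoted in the Experiment Setup row; the key names are hypothetical.
BASE_TO_NEW = {
    "epochs": 20,
    "prompt_tokens": 4,
    "prompt_depth": 9,      # prompts in the first 9 transformer layers
    "optimizer": "SGD",
    "lr": 0.0025,
    "shots": 16,
}
# Cross-dataset evaluation changes only the epochs and the prompt depth.
CROSS_DATASET = {**BASE_TO_NEW, "epochs": 4, "prompt_depth": 3}


def split_base_new(class_names):
    """Split class names into two halves: base classes and new classes."""
    ordered = sorted(class_names)   # assumption: alphabetical class order
    mid = (len(ordered) + 1) // 2   # assumption: base gets the extra class
    return ordered[:mid], ordered[mid:]


def sample_few_shot(samples, shots, seed=0):
    """Randomly keep at most `shots` items per label.

    `samples` is a list of (item, label) pairs.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        chosen = rng.sample(items, min(shots, len(items)))
        subset.extend((item, label) for item in chosen)
    return subset
```

A training set for the base-to-new setting would then be built by keeping only samples whose label falls in the base half and passing them to sample_few_shot with shots=BASE_TO_NEW["shots"]. Fixing the seed makes the 16-shot subset repeatable across runs.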