Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Language Model as Visual Explainer
Authors: Xingyi Yang, Xinchao Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To access the effectiveness of our approach, we introduce new benchmarks and conduct rigorous evaluations, demonstrating its plausibility, faithfulness, and stability. (Abstract) and 4 Experiment This section offers an in-depth exploration of our evaluation process for the proposed LVX framework... |
| Researcher Affiliation | Academia | Xingyi Yang, Xinchao Wang, National University of Singapore |
| Pseudocode | Yes | C Pseudocode for LVX In this section, we present the pseudocode for the LVX framework, encompassing both the construction stage and the test stage. The algorithmic pipelines are outlined in Algorithm 1 and Algorithm 2. |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Code has been uploaded as supplementary material. Data will be available. (Paper Checklist Q5) |
| Open Datasets | Yes | To address this, we developed annotations for three recognized benchmarks: CIFAR10, CIFAR100 [39], and ImageNet [57], termed as H-CIFAR10, H-CIFAR100, and H-ImageNet. (Section 4.1) |
| Dataset Splits | Yes | The model is trained on a labeled training set $D_{tr} = \{x_j, y_j\}_{j=1}^{M}$, and would be evaluated a test set $D_{ts} = \{x_j\}_{j=1}^{L}$. (Section 2) and We report the average score across all validation samples. (Section 4.1) and To address this, we developed annotations for three recognized benchmarks: CIFAR10, CIFAR100 [39], and ImageNet [57]... |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., specific GPU or CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software packages like PyTorch, torchvision, and timm but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The model is optimized with SGD for 50 epochs on the training sample, with an initial learning rate in {0.001, 0.01, 0.03} and a momentum term of 0.9. The weighting factor is set to 0.1. (Section 4.1) and In our experiment, we performed five rounds of tree refinement. (Section 3.1) |
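
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration of the reported values, not the authors' code: the `RunConfig` class, its field names, and `enumerate_configs` are hypothetical, and the quoted text does not say which dataset each learning rate applies to or how the five refinement rounds interleave with the 50 training epochs.

```python
from dataclasses import dataclass

# Values quoted in the Experiment Setup row (Sections 3.1 and 4.1 of the paper).
LEARNING_RATES = (0.001, 0.01, 0.03)  # candidate initial learning rates
MOMENTUM = 0.9
EPOCHS = 50
WEIGHTING_FACTOR = 0.1
REFINEMENT_ROUNDS = 5


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container; the field names are not taken from the paper."""
    optimizer: str
    learning_rate: float
    momentum: float
    epochs: int
    weighting_factor: float
    refinement_rounds: int


def enumerate_configs():
    """Return one RunConfig per candidate initial learning rate."""
    return [
        RunConfig("SGD", lr, MOMENTUM, EPOCHS, WEIGHTING_FACTOR, REFINEMENT_ROUNDS)
        for lr in LEARNING_RATES
    ]


if __name__ == "__main__":
    for config in enumerate_configs():
        print(config)
```

A real reproduction would still need the details this scorecard marks as missing, namely the hardware used and the versions of PyTorch, torchvision, and timm.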