Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
FACE: Faithful Automatic Concept Extraction
Authors: Dipkamal Bhusal, Michael Clifford, Sara Rampazzi, Nidhi Rastogi
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Systematic evaluations on ImageNet, COCO, and CelebA datasets demonstrate that FACE outperforms existing methods across faithfulness and sparsity metrics. |
| Researcher Affiliation | Collaboration | Dipkamal Bhusal (Rochester Institute of Technology, Rochester, NY); Michael Clifford (Toyota InfoTech Labs, Mountain View, CA); Sara Rampazzi (University of Florida, Gainesville, FL); Nidhi Rastogi (Rochester Institute of Technology, Rochester, NY) |
| Pseudocode | No | The paper describes methods and optimization steps but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is available at https://github.com/dipkamal/FACE. |
| Open Datasets | Yes | Systematic evaluations on ImageNet, COCO, and CelebA datasets demonstrate that FACE outperforms existing methods across faithfulness and sparsity metrics. Datasets and Models. We evaluate FACE on three datasets of varying semantic granularity: ImageNet [7], COCO [19], and CelebA [20]. We use ResNet-34 [14] and MobileNetV2 [28] as target models for explanation. |
| Dataset Splits | Yes | All results are averaged over correctly-classified 10,000 samples from 10 different ImageNet classes, 5,000 samples from 5 COCO classes, and 4,000 samples from the 4 selected CelebA attributes. |
| Hardware Specification | Yes | We measured wall-clock time and peak VRAM on a single NVIDIA TITAN Xp (12 GB VRAM, CUDA 12.2) using ResNet-34, rank r = 25, and 1500 ImageNet images (classwise run). |
| Software Dependencies | No | The paper mentions 'NVIDIA TITAN Xp (12 GB VRAM, CUDA 12.2)' but does not list specific software dependencies like programming languages or libraries with version numbers, e.g., 'Python 3.8, PyTorch 1.9'. |
| Experiment Setup | Yes | We optimize this using Adam with a learning rate of 5 × 10^-4 and early stopping when the absolute change in total loss drops below ε = 10^-3. Non-negativity is enforced on U and W after each gradient update via in-place clamping. We sweep over {10 25, . . . , 1020} to select the best regularization value per dataset. We use a matrix decomposition rank of 25 for experiments but provide an ablation study on varying the decomposition rank hyperparameter in Section 4.4. |
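
The optimization loop quoted in the Experiment Setup row (Adam updates, early stopping on the absolute change in loss, and in-place clamping of U and W to keep them non-negative) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `nmf_adam`, the plain squared reconstruction loss, the random initialization, and the default learning rate and tolerance are all hypothetical choices made here, and the regularization term mentioned in the row is omitted. The authors' actual code is in the repository linked in the Open Source Code row.

```python
import numpy as np

def nmf_adam(A, rank=25, lr=5e-4, tol=1e-3, max_steps=5000, seed=0):
    """Approximate a non-negative matrix A (n x p) as U @ W with U, W >= 0.

    Hypothetical sketch: squared reconstruction loss only, hand-written Adam.
    The lr and tol defaults are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    n, p = A.shape
    U = rng.random((n, rank))
    W = rng.random((rank, p))
    params = [U, W]
    m = [np.zeros_like(x) for x in params]  # first-moment estimates
    v = [np.zeros_like(x) for x in params]  # second-moment estimates
    b1, b2, eps = 0.9, 0.999, 1e-8
    prev_loss = None
    for step in range(1, max_steps + 1):
        R = U @ W - A                       # reconstruction residual
        loss = 0.5 * np.sum(R ** 2)
        # Early stopping on the absolute change in loss between steps.
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
        grads = [R @ W.T, U.T @ R]          # dL/dU and dL/dW
        for i, (x, g) in enumerate(zip(params, grads)):
            m[i] = b1 * m[i] + (1 - b1) * g
            v[i] = b2 * v[i] + (1 - b2) * g * g
            m_hat = m[i] / (1 - b1 ** step)
            v_hat = v[i] / (1 - b2 ** step)
            x -= lr * m_hat / (np.sqrt(v_hat) + eps)  # in-place Adam update
            np.clip(x, 0.0, None, out=x)    # in-place non-negativity clamp
    return U, W, loss, step
```

Clamping after every update is a projection onto the non-negative orthant, so the factors stay feasible at every step rather than only at convergence. A call such as `U, W, loss, steps = nmf_adam(A, rank=25)` returns the two factors, the last evaluated loss, and the number of steps taken.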