Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DePass: Unified Feature Attributing by Simple Decomposed Forward Pass

Authors: Xiangyu Hong, Che Jiang, Kai Tian, Biqing Qi, Youbang Sun, Ning Ding, Bowen Zhou

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct attribution experiments at the token level, model component level, and subspace level to evaluate the effectiveness and generality of DePass. Our results show that DePass enables lossless, additive decomposition of hidden states throughout the forward pass of Transformer models, allowing faithful tracking of information flow according to attribution needs. We validate DePass across token-level, model component-level, and subspace-level attribution tasks, demonstrating its effectiveness and fidelity. Our experiments highlight its potential to attribute information flow between arbitrary components of a Transformer model.
Researcher Affiliation Academia 1 Department of Electronic Engineering, Tsinghua University; 2 Shanghai AI Laboratory
Pseudocode No The paper describes methods using mathematical equations and prose in sections like '3 Decomposed Forward Pass' without explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code Yes Code is available at https://github.com/TsinghuaC3I/Decomposed-Forward-Pass
Open Datasets Yes We evaluate on two benchmarks targeting different reasoning types: Known_1000 [4] (factual QA, e.g., 'Audible.com is owned by') and IOI [19] (indirect object identification, e.g., 'Eleanor and Deanna were thinking about going to the mountain. Eleanor wanted to give a watermelon to'). We evaluate on two factuality benchmarks: CounterFact [4] (modified for generation, prompting completions to factual statements) and TruthfulQA [23] (converted to multiple-choice with misleading options).
Dataset Splits No The main experimental sections describing token-level, model component-level, and subspace-level attribution do not explicitly detail the train/test/validation splits for the models being interpreted. While Appendix C.2 on 'Truthful Probe Training and Evaluation' mentions that 'The datasets are split into training and testing sets with balanced labels to ensure fair evaluation' and 'evenly split', this information is specific to probe training and does not cover the primary DePass evaluation on the main models.
Hardware Specification Yes We run all experiments on a cluster of A6000 GPUs.
Software Dependencies No All experiments are conducted in PyTorch, using pre-trained models with lightweight modifications to enable decomposition-aware forward passes. No specific versions of PyTorch or other software dependencies are provided.
Experiment Setup Yes Experiment Setup. Baselines. We compare against standard attribution methods on a fixed pretrained model. Gradient-based: Input Gradient [15], Integrated Gradients [16], Gradient SHAP [17]; Attention-based: Mean Attention, Last-layer Attention [18], and Attention Rollout [6]. Tasks. We evaluate on two benchmarks targeting different reasoning types: Known_1000 [4] (factual QA, e.g., 'Audible.com is owned by') and IOI [19] (indirect object identification, e.g., 'Eleanor and Deanna were thinking about going to the mountain. Eleanor wanted to give a watermelon to'). Evaluation Protocol. For each input x, we first compute attribution scores for the correct answer via various methods. Based on these scores, we apply token-level interventions: patch top: mask the top K% tokens with highest attribution; recover top: mask the bottom (100 − K)% tokens (lowest attribution) and then restore the top K%. The figures report the average p(K) across all data points per dataset at each masking level. In Appendix C.2, probe training details are given: 'We use the saga solver in scikit-learn, with a learning rate of 0.01 and maximum iteration count of 1000'.
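
The two token-level interventions quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the "[MASK]" placeholder, the word-level tokenization, the example scores, and the choice to round the token count up are all assumptions made for illustration. The sketch only shows which positions each intervention leaves visible for a given K.

```python
import math


def top_k_indices(scores, k_percent):
    """Return the positions of the top k_percent% tokens by attribution score.

    Assumption: the number of selected tokens is rounded up; the quoted
    text does not state a rounding rule.
    """
    n = len(scores)
    count = math.ceil(n * k_percent / 100)
    # Sort positions by descending score; ties keep the earlier position first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return set(order[:count])


def patch_top(tokens, scores, k_percent, mask_token="[MASK]"):
    """'Patch top': mask the K% of tokens with the highest attribution."""
    top = top_k_indices(scores, k_percent)
    return [mask_token if i in top else tok for i, tok in enumerate(tokens)]


def recover_top(tokens, scores, k_percent, mask_token="[MASK]"):
    """'Recover top': keep only the top K% of tokens and mask the rest."""
    top = top_k_indices(scores, k_percent)
    return [tok if i in top else mask_token for i, tok in enumerate(tokens)]


# Hypothetical tokenization and attribution scores for the Known_1000 example.
tokens = ["Audible", ".com", "is", "owned", "by"]
scores = [0.9, 0.4, 0.05, 0.3, 0.1]

print(patch_top(tokens, scores, 40))
print(recover_top(tokens, scores, 40))
```

Under this reading, a faithful attribution method should make the probability of the correct answer fall quickly under patch top and rise quickly under recover top as K grows. How the masked input is actually fed to the model (for example, which placeholder or embedding replaces a token) is not specified in the quoted text.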