Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates

Authors: Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that our framework achieves promising faithfulness and completeness. Additionally, to ensure consistency in the granularity of noising-based and denoising-based interventions, we introduce a misalignment score for AND and OR gates to measure whether the scales of the two intervention strategies are aligned when combined. We explore the characteristics of AND, OR, and ADDER gates in a circuit, including their proportions and contributions to the output, building upon our proposed logic gates and recovery framework. Furthermore, we examine the relationship between logic gates and the functionality of language models. Experimental results show that OR gates typically link multiple backup paths for the same function, while AND gates often connect paths for different necessary functions.
Researcher Affiliation | Academia | Hang Chen, School of Computer Science and Technology, Xi'an Jiaotong University, EMAIL; Jiaying Zhu, School of Computer Science and Engineering, The Chinese University of Hong Kong, EMAIL; Xinyu Yang, School of Computer Science and Technology, Xi'an Jiaotong University, EMAIL; Wenya Wang, School of Computer Science and Engineering, Nanyang Technological University, EMAIL
Pseudocode | Yes | Algorithm 1: The ACDC algorithm in Ns.
Open Source Code | Yes | Answer: [Yes] Justification: We will make it open-access with github link.
Open Datasets | Yes | We examine the circuits obtained through Ns, Dn, and Ns+Dn. For instance, in ACDC, when intervening on each edge, we simultaneously compute the effect of substituting the clean activation with a corrupted one in the clean run, and the effect of substituting the corrupted activation with a clean one in the corrupted run. In EAP, we compute gradients under both clean and corrupted conditions. For Edge Pruning, we replace Equation 1 with Equation 3 as the optimization objective. Detailed implementation can be found in Appendix E.3. These experiments are conducted on three mainstream tasks for circuit discovery, namely indirect object inference (IOI) [16], greater than (GT) [26], and syntactic agreement [9]. The details of these tasks are presented in Table 2. Table 2: An overview of the tasks and datasets (columns: Task; Example, with the corrupted text in parentheses; Output; Corrupted output). IOI; "When Mary and John went to the store, John (Alice) gave a drink to"; "Mary"; other names. GT; "The war lasted from 1517 (1501) to 15"; "18" or "19" or ... "99"; other digits. SA; "Many girls (girl) insulted"; "themselves"; "herself".
Dataset Splits | No | The paper mentions established datasets for IOI [16], GT [26], and SA [9] tasks, but it does not explicitly provide the training/test/validation splits used for these datasets within the text. It only states the number of edges used for sparsity levels, not dataset splits. For example, Section 4.2.1 states: "To account for the effects of sparsity, we constrain the number of edges in both circuits to remain consistent across six sparsity levels: 100, 200, 500, 1000, 2000, and 5000 edges."
Hardware Specification | No | The paper states in Appendix E.1: "However, our proposed method, which employs the algorithm Ns+Dn, does not introduce additional computational complexity to the existing circuit discovery algorithm. Conceptually, it is equivalent to reapplying the Ns algorithm once more under the strategy of Dn. As a result, the overall time complexity increases by at most a factor of two relative to the baseline algorithm, without imposing any additional nonlinear burden." This discusses computational cost but does not specify any hardware details like GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions using "GPT2-small as the computational graph" in Section 3.1 and Section 4.2, but it does not specify any other software libraries, frameworks, or their version numbers (e.g., PyTorch version, Python version, specific deep learning frameworks) that would be needed to reproduce the experiments.
Experiment Setup | Yes | To account for the effects of sparsity, we constrain the number of edges in both circuits to remain consistent across six sparsity levels: 100, 200, 500, 1000, 2000, and 5000 edges. Figure 4 shows that, both in terms of KL divergence and accuracy, the performance of circuits removed through Ns+Dn is noticeably weaker compared to those removed through Ns. This corroborates Corollary 1, where we note that Ns, due to its inability to fully recover the OR gate, results in suboptimal completeness. This procedure is repeated 30 times for each receiver node, and the distributions of KL values are summarized via box plots, as shown in Figure 1 (detailed results are shown in Appendix E.2). We conducted five different runs of circuit discovery using random seeds, with the random subset K chosen from subcircuits of sizes ranging from 2 to 5 nodes. For the previous evaluation metric, we performed 5, 10, 20, and 30 sampling iterations. In the combined Ns+Dn approach, the effects from both strategies are jointly considered. Specifically, the original pruning condition $D_{\mathrm{KL}}(G \parallel H_{\mathrm{new}}) - D_{\mathrm{KL}}(G \parallel H) < \tau$ is replaced with the aggregated criterion: $D_{\mathrm{KL}}(G_{\mathrm{clean}} \parallel H_{\mathrm{new}}) - D_{\mathrm{KL}}(G_{\mathrm{clean}} \parallel H) + D_{\mathrm{KL}}(G_{\mathrm{corrupted}} \parallel H_{\mathrm{new}}) - D_{\mathrm{KL}}(G_{\mathrm{corrupted}} \parallel H) < \tau$.
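
The pruning conditions quoted in the Experiment Setup row can be illustrated with a small sketch. This is not the authors' implementation: the function names, the argument names, and the choice to pass output distributions directly as plain lists are assumptions made for illustration, and the toy distributions in the usage note are invented. The sketch only shows how the single-strategy condition and the aggregated Ns+Dn condition differ once the KL terms are available.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions of equal length.

    eps is a small smoothing constant (an assumption of this sketch) that
    avoids division by zero when q has zero-probability entries.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def kl_increase(g, h, h_new):
    """D_KL(G || H_new) - D_KL(G || H): change in KL when moving from H to H_new."""
    return kl_divergence(g, h_new) - kl_divergence(g, h)


def prune_single(g, h, h_new, tau):
    """Single-strategy condition: prune the edge if the KL increase is below tau."""
    return kl_increase(g, h, h_new) < tau


def prune_combined(g_clean, h_clean, h_new_clean,
                   g_corr, h_corr, h_new_corr, tau):
    """Aggregated condition: the clean-run and corrupted-run KL increases
    are summed and compared against a single threshold tau.

    h_clean / h_new_clean are the (hypothetical) output distributions of the
    current and candidate subgraph on the clean run; h_corr / h_new_corr are
    the same on the corrupted run.
    """
    total = (kl_increase(g_clean, h_clean, h_new_clean)
             + kl_increase(g_corr, h_corr, h_new_corr))
    return total < tau
```

For example, with invented toy distributions `g = [0.7, 0.2, 0.1]` and `h = [0.6, 0.3, 0.1]`, a candidate subgraph whose output is unchanged passes both conditions, while a candidate that shifts the corrupted-run output to `[0.1, 0.2, 0.7]` fails the aggregated condition even if the clean-run output is unchanged. An edge is therefore kept whenever removing it matters under either intervention strategy.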