Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
LLM Interpretability with Identifiable Temporal-Instantaneous Representation
Authors: Xiangchen Song, Jiaqi Sun, Zijian Li, Yujia Zheng, Kun Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation addresses five key claims regarding our proposed method: (1) our estimation approach aligns with identifiability theory, accurately recovering latent structures; (2) existing CRL methods fail to handle high-dimensional data at scale; (3) our method is able to recover target relations between concepts from semi-synthetic data; (4) compared with common SAEs, our proposal achieves satisfactory results on quantitative evaluation metrics (SAEBench [24]); and (5) our method effectively learns both time-delayed and instantaneous causal relations among concepts elicited from LLM activations. Section 5 is dedicated to 'Experiments' and includes subsections such as 'Synthetic Data Experiments', 'Semi-synthetic Experiments', and 'Real LLM Activation Analysis', all of which describe empirical studies and evaluations. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, ²Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | No | The paper describes the estimation process in Section 4 'Implementation' using mathematical equations and refers to Figure 2 as an 'Illustration of estimation process'. However, it does not present any block explicitly labeled 'Pseudocode', 'Algorithm', or structured steps in a code-like format. |
| Open Source Code | Yes | The code that can replicate the main experiments presented in our paper can be accessed via https://github.com/xiangchensong/temp-inst-sae |
| Open Datasets | Yes | The model is trained on 50 million tokens from the Pile dataset [17]. [17] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020. |
| Dataset Splits | No | The paper states that the model was 'trained on 50 million samples' for synthetic data (Section 5.1.2) and 'trained on 50 million tokens from the Pile dataset [17]' (Section 5.3). It also mentions constructing 'two contrastive subsets from the Pile dataset' for semi-synthetic experiments (Section 5.2). However, it does not provide specific training, validation, or test split percentages, absolute counts, or references to predefined standard splits for any of the datasets used to reproduce experiment partitioning. |
| Hardware Specification | Yes | All experiments were conducted on a computing cluster equipped with NVIDIA L40 GPUs. The synthetic verification experiments were run using 16 CPU cores, 32 GB of memory, and a single GPU. The Jacobian complexity experiment was executed on CPU only, as the computation did not fit within GPU VRAM; to avoid out-of-memory (OOM) errors, 32 CPU cores and 400 GB of memory were allocated. The scaled-up synthetic experiment with the linear model used 32 CPU cores, 64 GB of memory, and one GPU. The large language model (LLM) activation experiment was performed using 16 CPU cores, 15 GB of memory, and a single GPU. |
| Software Dependencies | No | The paper mentions using specific models and tools like 'pythia-160m-deduped [5]', 'SAELens [6]', and 'dictionary-learning [35]'. While these are specific components, the text does not provide explicit version numbers for general software dependencies or programming libraries (e.g., Python, PyTorch, CUDA versions) which are necessary for full reproducibility. |
| Experiment Setup | Yes | We train the model for 50,000 steps with batch size 1024 (approximately 51 million total samples) using the Adam optimizer with learning rate 8 × 10⁻³ and weight decay 6 × 10⁻⁴. The loss function includes reconstruction error, KL divergence term, and L1 regularization penalties: 1 × 10⁻³ for matrix M and 1 × 10⁻⁸ for matrix B. For LLM activations: The weight of the independence constraint on the noise term is set to α = 0.1 in Eq. 9. We optimize the loss function defined in Eq. 9 using the Adam optimizer with a learning rate of 0.01 and a weight decay of 0.0001. |
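
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which also makes the "approximately 51 million total samples" figure easy to check. This is a minimal illustration and not the authors' code: `TrainConfig`, `SYNTHETIC`, `L1_WEIGHTS`, `ALPHA_INDEPENDENCE`, `LLM_OPTIMIZER`, and `l1_penalty` are hypothetical names introduced here, and only the numeric values come from the row above. The step count and batch size for the LLM activation run are not quoted in the row, so they are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    steps: int
    batch_size: int
    learning_rate: float
    weight_decay: float

    @property
    def total_samples(self) -> int:
        # Samples seen over the full run = optimizer steps x batch size.
        return self.steps * self.batch_size


# Synthetic-data run: values quoted in the Experiment Setup row.
SYNTHETIC = TrainConfig(
    steps=50_000,
    batch_size=1024,
    learning_rate=8e-3,
    weight_decay=6e-4,
)

# L1 penalty weights quoted for matrices M and B (dict keys are assumed names).
L1_WEIGHTS = {"M": 1e-3, "B": 1e-8}

# LLM-activation run: the row quotes only alpha and the Adam settings.
ALPHA_INDEPENDENCE = 0.1
LLM_OPTIMIZER = {"learning_rate": 0.01, "weight_decay": 1e-4}


def l1_penalty(matrix, weight):
    """Weighted sum of absolute entries of a matrix given as nested lists."""
    return weight * sum(abs(x) for row in matrix for x in row)


if __name__ == "__main__":
    print(f"{SYNTHETIC.total_samples:,}")  # prints 51,200,000
```

The product 50,000 × 1024 = 51,200,000 agrees with the row's rounded figure of roughly 51 million samples.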