Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
From Neurons to Neutrons: A Case Study in Interpretability
Authors: Ouail Kitouni, Niklas Nolte, Víctor Samuel Pérez-Díaz, Sokratis Trifinopoulos, Mike Williams
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In all our experiments, we will consider one or several observables to predict with various models. The performance of the models will generally be measured by a Root-Mean Square error (RMS) on a holdout set. |
| Researcher Affiliation | Collaboration | (1) NSF Institute for Artificial Intelligence and Fundamental Interactions (IAIFI); (2) Massachusetts Institute of Technology; (3) FAIR at Meta; (4) Harvard John A. Paulson School of Engineering and Applied Sciences; (5) Center for Astrophysics \| Harvard & Smithsonian; (6) School of Engineering, Science and Technology, Universidad del Rosario. |
| Pseudocode | No | The paper describes procedures and algorithms conceptually but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Example code is available here: https://github.com/samuelperezdi/nuclr-icml |
| Open Datasets | Yes | The data sources are: for the various energies the Atomic Mass Evaluation (AME) (Wang et al., 2021) and for the charge radii the Atomic Data and Nuclear Data Tables 99 (2013) (Angeli & Marinova, 2013). |
| Dataset Splits | Yes | We train models with different train/validation splits (10% to 90% in 10% increments, 3 random seeds each), varying batch size for consistent total optimization steps, and keeping other hyperparameters constant. [...] with 50% of the data held out as a validation set in each setting to gauge the generalization performance. |
| Hardware Specification | Yes | Most training runs were on Nvidia V100 GPUs with some done on Nvidia A6000 GPUs. |
| Software Dependencies | No | The paper mentions using SiLU activations and AdamW optimizer, but it does not provide specific version numbers for any software libraries, programming languages, or other dependencies. |
| Experiment Setup | Yes | The runs used to generate the embeddings and visualizations have the following parameters: EPOCHS = 200,000; HIDDEN DIM = 2048; LR = 0.0001; WD = 0.01; DEPTH = 2; Seed = 0 |
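
The Dataset Splits and Research Type rows describe a sweep over train/validation fractions (10% to 90% in 10% increments, 3 seeds each) scored by RMS error on held-out data. The following is a minimal sketch of how such a sweep could be enumerated, not the authors' code (their example code is in the repository linked in the table). The function names, the default seeds, and the rule that batch size scales linearly with the training fraction are all assumptions made for illustration; the paper only states that batch size was varied to keep the total number of optimization steps consistent.

```python
import math
import random


def rms_error(predictions, targets):
    """Root-mean-square error between two equal-length sequences."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def split_indices(n_samples, train_fraction, seed):
    """Shuffle sample indices with a seeded RNG and cut them into train/validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def sweep_settings(base_batch_size, seeds=(0, 1, 2)):
    """Yield (train_fraction, seed, batch_size) for fractions 0.1 .. 0.9.

    Assumption: with a fixed epoch count, scaling the batch size in
    proportion to the training fraction keeps steps per epoch (and so
    total optimization steps) roughly constant. `base_batch_size` is the
    hypothetical batch size for a 100% training set.
    """
    for tenth in range(1, 10):
        train_fraction = tenth / 10
        batch_size = max(1, round(base_batch_size * train_fraction))
        for seed in seeds:
            yield train_fraction, seed, batch_size


if __name__ == "__main__":
    for fraction, seed, batch_size in sweep_settings(base_batch_size=1000):
        train_idx, val_idx = split_indices(3000, fraction, seed)
        print(fraction, seed, batch_size, len(train_idx), len(val_idx))
```

With three seeds and nine fractions, the sweep yields 27 settings. A model trained on each `train_idx` would then be scored with `rms_error` on the corresponding `val_idx`.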