Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

MACS: Multi-Agent Reinforcement Learning for Optimization of Crystal Structures

Authors: Elena Zamaraeva, Christopher Collins, George Darling, Matthew S Dyer, Bei Peng, Rahul Savani, Dmytro Antypov, Vladimir Gusev, Judith Clymo, Paul Spirakis, Matthew Rosseinsky

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate that MACS optimizes the crystal structures significantly more efficiently than a wide range of state-of-the-art methods. These experiments cover a diverse set of crystalline materials, including compositions with different elemental species, varying numbers of species, and distinct symmetry groups. MACS exhibits strong zero-shot transferability and scalability, maintaining efficiency in the optimization of larger structures from new, unseen compositions. Our work unlocks the potential of MARL for periodic crystal structure optimization.
Researcher Affiliation | Academia | 1Leverhulme Research Centre for Functional Materials Design, University of Liverpool, UK 2Department of Chemistry, University of Liverpool, UK 3School of Computer Science, University of Sheffield, UK 4Department of Computer Science, University of Liverpool, UK 5The Alan Turing Institute, London, UK
Pseudocode | No | The paper describes the methodology using prose and mathematical equations. Figure 1 provides a diagram of the overall MACS architecture, but there are no explicit pseudocode blocks or algorithms with numbered steps formatted as code.
Open Source Code | Yes | Code is available at https://github.com/lrcfmd/macs.
Open Datasets | No | We generate training and testing structures using the Ab Initio Random Structure Searching package (AIRSS) [30]. During training, the initial pseudo random structures are generated on the fly with the condition of belonging to one of the training compositions with equal probability and having 40 atoms with a reasonable volume (see Appendix B.4 for more details). For every composition on which the policy is trained, we generate three test sets of 300 structures each, with the structures containing K, 1.5K, and 2K atoms, where K is the size of the structures used during training. The paper states in the NeurIPS Paper Checklist that 'Due to the size of the dataset, the testing sets are not provided. However, the instructions for the generation of a comparable test set are given.'
Dataset Splits | Yes | During training, the initial pseudo random structures are generated on the fly with the condition of belonging to one of the training compositions with equal probability and having 40 atoms with a reasonable volume (see Appendix B.4 for more details). For every composition on which the policy is trained, we generate three test sets of 300 structures each, with the structures containing K, 1.5K, and 2K atoms, where K is the size of the structures used during training.
Hardware Specification | Yes | We train MACS for 80000 training steps in total using 40 concurrently running environments on a Linux cluster node equipped with two 20-core Intel(R) Xeon(R) Gold 6138 CPUs (2.00 GHz) and 384 GB of memory.
Software Dependencies | No | We use a standard SAC architecture proposed in [21] and implemented in RLlib [29]... We utilize CHGNet [12]... All baselines are implemented in either the Atomic Simulation Environment package (ASE) [23] or SciPy [48].
Experiment Setup | Yes | We use a standard SAC architecture proposed in [21] and implemented in RLlib [29], and utilize the policy network and the twin Q-networks shared between all agents for efficient training. The policy and Q-networks are two-layered MLPs with ReLU activation functions. The policy network outputs three pairs (mean, std) for the action vector, which are passed through the tanh squashing to match the action space limits. The tuples $\langle o^t_{a_i}, u^t_{a_i}, o^{t+1}_{a_i}, R^t_{a_i} \rangle$ are stored in a replay buffer with a capacity of 10 million. The hyperparameter tuning details are provided in Appendix A. Appendix A.5 includes Table 4: Hyperparameters used to train MACS, which lists: γ 0.995, Training batch size 8192, Target entropy -8, Truncate episodes TRUE, Target network update frequency 1000, Number of samples before learning starts 500, Tau 0.001, Initial alpha 1, Use twin q TRUE, Actor learning rate 0.0003, Critic learning rate 0.0003, Entropy learning rate 0.0001, Replay buffer capacity 10000000, Use prioritised replay buffer FALSE, gmax 5, cmax 0.4, Observation component-wise normalization TRUE, Number of nearest neighbors k 12, Max steps in episode 1000.
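
The hyperparameters quoted in the Experiment Setup entry can be collected into a plain configuration mapping, and the tanh squashing of the policy output can be sketched in a few lines. This is a minimal illustration in standard-library Python; it is neither the authors' code nor the RLlib API. The dictionary key names, the `squash_action` helper, and its `action_limit` parameter are hypothetical and introduced only for this example. Only the numeric values come from Table 4 as quoted above.

```python
import math
import random

# Values as quoted from Table 4 above. The key names are illustrative
# assumptions and are not guaranteed to match RLlib's configuration keys.
MACS_HYPERPARAMETERS = {
    "gamma": 0.995,
    "train_batch_size": 8192,
    "target_entropy": -8,
    "truncate_episodes": True,
    "target_network_update_freq": 1000,
    "num_samples_before_learning_starts": 500,
    "tau": 0.001,
    "initial_alpha": 1.0,
    "twin_q": True,
    "actor_learning_rate": 3e-4,
    "critic_learning_rate": 3e-4,
    "entropy_learning_rate": 1e-4,
    "replay_buffer_capacity": 10_000_000,
    "prioritised_replay": False,
    "g_max": 5,
    "c_max": 0.4,
    "normalize_observation_components": True,
    "num_nearest_neighbors": 12,
    "max_episode_steps": 1000,
}


def squash_action(means, stds, action_limit=1.0, rng=random):
    """Sample one Gaussian value per component, then tanh-squash it.

    Each output lies in [-action_limit, action_limit]. `action_limit` is a
    hypothetical bound; the quoted text does not state the actual limits.
    """
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")
    return [
        action_limit * math.tanh(rng.gauss(mean, std))
        for mean, std in zip(means, stds)
    ]


if __name__ == "__main__":
    # Three (mean, std) pairs, one per component of the action vector.
    action = squash_action([0.0, 0.5, -0.5], [1.0, 0.2, 0.2])
    print(action)
```

Passing a seeded `random.Random` instance as `rng` makes the sampled action repeatable, which is convenient when checking that every component stays within the chosen bound.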