Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Symmetric Machine Theory of Mind
Authors: Melanie Sclar, Graham Neubig, Yonatan Bisk
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that reinforcement learning agents that model the mental states of others achieve significant performance improvements over agents with no such theory of mind model. |
| Researcher Affiliation | Academia | ¹Paul G. Allen School of Computer Science & Engineering, University of Washington; ²Language Technologies Institute, Carnegie Mellon University. Correspondence to: <EMAIL, EMAIL>. |
| Pseudocode | Yes | A pseudocode of MADDPG-EE's implementation can be found in Section A.4. Algorithm 1 Actor implementation of MADDPG-EE |
| Open Source Code | Yes | Code can be found at https://github.com/msclar/symmtom. |
| Open Datasets | No | The paper introduces a new simulated environment called 'SymmToM' for its experiments instead of using a pre-existing, publicly available dataset with specific access information. |
| Dataset Splits | No | The paper mentions training for '60000 episodes' and evaluating for '1000 episodes' but does not specify any validation dataset splits or percentages. |
| Hardware Specification | Yes | Experiments were run on a server with 256GB RAM, 2 18-core Intel E5-2699 processors @ 2.3GHz. |
| Software Dependencies | No | The paper mentions using MADDPG and its variants (RMADDPG, MADDPG-CE, etc.) as frameworks, but it does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train through 60000 episodes, and with 9 random seeds to account for high variances. Our policies are parametrized by a two-layer ReLU MLP with 64 units per layer... We used the same hyperparameters as the ones used in MADDPG, except with a reduced learning rate and tau (lr = 0.001 and τ = 0.005). We set the length of each episode to 5w |
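
The reported experiment setup can be collected into a small configuration object, which may help when comparing a re-run against the values in the table. The following is a minimal sketch only, not the authors' code: the names `ExperimentConfig` and `soft_update` are hypothetical, and it assumes that τ plays its usual role in MADDPG-style training as the soft (Polyak) target-network update rate. The episode length is left out because the quoted value is truncated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the values quoted in the table above."""
    train_episodes: int = 60000   # "We train through 60000 episodes"
    eval_episodes: int = 1000     # evaluation length quoted in the Dataset Splits row
    num_seeds: int = 9            # "9 random seeds"
    hidden_layers: int = 2        # "two-layer ReLU MLP"
    hidden_units: int = 64        # "64 units per layer"
    lr: float = 0.001             # reduced learning rate
    tau: float = 0.005            # reduced tau


def soft_update(target, online, tau):
    """Assumed meaning of tau: target <- tau * online + (1 - tau) * target.

    Both arguments are flat lists of floats standing in for network weights.
    """
    if len(target) != len(online):
        raise ValueError("target and online must have the same length")
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


cfg = ExperimentConfig()
seeds = list(range(cfg.num_seeds))  # illustrative seed values, not the paper's

# One soft update step on toy weights.
target_weights = soft_update([0.0, 0.0], [1.0, 2.0], cfg.tau)
```

With τ = 0.005, each update moves the target weights only 0.5% of the way toward the online weights, which is consistent with the table's description of a "reduced" tau.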