Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning by Competition of Self-Interested Reinforcement Learning Agents
Authors: Stephen Chung (pp. 6384-6393)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that a network trained with Weight Maximization can learn significantly faster than REINFORCE and slightly slower than backpropagation. |
| Researcher Affiliation | Academia | Stephen Chung Department of Computer Science, University of Massachusetts Amherst, USA EMAIL |
| Pseudocode | Yes | The pseudo-code can be found in Algorithm 1 of Appendix B. |
| Open Source Code | Yes | The code is available at https://github.com/stephen-chung-mh/weight_max. |
| Open Datasets | Yes | We applied our algorithms to four RL tasks: multiplexer, Cart Pole, Acrobot, and Lunar Lander. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). For RL tasks, data is typically generated through environment interaction rather than static splits, and no such details are provided for reproducibility beyond the task descriptions. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, or other computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms (e.g., 'REINFORCE', 'backprop', 'Actor-Critic'), and implies the use of a deep learning framework and RL environments, but does not specify any software versions for libraries or programming languages (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | All networks considered have the same architecture: a three-layer network of stochastic units, with the first hidden layer having 64 units, the second hidden layer having 32 units, and the output layer being a softmax layer. All hidden units are Bernoulli-logistic units... For the critic networks in all experiments, we used a three-layer ANN trained by backprop. Other experiments details can be found in Appendix C. |
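The architecture quoted above (a three-layer network of Bernoulli-logistic stochastic units with 64 and 32 hidden units and a softmax output) can be sketched as follows. This is a minimal illustration of the forward pass only, assuming NumPy; the input/output sizes, weight initialization, and class name are placeholders, not taken from the paper, and the paper's training rules (Weight Maximization, REINFORCE, backprop) are not implemented here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()

class StochasticNet:
    """Sketch of the paper's architecture: two hidden layers of
    Bernoulli-logistic units (64, then 32) and a softmax output layer.
    Layer sizes follow the paper; everything else is illustrative."""

    def __init__(self, n_in, n_out, seed=0):
        self.rng = np.random.default_rng(seed)
        # Small random weights (initialization scheme is an assumption).
        self.W1 = self.rng.normal(0.0, 0.1, (64, n_in))
        self.W2 = self.rng.normal(0.0, 0.1, (32, 64))
        self.W3 = self.rng.normal(0.0, 0.1, (n_out, 32))

    def forward(self, x):
        # Bernoulli-logistic unit: fires 1 with probability
        # sigmoid(pre-activation), else 0.
        p1 = sigmoid(self.W1 @ x)
        h1 = (self.rng.random(64) < p1).astype(float)
        p2 = sigmoid(self.W2 @ h1)
        h2 = (self.rng.random(32) < p2).astype(float)
        # Softmax output layer gives a distribution over actions.
        return softmax(self.W3 @ h2)

net = StochasticNet(n_in=8, n_out=4)
probs = net.forward(np.ones(8))
```

Sampling the hidden activations (rather than passing probabilities forward) is what makes each hidden unit a stochastic agent in the paper's framing, which is the setting in which REINFORCE-style and Weight Maximization updates apply.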