Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning by Competition of Self-Interested Reinforcement Learning Agents
Authors: Stephen Chung (pp. 6384-6393)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that a network trained with Weight Maximization can learn significantly faster than REINFORCE and slightly slower than backpropagation. |
| Researcher Affiliation | Academia | Stephen Chung Department of Computer Science, University of Massachusetts Amherst, USA EMAIL |
| Pseudocode | Yes | The pseudo-code can be found in Algorithm 1 of Appendix B. |
| Open Source Code | Yes | The code is available at https://github.com/stephen-chung-mh/weight_max. |
| Open Datasets | Yes | We applied our algorithms to four RL tasks: multiplexer, Cart Pole, Acrobot, and Lunar Lander. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). For RL tasks, data is typically generated through environment interaction rather than static splits, and no such details are provided for reproducibility beyond the task descriptions. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, or other computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms (e.g., 'REINFORCE', 'backprop', 'Actor-Critic'), and implies the use of a deep learning framework and RL environments, but does not specify any software versions for libraries or programming languages (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | All networks considered have the same architecture: a three-layer network of stochastic units, with the first hidden layer having 64 units, the second hidden layer having 32 units, and the output layer being a softmax layer. All hidden units are Bernoulli-logistic units... For the critic networks in all experiments, we used a three-layer ANN trained by backprop. Other experiments details can be found in Appendix C. |
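The architecture quoted above (a three-layer network of Bernoulli-logistic stochastic units with 64 and 32 hidden units and a softmax output) can be sketched as follows. This is a minimal illustration of the forward pass only, assuming NumPy; the input/output sizes, weight initialization, and class name are placeholders, not taken from the paper, and the paper's training rules (Weight Maximization, REINFORCE, backprop) are not implemented here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()

class StochasticNet:
    """Sketch of the paper's architecture: two hidden layers of
    Bernoulli-logistic units (64, then 32) and a softmax output layer.
    Layer sizes follow the paper; everything else is illustrative."""

    def __init__(self, n_in, n_out, seed=0):
        self.rng = np.random.default_rng(seed)
        # Small random weights (initialization scheme is an assumption).
        self.W1 = self.rng.normal(0.0, 0.1, (64, n_in))
        self.W2 = self.rng.normal(0.0, 0.1, (32, 64))
        self.W3 = self.rng.normal(0.0, 0.1, (n_out, 32))

    def forward(self, x):
        # Bernoulli-logistic unit: fires 1 with probability
        # sigmoid(pre-activation), else 0.
        p1 = sigmoid(self.W1 @ x)
        h1 = (self.rng.random(64) < p1).astype(float)
        p2 = sigmoid(self.W2 @ h1)
        h2 = (self.rng.random(32) < p2).astype(float)
        # Softmax output layer gives a distribution over actions.
        return softmax(self.W3 @ h2)

net = StochasticNet(n_in=8, n_out=4)
probs = net.forward(np.ones(8))
```

Sampling the hidden activations (rather than passing probabilities forward) is what makes each hidden unit a stochastic agent in the paper's framing, which is the setting in which REINFORCE-style and Weight Maximization updates apply.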