Learning by Competition of Self-Interested Reinforcement Learning Agents

Authors: Stephen Chung (pp. 6384-6393)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that a network trained with Weight Maximization can learn significantly faster than REINFORCE and slightly slower than backpropagation.
Researcher Affiliation | Academia | Stephen Chung, Department of Computer Science, University of Massachusetts Amherst, USA (minghaychung@umass.edu)
Pseudocode | Yes | The pseudo-code can be found in Algorithm 1 of Appendix B.
Open Source Code | Yes | The code is available at https://github.com/stephen-chung-mh/weight_max.
Open Datasets | Yes | We applied our algorithms to four RL tasks: multiplexer, Cart Pole, Acrobot, and Lunar Lander.
Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). For RL tasks, data is typically generated through environment interaction rather than static splits, and no such details are provided beyond the task descriptions.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU or CPU models or other computing specifications used for running the experiments.
Software Dependencies | No | The paper mentions software components and algorithms (e.g., 'REINFORCE', 'backprop', 'Actor-Critic'), and implies the use of a deep learning framework and RL environments, but does not specify any software versions for libraries or programming languages (e.g., 'PyTorch 1.9', 'Python 3.8').
Experiment Setup | Yes | All networks considered have the same architecture: a three-layer network of stochastic units, with the first hidden layer having 64 units, the second hidden layer having 32 units, and the output layer being a softmax layer. All hidden units are Bernoulli-logistic units... For the critic networks in all experiments, we used a three-layer ANN trained by backprop. Other experiment details can be found in Appendix C.
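
As a rough illustration of the actor architecture quoted in the Experiment Setup row, here is a minimal sketch of a three-layer stochastic network with 64 and 32 Bernoulli-logistic hidden units and a softmax output. This is not the author's released implementation; the use of PyTorch, the class name, and the input/output sizes (a CartPole-like 4-dimensional observation with 2 actions) are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class StochasticActor(nn.Module):
    """Sketch of a three-layer stochastic actor: Bernoulli-logistic hidden units, softmax output."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 64)    # first hidden layer: 64 stochastic units
        self.fc2 = nn.Linear(64, 32)         # second hidden layer: 32 stochastic units
        self.out = nn.Linear(32, n_actions)  # output layer feeding a softmax

    def forward(self, obs: torch.Tensor):
        # Each hidden unit emits 0/1 with firing probability given by a logistic (sigmoid) activation.
        p1 = torch.sigmoid(self.fc1(obs))
        h1 = torch.bernoulli(p1)
        p2 = torch.sigmoid(self.fc2(h1))
        h2 = torch.bernoulli(p2)
        action_probs = torch.softmax(self.out(h2), dim=-1)
        return action_probs, (h1, p1), (h2, p2)

# Usage sketch: sample an action for a CartPole-like 4-dimensional observation.
actor = StochasticActor(obs_dim=4, n_actions=2)
probs, _, _ = actor(torch.randn(1, 4))
action = torch.distributions.Categorical(probs=probs).sample()
```

Returning the sampled activations together with their firing probabilities reflects that, under Weight Maximization or REINFORCE-style training, each stochastic unit needs its own sampled output and probability to compute its update; the actual update rule is given in Algorithm 1 of the paper's Appendix B and is not reproduced here.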