Learning by Competition of Self-Interested Reinforcement Learning Agents
Authors: Stephen Chung
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that a network trained with Weight Maximization can learn significantly faster than REINFORCE and slightly slower than backpropagation. |
| Researcher Affiliation | Academia | Stephen Chung, Department of Computer Science, University of Massachusetts Amherst, USA (minghaychung@umass.edu) |
| Pseudocode | Yes | The pseudo-code can be found in Algorithm 1 of Appendix B. |
| Open Source Code | Yes | The code is available at https://github.com/stephen-chung-mh/weight_max. |
| Open Datasets | Yes | We applied our algorithms to four RL tasks: multiplexer, Cart Pole, Acrobot, and Lunar Lander. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). For RL tasks, data is typically generated through environment interaction rather than static splits, and no such details are provided for reproducibility beyond the task descriptions. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, or other computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms (e.g., 'REINFORCE', 'backprop', 'Actor-Critic'), and implies the use of a deep learning framework and RL environments, but does not specify any software versions for libraries or programming languages (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | All networks considered have the same architecture: a three-layer network of stochastic units, with the first hidden layer having 64 units, the second hidden layer having 32 units, and the output layer being a softmax layer. All hidden units are Bernoulli-logistic units... For the critic networks in all experiments, we used a three-layer ANN trained by backprop. Other experiment details can be found in Appendix C. (A minimal architecture sketch based on this description follows the table.) |
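
The quoted experiment setup fully specifies the actor network's shape, so a small sketch may help readers picture it. The following is a minimal PyTorch-style illustration of a three-layer network with 64 and 32 Bernoulli-logistic hidden units and a softmax output, as described above; the class name, forward-pass return values, and the CartPole-sized usage example are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code and the Weight Maximization training rule).

```python
import torch
import torch.nn as nn

class StochasticPolicyNet(nn.Module):
    """Illustrative three-layer network of stochastic units:
    64 and 32 Bernoulli-logistic hidden units and a softmax output,
    matching the architecture quoted in the Experiment Setup row.
    Names and structure are hypothetical; the training rule (Weight
    Maximization, REINFORCE, or backprop) is applied separately."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 64)    # first hidden layer: 64 units
        self.fc2 = nn.Linear(64, 32)         # second hidden layer: 32 units
        self.out = nn.Linear(32, n_actions)  # softmax output layer

    def forward(self, obs: torch.Tensor):
        # Bernoulli-logistic units: binary activations sampled from
        # sigmoid-parameterized Bernoulli distributions.
        p1 = torch.sigmoid(self.fc1(obs))
        h1 = torch.bernoulli(p1)
        p2 = torch.sigmoid(self.fc2(h1))
        h2 = torch.bernoulli(p2)
        action_probs = torch.softmax(self.out(h2), dim=-1)
        # Sampled activations and their probabilities are returned so a
        # per-unit learning rule could use them; this is an assumption
        # about the interface, not the paper's API.
        return action_probs, (h1, p1), (h2, p2)

# Example usage with CartPole-sized inputs (4-dim observation, 2 actions).
net = StochasticPolicyNet(obs_dim=4, n_actions=2)
probs, _, _ = net(torch.randn(1, 4))
action = torch.multinomial(probs, num_samples=1)
```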