Backprop-Free Reinforcement Learning with Active Neural Generative Coding
Authors: Alexander G. Ororbia, Ankur Mali
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate on several simple control problems that our framework performs competitively with deep Q-learning. To evaluate ANGC’s efficacy, we implement an agent structure tasked with solving control problems often experimented with in RL and compare performance against several backprop-based approaches. The performance of the ANGC agent is evaluated on three control problems commonly used in reinforcement learning (RL) and one simulation in robotic control. |
| Researcher Affiliation | Academia | Alexander G. Ororbia1, Ankur Mali2 1 Rochester Institute of Technology 2 The Pennsylvania State University ago@cs.rit.edu, aam35@psu.edu |
| Pseudocode | Yes | Algorithm 1: The ANGC total discrepancy process under an environment for E episodes (of maximum length T). |
| Open Source Code | No | The paper references an appendix at 'https://arxiv.org/abs/2107.07046' for pseudocode and details, but does not explicitly state that source code for the methodology is released or provide a code repository link. |
| Open Datasets | No | The paper refers to standard control problems in reinforcement learning (inverted pendulum, mountain car, lunar lander, robot reaching problem) but does not provide concrete access information (links, DOIs, formal citations with authors/year) for them as publicly available datasets. |
| Dataset Splits | No | The paper discusses training aspects like experience replay and batch sizes but does not specify exact train/validation/test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processors, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam and AdamW but does not provide specific version numbers for programming languages or other software dependencies. |
| Experiment Setup | Yes | For all ANGC agents, αe = αi = 1.0 was used as the importance factors for both the epistemic and instrumental signals. Each agent also uses an epsilon (ϵ)-greedy policy where ϵ was decayed at the end of each episode according to the rule ϵ ← max(0.05, ϵ · ϵdecay) (starting at ϵ = 1 at a trial's start). The discount factor was tuned in the range γ = [0.91, 0.99]. The linear rectifier was used as the activation function and Adam was used to update weight values... Hidden layer sizes were selected from the range [32, 512] and the number of layers was chosen from the set {1, 2, 3}. A minimal sketch of this epsilon-greedy schedule appears after this table. |
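
The per-episode epsilon decay quoted in the Experiment Setup row, together with the episode structure described in Algorithm 1 (E episodes of at most T steps), can be summarized in a short sketch. The snippet below is illustrative only: the agent interface (`select_greedy_action`, `update_from_replay`) and the environment API are hypothetical placeholders, not the authors' released code, and the decay constant is an assumed example value.

```python
import random

def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.05):
    """Per-episode decay toward a floor, as described in the setup:
    epsilon <- max(epsilon_min, epsilon * epsilon_decay)."""
    return max(epsilon_min, epsilon * epsilon_decay)

def run_training(env, agent, num_episodes, max_steps, epsilon_decay=0.995):
    """Illustrative outer loop mirroring Algorithm 1's structure:
    E episodes of at most T steps under an epsilon-greedy behavior policy.
    `agent.select_greedy_action` and `agent.update_from_replay` stand in
    for the ANGC agent's internals, which are not reproduced here."""
    epsilon = 1.0  # starting epsilon at a trial's start
    for episode in range(num_episodes):
        state = env.reset()
        for t in range(max_steps):
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = agent.select_greedy_action(state)
            next_state, reward, done, _ = env.step(action)
            # Store the transition and learn (e.g., via experience replay).
            agent.update_from_replay(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
        # Decay epsilon at the end of each episode.
        epsilon = decay_epsilon(epsilon, epsilon_decay)
```
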