Backprop-Free Reinforcement Learning with Active Neural Generative Coding
Authors: Alexander G. Ororbia, Ankur Mali
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate on several simple control problems that our framework performs competitively with deep Q-learning. To evaluate ANGC’s efficacy, we implement an agent structure tasked with solving control problems often experimented with in RL and compare performance against several backprop-based approaches. The performance of the ANGC agent is evaluated on three control problems commonly used in reinforcement learning (RL) and one simulation in robotic control. |
| Researcher Affiliation | Academia | Alexander G. Ororbia1, Ankur Mali2 1 Rochester Institute of Technology 2 The Pennsylvania State University ago@cs.rit.edu, aam35@psu.edu |
| Pseudocode | Yes | Algorithm 1: The ANGC total discrepancy process under an environment for E episodes (of maximum length T). |
| Open Source Code | No | The paper references an appendix at 'https://arxiv.org/abs/2107.07046' for pseudocode and details, but does not explicitly state that source code for the methodology is released or provide a code repository link. |
| Open Datasets | No | The paper refers to standard control problems in reinforcement learning (inverted pendulum, mountain car, lunar lander, robot reaching problem) but does not provide concrete access information (links, DOIs, formal citations with authors/year) for them as publicly available datasets. |
| Dataset Splits | No | The paper discusses training aspects like experience replay and batch sizes but does not specify exact train/validation/test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processors, or memory used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam and AdamW but does not provide specific version numbers for programming languages or other software dependencies. |
| Experiment Setup | Yes | For all ANGC agents, αe = αi = 1.0 was used as the importance factors for both the epistemic and instrumental signals. Each agent also uses an epsilon (ϵ)-greedy policy where ϵ was decayed at the end of each episode according to the rule ϵ ← max(0.05, ϵ · ϵdecay) (starting at ϵ = 1 at a trial's start). The discount factor was tuned in the range γ = [0.91, 0.99]. The linear rectifier was used as the activation function and Adam was used to update weight values... Hidden layer sizes were selected from the range [32, 512] and the number of layers was chosen from the set {1, 2, 3}. A minimal sketch of this epsilon-greedy schedule appears after this table. |
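
The per-episode epsilon decay quoted in the Experiment Setup row, together with the episode structure described in Algorithm 1 (E episodes of at most T steps), can be summarized in a short sketch. The snippet below is illustrative only: the agent interface (`select_greedy_action`, `update_from_replay`) and the environment API are hypothetical placeholders, not the authors' released code, and the decay constant is an assumed example value.

```python
import random

def decay_epsilon(epsilon, epsilon_decay, epsilon_min=0.05):
    """Per-episode decay toward a floor, as described in the setup:
    epsilon <- max(epsilon_min, epsilon * epsilon_decay)."""
    return max(epsilon_min, epsilon * epsilon_decay)

def run_training(env, agent, num_episodes, max_steps, epsilon_decay=0.995):
    """Illustrative outer loop mirroring Algorithm 1's structure:
    E episodes of at most T steps under an epsilon-greedy behavior policy.
    `agent.select_greedy_action` and `agent.update_from_replay` stand in
    for the ANGC agent's internals, which are not reproduced here."""
    epsilon = 1.0  # starting epsilon at a trial's start
    for episode in range(num_episodes):
        state = env.reset()
        for t in range(max_steps):
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = agent.select_greedy_action(state)
            next_state, reward, done, _ = env.step(action)
            # Store the transition and learn (e.g., via experience replay).
            agent.update_from_replay(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
        # Decay epsilon at the end of each episode.
        epsilon = decay_epsilon(epsilon, epsilon_decay)
```
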