Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

Authors: Naman Saxena, Subhojyoti Khastagir, Shishir Kolathaya, Shalabh Bhatnagar

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the average reward performance of our proposed ARO-DDPG algorithm and observe better empirical performance compared to state-of-the-art on-policy average reward actor-critic algorithms over MuJoCo-based environments. |
| Researcher Affiliation | Academia | Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India; Robert Bosch Centre for Cyber-Physical Systems, Indian Institute of Science, Bangalore, India. Correspondence to: Naman Saxena <namansaxena@iisc.ac.in>. |
| Pseudocode | Yes | Algorithm 1 (Off-Policy) ARO-DDPG Practical Algorithm; Algorithm 2 On-policy AR-DPG with Linear FA; Algorithm 3 Off-policy AR-DPG with Linear FA; Algorithm 4 On-policy AR-DPG with Linear FA; Algorithm 5 Off-policy AR-DPG with Linear FA |
| Open Source Code | Yes | PyTorch implementation of ARO-DDPG can be found at this URL: https://github.com/namansaxena9/ARODDPG |
| Open Datasets | Yes | We conducted experiments on six different environments using the DeepMind Control Suite (Tassa et al., 2018). |
| Dataset Splits | No | The paper discusses training and evaluation phases with specific episode lengths, but it does not provide explicit dataset splits for training, validation, or testing, nor does it refer to standard predefined splits for the DeepMind Control Suite. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions a PyTorch implementation and MuJoCo-based environments but does not specify version numbers for these or other software libraries, which is necessary for reproducibility. |
| Experiment Setup | Yes | The paper includes a hyperparameter table detailing specific values for buffer size, total environment steps, batch size, evaluation frequency, training episode length, evaluation episode length, activation function, learning rates (actor, differential Q-value function, average reward parameter), number of hidden layers, number of nodes per hidden layer, update frequency, number of critic updates, number of actor updates, and the Polyak averaging constant. |
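The Pseudocode row lists Algorithm 1, the practical (Off-Policy) ARO-DDPG algorithm. Below is a minimal sketch of a DDPG-style update with a differential Q-value critic and a running average-reward estimate, which is the style of update that algorithm describes; the network sizes, names, step sizes, and the exact form of the average-reward update here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of one off-policy actor-critic update with an average-reward
# (differential) critic. All sizes and step sizes are placeholders.
import copy
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # placeholder dimensions

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
target_actor = copy.deepcopy(actor)
target_critic = copy.deepcopy(critic)
rho = torch.zeros(1)  # running estimate of the average reward

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
rho_lr, tau = 1e-3, 0.005  # placeholder step size and Polyak constant


def update(s, a, r, s2):
    """One update on a replay-buffer batch: s (B, obs), a (B, act), r (B, 1), s2 (B, obs)."""
    # Differential TD target: reward minus the average-reward estimate plus
    # the target critic's value at the next state under the target policy.
    with torch.no_grad():
        a2 = target_actor(s2)
        target = r - rho + target_critic(torch.cat([s2, a2], dim=-1))

    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = ((q - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Track the average reward with the mean TD error (one common choice).
    with torch.no_grad():
        rho.add_(rho_lr * (target - q).mean())

    # Deterministic policy gradient: push actions toward higher Q-values.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak averaging of the target networks.
    with torch.no_grad():
        for p, tp in zip(critic.parameters(), target_critic.parameters()):
            tp.mul_(1 - tau).add_(tau * p)
        for p, tp in zip(actor.parameters(), target_actor.parameters()):
            tp.mul_(1 - tau).add_(tau * p)
```

A full reproduction would additionally need exploration noise, a replay buffer, and the update frequencies and critic/actor update counts from the paper's hyperparameter table.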
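The Open Datasets row refers to the DeepMind Control Suite. A short sketch of loading one suite task and stepping it with random actions, assuming the dm_control package, is shown below; the cheetah/run pair is only an illustrative choice and not necessarily one of the six environments used in the paper.

```python
# Load a DeepMind Control Suite task and run one episode with random actions.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    # time_step.reward and time_step.observation hold the transition data.
```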
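Because the Software Dependencies row notes that no library versions are given, a reproduction would need to record the versions it actually installs. A small sketch of logging them via importlib.metadata follows; the package names are the usual PyPI distribution names and may differ from what the authors used.

```python
# Record installed versions of the key dependencies for a reproduction run.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["torch", "dm-control", "mujoco", "numpy"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")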
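The Experiment Setup row lists the fields of the paper's hyperparameter table. A sketch of how those fields might be mirrored as a config object in a reproduction is given below; the field names follow the table's entries, but every value is a generic placeholder, not a value reported in the paper.

```python
# Placeholder config mirroring the fields of the paper's hyperparameter table.
from dataclasses import dataclass


@dataclass
class ARODDPGConfig:
    buffer_size: int = 1_000_000        # placeholder
    total_env_steps: int = 1_000_000    # placeholder
    batch_size: int = 256               # placeholder
    eval_frequency: int = 10_000        # placeholder, in environment steps
    train_episode_length: int = 1_000   # placeholder
    eval_episode_length: int = 1_000    # placeholder
    activation: str = "relu"            # placeholder
    actor_lr: float = 1e-4              # placeholder
    critic_lr: float = 1e-3             # placeholder (differential Q-value function)
    avg_reward_lr: float = 1e-3         # placeholder (average reward parameter)
    num_hidden_layers: int = 2          # placeholder
    hidden_layer_size: int = 256        # placeholder
    update_frequency: int = 1           # placeholder
    critic_updates_per_step: int = 1    # placeholder
    actor_updates_per_step: int = 1     # placeholder
    polyak: float = 0.005               # placeholder (Polyak averaging constant)
```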