Adversarially Guided Actor-Critic

Authors: Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a set of various hard-exploration and procedurally-generated tasks.
Researcher Affiliation | Collaboration | Yannis Flet-Berliac (Inria, Scool team; Univ. Lille, CRIStAL, CNRS; yannis.flet-berliac@inria.fr), Johan Ferret (Google Research, Brain team; Inria, Scool team; Univ. Lille, CRIStAL, CNRS), Olivier Pietquin (Google Research, Brain team), Philippe Preux (Inria, Scool team; Univ. Lille, CRIStAL, CNRS), Matthieu Geist (Google Research, Brain team)
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper (an illustrative, non-authoritative sketch of the AGAC update is given after the table).
Open Source Code | Yes | The code for our method is released at github.com/yfletberliac/adversarially-guided-actor-critic.
Open Datasets | Yes | In VizDoom (Kempka et al., 2016), the agent must learn to move along corridors and through rooms without any reward feedback from the 3-D environment. The MiniGrid environments (Chevalier-Boisvert et al., 2018) are a set of challenging partially-observable and sparse-reward gridworlds. All considered environments (see Fig. 1 for some examples) are available as part of OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper describes training and evaluation on various RL environments (VizDoom, MiniGrid) that are procedurally generated, meaning new environment instances are sampled rather than drawn from a fixed dataset. It does not provide explicit dataset splits (e.g., percentages or counts) in the traditional supervised-learning sense for train/validation/test subsets (see the environment example after the table).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions PPO as the base algorithm and Adam for optimization, but does not provide specific version numbers for these or for other software libraries (e.g., PyTorch, TensorFlow) used in the implementation.
Experiment Setup | Yes | Table 3: Hyperparameters used in AGAC.
    Horizon T: 2048
    Nb. epochs: 4
    Nb. minibatches: 8
    Nb. frames stacked: 4
    Nonlinearity: ELU (Clevert et al., 2016)
    Discount γ: 0.99
    GAE parameter λ: 0.95
    PPO clipping parameter ϵ: 0.2
    β_V: 0.5
    c: 4×10⁻⁴ (4×10⁻⁵ in VizDoom)
    c anneal schedule: linear
    β_adv: 4×10⁻⁵
    Adam stepsize η₁: 3×10⁻⁴
    Adam stepsize η₂: 9×10⁻⁵ (= 0.3 η₁)
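
As noted in the Pseudocode row, the paper does not include an algorithm block. For orientation only, here is a minimal sketch of the AGAC advantage modification as we read it from the paper's description; all function and variable names are ours, and the released repository should be treated as the reference implementation.

```python
import torch

def agac_advantage(gae_advantage: torch.Tensor,
                   logp_actor: torch.Tensor,
                   logp_adversary: torch.Tensor,
                   c: float) -> torch.Tensor:
    """Sketch (our reading, not the authors' pseudocode) of the AGAC-modified
    advantage: the usual GAE advantage is augmented with the discrepancy
    between the actor's and the adversary's log-probabilities of the taken
    actions, scaled by the linearly annealed coefficient c."""
    return gae_advantage + c * (logp_actor - logp_adversary)

# The adversary is a separate network trained to imitate the actor's policy
# on past states, so this bonus favors actions the adversary fails to predict,
# which is what drives the more exhaustive exploration reported in the paper.
```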
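
For the Open Datasets and Dataset Splits rows: the MiniGrid tasks are typically loaded through the standard Gym interface, and distinct random seeds yield distinct procedurally-generated layouts. A hypothetical sketch, assuming the gym and gym_minigrid packages; the environment id and seed ranges are illustrative and not taken from the paper.

```python
import gym
import gym_minigrid  # noqa: F401  (importing registers the MiniGrid environments)

# Hypothetical "split": the paper does not define one; procedurally-generated
# benchmarks are often evaluated by reserving disjoint seeds for held-out layouts.
TRAIN_SEEDS = range(0, 1000)
EVAL_SEEDS = range(1000, 1100)

env = gym.make("MiniGrid-MultiRoom-N6-v0")  # illustrative environment id
for seed in EVAL_SEEDS:
    env.seed(seed)      # classic Gym API; newer Gym versions use env.reset(seed=seed)
    obs = env.reset()   # each seed produces a different procedurally-generated layout
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
```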
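
To make the Table 3 settings above easy to reuse, here is a minimal sketch collecting them into a Python dictionary; the key names and inline interpretations are ours, not those of the released code.

```python
# Values from Table 3 of the paper; key names are illustrative.
AGAC_HYPERPARAMS = {
    "horizon_T": 2048,              # rollout length per update
    "num_epochs": 4,                # optimization epochs per batch (PPO-style)
    "num_minibatches": 8,
    "frames_stacked": 4,
    "nonlinearity": "ELU",          # Clevert et al., 2016
    "discount_gamma": 0.99,
    "gae_lambda": 0.95,
    "ppo_clip_epsilon": 0.2,
    "beta_V": 0.5,                  # critic (value) loss coefficient
    "c": 4e-4,                      # adversarial bonus coefficient; 4e-5 in VizDoom
    "c_anneal_schedule": "linear",
    "beta_adv": 4e-5,
    "adam_stepsize_eta1": 3e-4,
    "adam_stepsize_eta2": 9e-5,     # = 0.3 * eta1
}
```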