Adversarially Guided Actor-Critic

Authors: Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a set of various hard-exploration and procedurally-generated tasks.
Researcher Affiliation | Collaboration | Yannis Flet-Berliac (Inria, Scool team; Univ. Lille, CRIStAL, CNRS; yannis.flet-berliac@inria.fr), Johan Ferret (Google Research, Brain team; Inria, Scool team; Univ. Lille, CRIStAL, CNRS), Olivier Pietquin (Google Research, Brain team), Philippe Preux (Inria, Scool team; Univ. Lille, CRIStAL, CNRS), Matthieu Geist (Google Research, Brain team)
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper (an illustrative, non-authoritative sketch of the AGAC update is given after the table).
Open Source Code | Yes | The code for our method is released at github.com/yfletberliac/adversarially-guided-actor-critic.
Open Datasets | Yes | In VizDoom (Kempka et al., 2016), the agent must learn to move along corridors and through rooms without any reward feedback from the 3-D environment. The MiniGrid environments (Chevalier-Boisvert et al., 2018) are a set of challenging partially-observable and sparse-reward gridworlds. All considered environments (see Fig. 1 for some examples) are available as part of OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper describes training and evaluation on various RL environments (VizDoom, MiniGrid) that are procedurally generated, meaning new environment instances are sampled rather than drawn from a fixed dataset. It does not provide explicit dataset splits (e.g., percentages or counts) in the traditional supervised-learning sense for train/validation/test subsets (see the environment example after the table).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions PPO as the base algorithm and Adam for optimization, but does not provide specific version numbers for these or for other software libraries (e.g., PyTorch, TensorFlow) used in the implementation.
Experiment Setup | Yes | Table 3: Hyperparameters used in AGAC.
    Horizon T: 2048
    Nb. epochs: 4
    Nb. minibatches: 8
    Nb. frames stacked: 4
    Nonlinearity: ELU (Clevert et al., 2016)
    Discount γ: 0.99
    GAE parameter λ: 0.95
    PPO clipping parameter ϵ: 0.2
    β_V: 0.5
    c: 4×10⁻⁴ (4×10⁻⁵ in VizDoom)
    c anneal schedule: linear
    β_adv: 4×10⁻⁵
    Adam stepsize η₁: 3×10⁻⁴
    Adam stepsize η₂: 9×10⁻⁵ (= 0.3 η₁)
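
As noted in the Pseudocode row, the paper does not include an algorithm block. For orientation only, here is a minimal sketch of the AGAC advantage modification as we read it from the paper's description; all function and variable names are ours, and the released repository should be treated as the reference implementation.

```python
import torch

def agac_advantage(gae_advantage: torch.Tensor,
                   logp_actor: torch.Tensor,
                   logp_adversary: torch.Tensor,
                   c: float) -> torch.Tensor:
    """Sketch (our reading, not the authors' pseudocode) of the AGAC-modified
    advantage: the usual GAE advantage is augmented with the discrepancy
    between the actor's and the adversary's log-probabilities of the taken
    actions, scaled by the linearly annealed coefficient c."""
    return gae_advantage + c * (logp_actor - logp_adversary)

# The adversary is a separate network trained to imitate the actor's policy
# on past states, so this bonus favors actions the adversary fails to predict,
# which is what drives the more exhaustive exploration reported in the paper.
```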
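
For the Open Datasets and Dataset Splits rows: the MiniGrid tasks are typically loaded through the standard Gym interface, and distinct random seeds yield distinct procedurally-generated layouts. A hypothetical sketch, assuming the gym and gym_minigrid packages; the environment id and seed ranges are illustrative and not taken from the paper.

```python
import gym
import gym_minigrid  # noqa: F401  (importing registers the MiniGrid environments)

# Hypothetical "split": the paper does not define one; procedurally-generated
# benchmarks are often evaluated by reserving disjoint seeds for held-out layouts.
TRAIN_SEEDS = range(0, 1000)
EVAL_SEEDS = range(1000, 1100)

env = gym.make("MiniGrid-MultiRoom-N6-v0")  # illustrative environment id
for seed in EVAL_SEEDS:
    env.seed(seed)      # classic Gym API; newer Gym versions use env.reset(seed=seed)
    obs = env.reset()   # each seed produces a different procedurally-generated layout
    done = False
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
```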
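
To make the Table 3 settings above easy to reuse, here is a minimal sketch collecting them into a Python dictionary; the key names and inline interpretations are ours, not those of the released code.

```python
# Values from Table 3 of the paper; key names are illustrative.
AGAC_HYPERPARAMS = {
    "horizon_T": 2048,              # rollout length per update
    "num_epochs": 4,                # optimization epochs per batch (PPO-style)
    "num_minibatches": 8,
    "frames_stacked": 4,
    "nonlinearity": "ELU",          # Clevert et al., 2016
    "discount_gamma": 0.99,
    "gae_lambda": 0.95,
    "ppo_clip_epsilon": 0.2,
    "beta_V": 0.5,                  # critic (value) loss coefficient
    "c": 4e-4,                      # adversarial bonus coefficient; 4e-5 in VizDoom
    "c_anneal_schedule": "linear",
    "beta_adv": 4e-5,
    "adam_stepsize_eta1": 3e-4,
    "adam_stepsize_eta2": 9e-5,     # = 0.3 * eta1
}
```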