Adversarially Guided Actor-Critic
Authors: Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a set of various hard-exploration and procedurally-generated tasks. |
| Researcher Affiliation | Collaboration | Yannis Flet-Berliac (Inria, Scool team; Univ. Lille, CRIStAL, CNRS; yannis.flet-berliac@inria.fr); Johan Ferret (Google Research, Brain team; Inria, Scool team; Univ. Lille, CRIStAL, CNRS); Olivier Pietquin (Google Research, Brain team); Philippe Preux (Inria, Scool team; Univ. Lille, CRIStAL, CNRS); Matthieu Geist (Google Research, Brain team) |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The code for our method is released at github.com/yfletberliac/adversarially-guided-actor-critic. |
| Open Datasets | Yes | In VizDoom (Kempka et al., 2016), the agent must learn to move along corridors and through rooms without any reward feedback from the 3-D environment. The MiniGrid environments (Chevalier-Boisvert et al., 2018) are a set of challenging partially-observable and sparse-reward gridworlds. All considered environments (see Fig. 1 for some examples) are available as part of OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper trains and evaluates on RL environments (VizDoom, MiniGrid), some of which are procedurally generated, so new environment instances are sampled rather than drawn from a fixed dataset. It does not provide explicit train/validation/test splits (e.g., percentages or counts) in the supervised-learning sense. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions PPO as the base algorithm and Adam for optimization, but does not provide specific version numbers for these or other software libraries (e.g., PyTorch, TensorFlow) used in the implementation. |
| Experiment Setup | Yes | Table 3: Hyperparameters used in AGAC. Horizon T: 2048; Nb. epochs: 4; Nb. minibatches: 8; Nb. frames stacked: 4; Nonlinearity: ELU (Clevert et al., 2016); Discount γ: 0.99; GAE parameter λ: 0.95; PPO clipping parameter ϵ: 0.2; β_V: 0.5; c: 4 × 10⁻⁴ (4 × 10⁻⁵ in VizDoom); c anneal schedule: linear; β_adv: 4 × 10⁻⁵; Adam stepsize η1: 3 × 10⁻⁴; Adam stepsize η2: 9 × 10⁻⁵ (= 0.3 × η1) |
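
For readers who want to reuse the Table 3 values, the sketch below collects them into a small Python configuration object. It is an illustration only and is not taken from the released code: the class name `AGACHyperparameters`, its field names, and the helper `linear_anneal` are hypothetical. The table states only that c follows a linear anneal schedule, so the final value of 0 used here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AGACHyperparameters:
    """Values transcribed from Table 3; all field names are hypothetical."""
    horizon: int = 2048          # Horizon T
    epochs: int = 4              # Nb. epochs
    minibatches: int = 8         # Nb. minibatches
    frames_stacked: int = 4      # Nb. frames stacked
    nonlinearity: str = "elu"    # ELU (Clevert et al., 2016)
    gamma: float = 0.99          # Discount
    gae_lambda: float = 0.95     # GAE parameter
    ppo_clip: float = 0.2        # PPO clipping parameter
    beta_v: float = 0.5          # beta_V
    c_initial: float = 4e-4      # c (4e-5 in VizDoom)
    beta_adv: float = 4e-5       # beta_adv
    eta1: float = 3e-4           # Adam stepsize eta1
    eta2: float = 9e-5           # Adam stepsize eta2 = 0.3 * eta1


def linear_anneal(initial: float, step: int, total_steps: int,
                  final: float = 0.0) -> float:
    """Linearly interpolate from `initial` to `final` over `total_steps`.

    Assumption: the table only says the schedule is linear; annealing
    down to 0 by default is a guess, not a value from the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial + frac * (final - initial)


DEFAULT = AGACHyperparameters()
VIZDOOM = AGACHyperparameters(c_initial=4e-5)  # VizDoom override from Table 3
```

For example, `linear_anneal(DEFAULT.c_initial, step, total_steps)` would give the coefficient c at a given training step under the assumed schedule.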