Soft Action Priors: Towards Robust Policy Transfer
Authors: Matheus Centa, Philippe Preux
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform tabular experiments, which show that the proposed methods achieve state-of-the-art performance, surpassing it when learning from suboptimal priors. Finally, we demonstrate the robustness of the adaptive algorithms in continuous action deep RL problems, in which adaptive algorithms considerably improved stability when compared to existing policy distillation methods. |
| Researcher Affiliation | Academia | Matheus Centa¹, Philippe Preux¹ (¹Univ. Lille, CNRS, UMR 9189 CRIStAL, F-59000 Lille, France; Inria; Centrale Lille) {matheus.centa, philippe.preux}@inria.fr |
| Pseudocode | Yes | The pseudocode for the E2R algorithm can be found in Appendix 1. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We study two experimental setups: the tabular Grid World setting from (Czarnecki et al. 2019) and the continuous control benchmarks from MuJoCo (Todorov, Erez, and Tassa 2012) using OpenAI Gym (Brockman et al. 2016). |
| Dataset Splits | No | The paper refers to 'evaluation episodes' but does not provide explicit details on train/validation/test dataset splits, percentages, or counts for reproduction. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym' and 'MuJoCo' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We present implementation details and hyperparameter choices in Appendix 3. |
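The Open Datasets row above cites OpenAI Gym's MuJoCo continuous-control benchmarks as the deep RL testbed. Below is a minimal sketch of instantiating such environments, assuming the classic (pre-0.26) Gym API; the specific environment IDs are common MuJoCo tasks chosen for illustration, since the paper's table entry does not list which tasks were used.

```python
# Minimal sketch: instantiating Gym MuJoCo continuous-control benchmarks.
# Assumes the classic Gym API (pre-0.26), where reset() returns only the
# observation and step() returns a 4-tuple. Environment IDs are illustrative.
import gym

# Typical MuJoCo tasks exposed through Gym (assumed, not from the paper).
MUJOCO_TASKS = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2"]

def make_env(env_id: str, seed: int = 0) -> gym.Env:
    """Create and seed a Gym environment using the classic API."""
    env = gym.make(env_id)
    env.seed(seed)  # pre-0.26 Gym seeds via env.seed(); newer Gym uses reset(seed=...)
    return env

if __name__ == "__main__":
    for env_id in MUJOCO_TASKS:
        env = make_env(env_id)
        obs = env.reset()
        # Roll out a few random steps to confirm the environment runs.
        for _ in range(5):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                obs = env.reset()
        env.close()
```

Since the paper does not pin Gym or MuJoCo versions (see the Software Dependencies row), anyone attempting reproduction would need to choose versions themselves; the pre-0.26 API assumed here matches Gym releases contemporary with the paper.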