Soft Action Priors: Towards Robust Policy Transfer

Authors: Matheus Centa, Philippe Preux

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform tabular experiments, which show that the proposed methods achieve state-of-the-art performance, surpassing it when learning from suboptimal priors. Finally, we demonstrate the robustness of the adaptive algorithms in continuous action deep RL problems, in which adaptive algorithms considerably improved stability when compared to existing policy distillation methods.
Researcher Affiliation | Academia | Matheus Centa, Philippe Preux: Univ. Lille, CNRS, UMR 9189 CRIStAL, F-59000 Lille, France; Inria; Centrale Lille. {matheus.centa, philippe.preux}@inria.fr
Pseudocode | Yes | The pseudocode for the E2R algorithm can be found in Appendix 1.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We study two experimental setups: the tabular Grid World setting from (Czarnecki et al. 2019) and the continuous control benchmarks from MuJoCo (Todorov, Erez, and Tassa 2012) using OpenAI Gym (Brockman et al. 2016). (A minimal environment-loading sketch follows this table.)
Dataset Splits | No | The paper refers to 'evaluation episodes' but does not provide explicit details on train/validation/test dataset splits, percentages, or counts for reproduction.
Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions 'OpenAI Gym' and 'MuJoCo' but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | We present implementation details and hyperparameter choices in Appendix 3.
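The Open Datasets row reports that the continuous-control benchmarks are the standard MuJoCo tasks accessed through OpenAI Gym, while the Software Dependencies row notes that no library versions are given. The sketch below is therefore only an illustrative guess at how such a benchmark could be loaded and rolled out; the environment id "HalfCheetah-v2", the classic (pre-Gymnasium) Gym step/reset API, and the random placeholder policy are assumptions, not details taken from the paper.

    # Minimal sketch, not the authors' setup: assumes classic OpenAI Gym
    # (pre-Gymnasium API) with MuJoCo installed, and uses "HalfCheetah-v2"
    # as a stand-in environment id since the paper does not specify versions.
    import gym

    env = gym.make("HalfCheetah-v2")
    obs = env.reset()                       # classic API: reset() returns obs only
    done = False
    episode_return = 0.0
    while not done:
        action = env.action_space.sample()  # placeholder for the learned policy
        obs, reward, done, info = env.step(action)
        episode_return += reward
    env.close()
    print("episode return:", episode_return)

Because the reset/step signatures changed across later Gym and Gymnasium releases, pinning the exact gym and MuJoCo versions would be needed to rerun the continuous-control experiments, which is precisely the gap flagged in the Software Dependencies row.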