Soft Action Priors: Towards Robust Policy Transfer
Authors: Matheus Centa, Philippe Preux
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform tabular experiments, which show that the proposed methods achieve state-of-the-art performance, surpassing it when learning from suboptimal priors. Finally, we demonstrate the robustness of the adaptive algorithms in continuous action deep RL problems, in which adaptive algorithms considerably improved stability when compared to existing policy distillation methods. |
| Researcher Affiliation | Academia | Matheus Centa¹, Philippe Preux¹ (¹Univ. Lille, CNRS, UMR 9189 CRIStAL, F-59000 Lille, France; Inria; Centrale Lille) {matheus.centa, philippe.preux}@inria.fr |
| Pseudocode | Yes | The pseudocode for the E2R algorithm can be found in Appendix 1. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We study two experimental setups: the tabular Grid World setting from (Czarnecki et al. 2019) and the continuous control benchmarks from MuJoCo (Todorov, Erez, and Tassa 2012) using OpenAI Gym (Brockman et al. 2016). |
| Dataset Splits | No | The paper refers to 'evaluation episodes' but does not provide explicit details on train/validation/test dataset splits, percentages, or counts for reproduction. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym' and 'MuJoCo' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We present implementation details and hyperparameter choices in Appendix 3. |
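The Open Datasets row above cites OpenAI Gym's MuJoCo continuous-control benchmarks as the deep RL testbed. Below is a minimal sketch of instantiating such environments, assuming the classic (pre-0.26) Gym API; the specific environment IDs are common MuJoCo tasks chosen for illustration, since the paper's table entry does not list which tasks were used.

```python
# Minimal sketch: instantiating Gym MuJoCo continuous-control benchmarks.
# Assumes the classic Gym API (pre-0.26), where reset() returns only the
# observation and step() returns a 4-tuple. Environment IDs are illustrative.
import gym

# Typical MuJoCo tasks exposed through Gym (assumed, not from the paper).
MUJOCO_TASKS = ["HalfCheetah-v2", "Hopper-v2", "Walker2d-v2"]

def make_env(env_id: str, seed: int = 0) -> gym.Env:
    """Create and seed a Gym environment using the classic API."""
    env = gym.make(env_id)
    env.seed(seed)  # pre-0.26 Gym seeds via env.seed(); newer Gym uses reset(seed=...)
    return env

if __name__ == "__main__":
    for env_id in MUJOCO_TASKS:
        env = make_env(env_id)
        obs = env.reset()
        # Roll out a few random steps to confirm the environment runs.
        for _ in range(5):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                obs = env.reset()
        env.close()
```

Since the paper does not pin Gym or MuJoCo versions (see the Software Dependencies row), anyone attempting reproduction would need to choose versions themselves; the pre-0.26 API assumed here matches Gym releases contemporary with the paper.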