Diverse Exploration for Fast and Safe Policy Improvement
Authors: Andrew Cohen, Lei Yu, Robert Wright
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical study shows that an online policy improvement algorithm framework implementing the DE strategy can achieve both fast policy improvement and safe online performance. |
| Researcher Affiliation | Collaboration | Andrew Cohen (Binghamton University, acohen13@binghamton.edu); Lei Yu (Binghamton University / Yantai University, lyu@binghamton.edu); Robert Wright (Assured Information Security, wrightr@ainfosec.com) |
| Pseudocode | Yes | Algorithm 1 provides the overall DE framework. |
| Open Source Code | No | The paper does not provide any statement about releasing its source code or include a link to a code repository. |
| Open Datasets | Yes | We use three RL benchmark domains in our analysis: an extended Grid World as described earlier and the classic control domains of Mountain Car and Acrobot (Sutton and Barto 1998). |
| Dataset Splits | Yes | So, we maintain separate training and test sets Dtrain, Dtest by partitioning the trajectories collected from each behavior policy πi based on a predetermined ratio (1/5, 4/5) and appending to Dtrain and Dtest. |
| Hardware Specification | No | The paper does not specify any hardware details like CPU or GPU models used for the experiments. It only vaguely mentions 'computing support' in the acknowledgements. |
| Software Dependencies | No | The paper mentions using 'CMA-ES', 'FQI', and 'Fourier basis functions' with citations, but does not provide specific version numbers for these tools or any other software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | Following (Thomas, Theocharous, and Ghavamzadeh 2015b), we set δ = .05 for all experiments. Candidate policies are generated as mixed policies... In experiments, we use α = .3 for Gridworld and α = .9 for Mountain Car/Acrobot. |
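
The Dataset Splits row above describes partitioning the trajectories collected from each behavior policy at a (1/5, 4/5) ratio into Dtrain and Dtest (the small training portion is used to learn candidate policies, the larger test portion for the high-confidence safety test). The sketch below is a minimal illustration of that per-policy split, assuming trajectories are stored as Python lists grouped by behavior policy; the function name and data layout are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the per-policy trajectory split described in the
# "Dataset Splits" row: trajectories from each behavior policy pi_i are
# partitioned at a fixed (1/5, 4/5) ratio and appended to D_train / D_test.
import random

def split_trajectories(trajectories_by_policy, train_ratio=0.2, seed=0):
    """Partition each policy's trajectories into D_train / D_test."""
    rng = random.Random(seed)
    d_train, d_test = [], []
    for trajectories in trajectories_by_policy:
        trajs = list(trajectories)
        rng.shuffle(trajs)
        cut = int(len(trajs) * train_ratio)  # 1/5 of this policy's trajectories
        d_train.extend(trajs[:cut])          # appended to D_train
        d_test.extend(trajs[cut:])           # remaining 4/5 appended to D_test
    return d_train, d_test
```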
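The Experiment Setup row quotes two key settings: the safety-test confidence parameter δ = .05 (i.e., the allowed failure probability of the high-confidence off-policy test in Thomas, Theocharous, and Ghavamzadeh 2015b, corresponding to a 95% confidence level) and the mixing parameter α used to form mixed candidate policies (α = .3 for Gridworld, α = .9 for Mountain Car/Acrobot). The sketch below shows one common convention for a mixed policy, acting with the newly learned policy with probability α and with the current behavior policy otherwise; this convention, the class name, and the policy interfaces are assumptions for illustration, not confirmed by the quoted text.

```python
# Minimal sketch of a mixed candidate policy, under the assumed convention
# that actions come from the new policy with probability alpha and from the
# current behavior policy with probability 1 - alpha.
import random

class MixedPolicy:
    def __init__(self, new_policy, behavior_policy, alpha):
        self.new_policy = new_policy            # hypothetical: callable state -> action
        self.behavior_policy = behavior_policy  # hypothetical: currently deployed policy
        self.alpha = alpha                      # e.g., 0.3 (Gridworld) or 0.9 (Mountain Car/Acrobot)

    def action(self, state, rng=random):
        if rng.random() < self.alpha:
            return self.new_policy(state)
        return self.behavior_policy(state)
```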