Learning Safe Policies with Expert Guidance
Authors: Jessie Huang, Fa Wu, Doina Precup, Yang Cai
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. |
| Researcher Affiliation | Collaboration | Jessie Huang (1), Fa Wu (1,2), Doina Precup (1), Yang Cai (1); (1) School of Computer Science, McGill University; (2) Zhejiang Demetics Medical Technology |
| Pseudocode | Yes | Algorithm 1: Separation Oracle SO_R for the reward polytope P_R; Algorithm 2: Separation Oracle for the feasible (µ, z) in LP (1); Algorithm 3: FPL Maxmin Learning (an illustrative FPL sketch follows the table) |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to code repositories for the described methodology. |
| Open Datasets | Yes | Our next experiments are based on the classic control task of cartpole and the environment provided by OpenAI Gym [6]. (An environment-setup sketch follows the table.) |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts. It describes using a 'small (10x10) demonstration gridworld' for expert policy generation and then testing in a 'much larger size (50x50)', but no specific splits are given. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or specific computing environments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | In the following experiment we set ε = 0.5, which defines P_R and captures how close to optimal the expert is. |
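The Pseudocode row above lists Algorithm 3, FPL Maxmin Learning. As a rough illustration of the follow-the-perturbed-leader idea it builds on, here is a minimal sketch on a finite zero-sum matrix game. The payoff matrix `G`, the exponential perturbation scale `eta`, and the best-responding adversary are illustrative assumptions, not the paper's separation-oracle-based construction over the reward polytope.

```python
import numpy as np

def fpl_maxmin(G, n_rounds=1000, eta=1.0, seed=0):
    """Approximate maxmin play for the zero-sum game G[i, j] via FPL.

    A hedged sketch: the row player follows the perturbed leader,
    while the column player (minimizer) best-responds each round.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = G.shape
    cum_row = np.zeros(n_rows)    # cumulative payoff of each row action
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)

    for _ in range(n_rounds):
        # Row player: perturb cumulative payoffs, then pick the leader.
        noise = rng.exponential(eta, size=n_rows)
        i = np.argmax(cum_row + noise)
        row_counts[i] += 1

        # Adversary: best response (minimizer) to the row player's choice.
        j = np.argmin(G[i])
        col_counts[j] += 1

        # Row player observes the chosen column and updates payoffs.
        cum_row += G[:, j]

    # Empirical frequencies approximate the players' mixed strategies.
    return row_counts / n_rounds, col_counts / n_rounds
```

On a small game such as matching pennies (`G = np.array([[1, -1], [-1, 1]])`), both frequency vectors should drift toward the uniform mixed strategy as the number of rounds grows.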
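For the cartpole experiments quoted in the Open Datasets row, the environment comes from OpenAI Gym. A minimal setup sketch, assuming the classic Gym API of that era and a placeholder random policy rather than the paper's learned safe policy:

```python
import gym

# "CartPole-v1" and the random action choice are assumptions made for
# illustration; the paper does not specify its exact Gym version or ID.
env = gym.make("CartPole-v1")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # placeholder policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print(f"episode return: {total_reward}")
```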