Learning from Interventions Using Hierarchical Policies for Safe Learning
Authors: Jing Bi, Vikas Dhiman, Tianyou Xiao, Chenliang Xu
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that LfI using sub-goals in a hierarchical policy framework trains faster and achieves better asymptotic performance than typical LfD. |
| Researcher Affiliation | Academia | 1University of Rochester 2University of California San Diego |
| Pseudocode | Yes | Algorithm 1: Learn-from-Intervention by Backtracking |
| Open Source Code | No | The paper mentions including a demo video in the supplementary material but does not explicitly state that the source code for their methodology is publicly available or provide a link. |
| Open Datasets | Yes | We use a 3D urban driving simulator CARLA (Dosovitskiy et al. 2017). |
| Dataset Splits | No | The paper mentions collecting data for training and evaluating the agent for certain durations ('30 mins recorded data', 'test agent for 15 mins') but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | We equipped an off-the-shelf 1/10 scale (13″ × 10″ × 11″) truck with an embedded computer (Nvidia TX2), an Intel RealSense D415 as the primary central camera and two webcams on the sides. |
| Software Dependencies | No | The paper mentions using the CARLA simulator and ResNet-50 but does not provide specific version numbers for these or other software dependencies like programming languages, frameworks, or libraries. |
| Experiment Setup | Yes | Equation 8 is minimized with a learning rate of 1e-5 using the Adam solver. For each experiment, we use behavior cloning with 30 mins of recorded data (~7200 frames) in our first iteration and test the agent for 15 mins in each subsequent iteration. We initialize ResNet-50 with pre-trained parameters and fine-tune only the top three stages. We use ELU nonlinearities after all hidden layers and apply 50% dropout after fully-connected hidden layers. (A minimal code sketch of this setup follows the table.) |
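The Experiment Setup row above pins down most of the optimization and architecture details. Below is a minimal sketch of that configuration, not the authors' released code: it assumes PyTorch/torchvision (the paper names no framework), interprets "the top three stages" of ResNet-50 as `layer2`–`layer4`, and uses placeholder hidden sizes (512, 256) and a two-dimensional control output; the paper's Equation 8 loss is not reproduced here.

```python
# Hedged sketch of the quoted setup: pre-trained ResNet-50 with only the top
# three stages fine-tuned, ELU after hidden layers, 50% dropout after
# fully-connected hidden layers, Adam with lr = 1e-5.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze all parameters, then unfreeze the top three stages.
# (Which blocks count as "the top three stages" is our assumption.)
for p in backbone.parameters():
    p.requires_grad = False
for stage in (backbone.layer2, backbone.layer3, backbone.layer4):
    for p in stage.parameters():
        p.requires_grad = True

# Replace the ImageNet classifier with a small control head.
# Hidden sizes and the 2-dim output (e.g. steering, throttle) are guesses.
backbone.fc = nn.Sequential(
    nn.Linear(backbone.fc.in_features, 512),
    nn.ELU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 256),
    nn.ELU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 2),
)

# Optimize only the unfrozen parameters with the quoted learning rate.
optimizer = torch.optim.Adam(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-5
)
```

Restricting the optimizer to `requires_grad` parameters mirrors the quoted choice to fine-tune only the upper stages while leaving the early convolutional features fixed.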