Learning from Interventions Using Hierarchical Policies for Safe Learning

Authors: Jing Bi, Vikas Dhiman, Tianyou Xiao, Chenliang Xu
Pages: 10352-10360

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that LfI using sub-goals in a hierarchical policy framework trains faster and achieves better asymptotic performance than typical LfD.
Researcher Affiliation | Academia | (1) University of Rochester, (2) University of California San Diego
Pseudocode | Yes | Algorithm 1: Learn-from-Intervention by Backtracking
Open Source Code | No | The paper mentions a demo video in the supplementary material but neither states that the source code for its methodology is publicly available nor provides a link.
Open Datasets | Yes | We use a 3D urban driving simulator CARLA (Dosovitskiy et al. 2017).
Dataset Splits | No | The paper mentions collecting training data and evaluating the agent for fixed durations ('30 mins recorded data', 'test agent for 15 mins') but does not specify explicit training/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | Yes | We equipped an off-the-shelf 1/10 scale (13 × 10 × 11) truck with an embedded computer (Nvidia TX2), an Intel RealSense D415 as the primary central camera and two webcams on the sides.
Software Dependencies | No | The paper mentions using the CARLA simulator and ResNet-50 but does not provide version numbers for these or for other software dependencies such as programming languages, frameworks, or libraries.
Experiment Setup | Yes | Equation 8 is minimized with a learning rate of 1e-5 using the Adam solver. For each experiment, we use behavior cloning with 30 mins recorded data (~7200 frames) in our first iteration and test the agent for 15 mins in each subsequent iteration. We initiate ResNet-50 with pre-trained parameters and only fine-tune the top three stages. We use ELU nonlinearities after all hidden layers and apply 50% dropout after fully-connected hidden layers.
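
The numbers in the Experiment Setup row can be made concrete with a small, dependency-free Python sketch. It is not the authors' code. It only records the quoted hyperparameters, derives the frame rate implied by "30 mins" and "~7200 frames" (roughly 4 frames per second, a derived figure the paper does not state), and shows one common definition of ELU and dropout. The ELU scale alpha = 1.0 and the "inverted" dropout scaling are assumptions, since the row does not specify either, and all function and constant names are hypothetical.

```python
import math
import random

# Values quoted in the Experiment Setup row.
LEARNING_RATE = 1e-5      # Adam learning rate
DROPOUT_RATE = 0.5        # applied after fully-connected hidden layers
RECORDED_MINUTES = 30     # behavior-cloning data in the first iteration
RECORDED_FRAMES = 7200    # approximate ("~7200 frames")


def frames_per_second(frames, minutes):
    """Average frame rate implied by a frame count and a duration in minutes."""
    return frames / (minutes * 60)


def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise.

    alpha=1.0 is an assumed default; the source does not give a value.
    """
    return x if x > 0 else alpha * math.expm1(x)


def dropout(values, rate, rng):
    """Inverted dropout (assumed variant): zero each value with probability
    `rate` and scale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    print(frames_per_second(RECORDED_FRAMES, RECORDED_MINUTES))
    print(elu(-1.0), elu(2.0))
    print(dropout([1.0] * 8, DROPOUT_RATE, random.Random(0)))
```

At a 50% rate, every surviving activation is doubled under this variant, so the expected value of each unit is unchanged between training and evaluation.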