Learning from Active Human Involvement through Proxy Value Propagation
Authors: Zhenghao (Mark) Peng, Wenjie Mo, Chenda Duan, Quanyi Li, Bolei Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Human-in-the-loop experiments show the generality and efficiency of our method. |
| Researcher Affiliation | Academia | University of California, Los Angeles, University of Edinburgh |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Demo video and code are available at: https://metadriverse.github.io/pvp. |
| Open Datasets | Yes | We conduct experiments on various control tasks with different observation and action spaces. For continuous action space, we use three driving environments: the MetaDrive safety benchmark [25], CARLA Town01 [8], and a customized driving environment built upon Grand Theft Auto V (GTA V), a popular video game. ... For discrete action space, we use the MiniGrid Two Room task [4]. |
| Dataset Splits | Yes | In MetaDrive, there exists a split of training and test environments, and we present the performance of the learned agent in a held-out test environment. |
| Hardware Specification | Yes | All experiments with humans are conducted on a local computer with an Nvidia GeForce RTX 3080. |
| Software Dependencies | No | We implement most of the code with Stable-Baselines3 [37]. (The dependency is named, but no version numbers are pinned.) |
| Experiment Setup | Yes | Hyper-parameters and other details are given in Appendix E and G. ... Table 7, PVP (MetaDrive): Discount Factor γ = 0.99; τ for Target Network Update = 0.005; Learning Rate = 0.0001; Steps before Learning Start = 100; Steps per Iteration = 1; Gradient Steps per Iteration = 1; Train Batch Size = 100; Q Value Bound = 1 (see the configuration sketch below this table). |
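
The Table 7 values map one-to-one onto the arguments of an off-policy learner in Stable-Baselines3, which the paper reports building on. Below is a minimal sketch, assuming a TD3-style continuous-control backbone; PVP itself is not shipped with Stable-Baselines3 (the authors' implementation is at https://metadriverse.github.io/pvp), the `Pendulum-v1` environment is an illustrative stand-in for the MetaDrive/CARLA/GTA V setups, and the Q-value bound is a PVP-specific quantity with no Stable-Baselines3 counterpart, so it appears only as a labeled constant.

```python
# Hedged sketch: Table 7 hyper-parameters expressed through the
# Stable-Baselines3 TD3 interface. This is NOT the authors' PVP learner;
# it only shows where each reported value would plug in.
import gymnasium as gym
from stable_baselines3 import TD3

# PVP-specific proxy-value label magnitude (Table 7); Stable-Baselines3
# has no corresponding argument, so it is kept as a plain constant.
Q_VALUE_BOUND = 1

# Illustrative stand-in; the paper uses MetaDrive, CARLA Town01, and GTA V.
env = gym.make("Pendulum-v1")

model = TD3(
    "MlpPolicy",
    env,
    gamma=0.99,           # Discount Factor γ
    tau=0.005,            # τ for Target Network Update
    learning_rate=1e-4,   # Learning Rate
    learning_starts=100,  # Steps before Learning Start
    train_freq=1,         # Steps per Iteration
    gradient_steps=1,     # Gradient Steps per Iteration
    batch_size=100,       # Train Batch Size
)
model.learn(total_timesteps=10_000)
```

With these settings, `model.learn` performs one gradient step per environment step after the first 100 warm-up steps, matching the schedule reported in Table 7.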