Rating-Based Reinforcement Learning
Authors: Devin White, Mingkang Wu, Ellen Novoseller, Vernon J. Lawhern, Nicholas Waytowich, Yongcan Cao
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We finally conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the performance of the new rating-based reinforcement learning approach. |
| Researcher Affiliation | Collaboration | Devin White¹, Mingkang Wu¹, Ellen Novoseller², Vernon J. Lawhern², Nicholas Waytowich², Yongcan Cao¹; ¹University of Texas, San Antonio; ²DEVCOM Army Research Laboratory |
| Pseudocode | No | The paper describes methods using prose and equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code can be found at https://rb.gy/tdpc4y. |
| Open Datasets | Yes | We study the Walker and Quadruped tasks in Lee et al. (2021), with 1000 and 2000 synthetic queries, respectively. ... We conducted tests on 3 of the OpenAI Gym MuJoCo Environments also used in Christiano et al. (2017): Swimmer, Hopper and Cheetah. |
| Dataset Splits | No | The paper refers to 'training data samples' and 'test data' but does not specify explicit training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions various algorithms and environments but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We use the same neural network structures for both the reward predictor and control policy and the same hyperparameters as in Lee et al. (2021). ... We used the same neural network structures for both the reward predictor and control policy and the same hyperparameters as in Christiano et al. (2017). |