Rating-Based Reinforcement Learning

Authors: Devin White, Mingkang Wu, Ellen Novoseller, Vernon J. Lawhern, Nicholas Waytowich, Yongcan Cao

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We finally conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the performance of the new rating-based reinforcement learning approach."
Researcher Affiliation | Collaboration | Devin White¹, Mingkang Wu¹, Ellen Novoseller², Vernon J. Lawhern², Nicholas Waytowich², Yongcan Cao¹ (¹University of Texas at San Antonio; ²DEVCOM Army Research Laboratory)
Pseudocode | No | The paper describes its methods in prose and equations but does not include an explicitly labeled pseudocode or algorithm block.
Open Source Code | Yes | "The code can be found at https://rb.gy/tdpc4y."
Open Datasets | Yes | "We study the Walker and Quadruped tasks in Lee et al. (2021), with 1000 and 2000 synthetic queries, respectively. ... We conducted tests on 3 of the OpenAI Gym MuJoCo environments also used in Christiano et al. (2017): Swimmer, Hopper and Cheetah."
Dataset Splits | No | The paper refers to "training data samples" and "test data" but does not specify explicit training/validation/test splits with percentages or counts.
Hardware Specification | No | The paper does not report hardware details such as GPU/CPU models, processor types, or memory amounts used to run the experiments.
Software Dependencies | No | The paper names the algorithms and environments it uses but does not pin any software dependencies to version numbers.
Experiment Setup | Yes | "We use the same neural network structures for both the reward predictor and control policy and the same hyperparameters as in Lee et al. (2021). ... We used the same neural network structures for both the reward predictor and control policy and the same hyperparameters as in Christiano et al. (2017)."
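
The Research Type row notes that the paper evaluates rating-based RL with both synthetic and real human ratings. A common way to simulate such ratings, and the basis of the minimal sketch below, is to bin each trajectory segment's ground-truth return into a small number of discrete rating classes. The function name `synthetic_ratings` and the quantile-based thresholds are illustrative assumptions, not the authors' exact protocol.

```python
import numpy as np

def synthetic_ratings(segment_returns, n_classes=5):
    """Assign each trajectory segment a rating in {0, ..., n_classes-1}
    by binning its ground-truth return at the empirical quantiles.

    NOTE: illustrative assumption -- the paper's synthetic-rating
    protocol may use different thresholds or class counts.
    """
    segment_returns = np.asarray(segment_returns, dtype=np.float64)
    # Quantile cut points splitting returns into n_classes equal-mass bins.
    edges = np.quantile(segment_returns, np.linspace(0, 1, n_classes + 1)[1:-1])
    # digitize maps each return to the index of the bin it falls into.
    return np.digitize(segment_returns, edges)

# Example: 1000 simulated segment returns rated on a 5-point scale
# (1000 matches the Walker query budget quoted in the Open Datasets row).
rng = np.random.default_rng(0)
returns = rng.normal(loc=100.0, scale=25.0, size=1000)
ratings = synthetic_ratings(returns, n_classes=5)
print(np.bincount(ratings))  # roughly 200 segments per rating class
```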
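The Open Datasets row lists Swimmer, Hopper, and Cheetah from OpenAI Gym MuJoCo, but the paper does not pin environment IDs or versions. One way to instantiate comparable tasks today uses Gymnasium's `-v4` MuJoCo IDs; those IDs are an assumption here, and the original experiments may have used older Gym releases.

```python
import gymnasium as gym  # requires: pip install "gymnasium[mujoco]"

# "-v4" IDs are assumed for illustration; the paper does not state versions.
for env_id in ["Swimmer-v4", "Hopper-v4", "HalfCheetah-v4"]:
    env = gym.make(env_id)
    obs, info = env.reset(seed=0)
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```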
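Finally, the Experiment Setup row states that the reward predictor reuses the network structure and hyperparameters of Lee et al. (2021) without restating them. The PyTorch sketch below shows a per-step reward model of the kind used in that line of work; the layer widths, activations, and bounded output are assumptions drawn from common preference-based RL implementations, not values confirmed by this paper.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Per-step reward model r_psi(s, a).

    The sizes below (3 hidden layers of 256 units, tanh-bounded output)
    follow common preference/rating-based RL setups such as Lee et al.
    (2021); treat them as assumptions, since the paper defers to that
    reference rather than restating its hyperparameters.
    """
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # bounded per-step reward
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def segment_return(model, obs_seq, act_seq):
    """Sum predicted per-step rewards over a segment; rating losses are
    typically defined on this scalar."""
    return model(obs_seq, act_seq).sum(dim=-2).squeeze(-1)

# Usage on a batch of 32 segments of length 50 (dims are arbitrary):
model = RewardPredictor(obs_dim=17, act_dim=6)
obs = torch.randn(32, 50, 17)
act = torch.randn(32, 50, 6)
print(segment_return(model, obs, act).shape)  # torch.Size([32])
```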