reproducibilityindex.ai

Hindsight PRIORs for Reward Learning from Human Preferences

Authors: Mudit Verma, Katherine Metcalf

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the benefits of Hindsight PRIOR on the Deep Mind Control (DMC) Suite locomotion (Tunyasuvunakool et al., 2020) and Meta World control (Yu et al., 2020) tasks, compare against baselines (Lee et al., 2021a; Park et al., 2022; Liu et al., 2022; Liang et al., 2022), and ablate over Hindsight PRIOR's contributions.
Researcher Affiliation	Collaboration	Mudit Verma, Arizona State University, Tempe, AZ 85281, muditverma@asu.edu; Katherine Metcalf, Apple Inc., Cupertino, CA 95014, kmetcalf@apple.com
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is open-source or publicly available.
Open Datasets	Yes	We evaluate the benefits of Hindsight PRIOR on the Deep Mind Control (DMC) Suite locomotion (Tunyasuvunakool et al., 2020) and Meta World control (Yu et al., 2020) tasks
Dataset Splits	No	The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing.
Hardware Specification	No	The paper does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for running its experiments.
Software Dependencies	No	The paper mentions software components like 'Python' and optimizers like 'Adam', but does not provide specific version numbers for these or any other ancillary software dependencies.
Experiment Setup	Yes	Table 2: Training hyper-parameters for SAC (Haarnoja et al., 2018); Table 3: PEBBLE hyper-parameters (Lee et al., 2021a); Table 4: Hindsight PRIOR hyper-parameters; Table 5: World Model hyper-parameters in Hindsight PRIOR.