Hindsight PRIORs for Reward Learning from Human Preferences
Authors: Mudit Verma, Katherine Metcalf
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the benefits of Hindsight PRIOR on the DeepMind Control (DMC) Suite locomotion (Tunyasuvunakool et al., 2020) and Meta World control (Yu et al., 2020) tasks, compare against baselines (Lee et al., 2021a; Park et al., 2022; Liu et al., 2022; Liang et al., 2022), and ablate over Hindsight PRIOR's contributions. |
| Researcher Affiliation | Collaboration | Mudit Verma, Arizona State University, Tempe, AZ, 85281, muditverma@asu.edu; Katherine Metcalf, Apple Inc., Cupertino, CA, 95014, kmetcalf@apple.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is open-source or publicly available. |
| Open Datasets | Yes | We evaluate the benefits of Hindsight PRIOR on the DeepMind Control (DMC) Suite locomotion (Tunyasuvunakool et al., 2020) and Meta World control (Yu et al., 2020) tasks. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing. |
| Hardware Specification | No | The paper does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like 'Python' and optimizers like 'Adam', but does not provide specific version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | Table 2: Training hyper-parameters for SAC (Haarnoja et al., 2018). Table 3: PEBBLE hyper-parameters (Lee et al., 2021a). Table 4: Hindsight PRIOR hyper-parameters. Table 5: World Model hyper-parameters in Hindsight PRIOR. |