Reward-rational (implicit) choice: A unifying formalism for reward learning
Authors: Hong Jun Jeon, Smitha Milli, Anca Dragan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Appendix C, we describe experiments with active selection of feedback types. In the environments we tested, we found that demonstrations are optimal early on, when little is known about the reward, while comparisons become optimal later, as a way to fine-tune the reward. |
| Researcher Affiliation | Academia | Hong Jun Jeon¹, Smitha Milli², Anca Dragan² (equal contribution); hjjeon@stanford.edu, smilli@berkeley.edu, anca@berkeley.edu; ¹Stanford University, ²University of California, Berkeley |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions a 'grid world navigation task' for illustration and refers to experiments in appendices, but it does not specify or provide access information for any publicly available or open datasets used in those experiments. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., 'Python 3.8, PyTorch 1.9') needed to replicate the experiments. |
| Experiment Setup | No | The paper discusses the overall approach and theoretical framework, with reference to experiments in appendices, but it does not include specific experimental setup details such as hyperparameter values or system-level training configurations in the main text. |
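For context on the formalism being assessed: the paper unifies feedback types (demonstrations, comparisons, and others) by treating each as a choice the human makes from a set of options, interpreted Boltzmann-rationally with respect to the unknown reward. Since no source code is released (see the table above), the sketch below is not the authors' implementation; it is a minimal illustration under that assumption, with made-up trajectory features, an arbitrary rationality coefficient `beta`, and a small discretized grid of reward hypotheses, showing how a single comparison updates a posterior over reward parameters.

```python
import numpy as np

# Minimal sketch of reward-rational (implicit) choice for one feedback type
# (a pairwise comparison). Feature values, beta, and the hypothesis grid are
# illustrative assumptions, not taken from the paper or its (unreleased) code.

rng = np.random.default_rng(0)

# Discretized hypothesis space over reward parameters theta, assuming a
# linear reward r_theta(xi) = theta . phi(xi), with a uniform prior.
thetas = rng.normal(size=(50, 3))                 # 50 candidate reward vectors
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
log_prior = np.full(len(thetas), -np.log(len(thetas)))

# Trajectory feature counts phi(xi) for the two options the human compared.
phi_chosen = np.array([1.0, 0.2, 0.0])            # preferred trajectory
phi_rejected = np.array([0.4, 0.9, 0.3])          # rejected trajectory
beta = 5.0                                        # assumed rationality coefficient

def log_likelihood(choice_phi, option_phis, theta, beta):
    """Boltzmann-rational choice: P(c | theta) proportional to exp(beta * r_theta(c))."""
    utilities = beta * option_phis @ theta
    return beta * choice_phi @ theta - np.logaddexp.reduce(utilities)

options = np.stack([phi_chosen, phi_rejected])
log_post = log_prior + np.array(
    [log_likelihood(phi_chosen, options, th, beta) for th in thetas]
)
log_post -= np.logaddexp.reduce(log_post)         # normalize the posterior

# Posterior mean reward estimate after observing the single comparison.
posterior_mean_theta = np.exp(log_post) @ thetas
print(posterior_mean_theta)
```

Other feedback types quoted in the table, such as demonstrations, would fit the same update by changing the option set and how each option is grounded into trajectories; only the likelihood's choice set changes, not the inference machinery.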