Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Approximate Value Equivalence
Authors: Christopher Grimm, Andre Barreto, Satinder Singh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In contrast to previous works, we show empirically that there are situations where agents with limited capacity should prefer to learn more accurate models with respect to smaller sets of functions over less accurate models with respect to larger sets of functions. ... To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] and learn tabular VE models whose per-action transition matrices are constrained to have rank at most R. ... Figure 2 shows a histogram of the planning performance of such models. Each cell in the figure corresponds to a model associated with specific values of R and D, and the cell's color denotes the value of the model's optimal policy averaged over states and over 10 independent executions. |
| Researcher Affiliation | Collaboration | Christopher Grimm, Computer Science & Engineering, University of Michigan, EMAIL; André Barreto, DeepMind, EMAIL; Satinder Singh, DeepMind, EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states in section 3.a of the ethics checklist: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]'. There is no other explicit statement about open-sourcing code for the methodology. |
| Open Datasets | Yes | To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] |
| Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits. The ethics checklist 3.b indicates N/A for training details including data splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. The ethics checklist 3.d indicates N/A for compute resources. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. The ethics checklist 3.d indicates N/A for compute resources, which often includes software details. |
| Experiment Setup | Yes | To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] and learn tabular VE models whose per-action transition matrices are constrained to have rank at most R. We learn these models to be in the VE class M(Π, V), where V is a set of D functions generated by sampling v(s) ∼ Uniform(−10, 10) for each v ∈ V and each s ∈ S. ... Each cell in the figure corresponds to a model associated with specific values of R and D, and the cell's color denotes the value of the model's optimal policy averaged over states and over 10 independent executions. |
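
The Experiment Setup row describes the value-function set V and the rank constraint only in words. The pure-Python sketch below makes these two ingredients concrete under stated assumptions.

- It samples D functions with v(s) ∼ Uniform(−10, 10).
- It builds a rank-≤R stochastic matrix as the product of two row-stochastic factors. This parameterization is an assumption, not necessarily the one the paper uses.
- It scores a model with a simple squared-error VE loss that compares P_a v with the model's P̃_a v for every action a and every v ∈ V.

All helper names are hypothetical, and rewards and discounting are omitted. The sketch only evaluates a random model; it does not perform the optimization that learning would require.

```python
import random

# Hypothetical helpers illustrating the setup described above; the names,
# the low-rank factorization, and the loss are assumptions, not the paper's code.


def sample_value_functions(num_states, D, low=-10.0, high=10.0, seed=0):
    """Sample D functions v: S -> R, drawing each v(s) i.i.d. from Uniform(low, high)."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(num_states)] for _ in range(D)]


def random_stochastic(rows, cols, rng):
    """Random row-stochastic matrix: nonnegative entries, each row sums to 1."""
    matrix = []
    for _ in range(rows):
        weights = [rng.random() + 1e-9 for _ in range(cols)]  # strictly positive
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Plain-Python matrix-vector product m @ v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def low_rank_transition(num_states, R, rng):
    """Assumed parameterization: P = A @ B with A (|S| x R) and B (R x |S|) both
    row-stochastic, so P is row-stochastic and rank(P) <= R."""
    A = random_stochastic(num_states, R, rng)
    B = random_stochastic(R, num_states, rng)
    return matmul(A, B)


def ve_loss(true_P, model_P, V):
    """Sum over actions a and functions v in V of ||P_a v - P~_a v||^2
    (rewards and discounting omitted for brevity)."""
    loss = 0.0
    for P_a, Q_a in zip(true_P, model_P):
        for v in V:
            diff = [x - y for x, y in zip(matvec(P_a, v), matvec(Q_a, v))]
            loss += sum(d * d for d in diff)
    return loss


# Toy sizes chosen for illustration; they are not the Four Rooms dimensions.
rng = random.Random(1)
num_states, num_actions, R, D = 6, 4, 2, 3
V = sample_value_functions(num_states, D)
true_P = [random_stochastic(num_states, num_states, rng) for _ in range(num_actions)]
model_P = [low_rank_transition(num_states, R, rng) for _ in range(num_actions)]
print(f"VE loss of a random rank-<={R} model: {ve_loss(true_P, model_P, V):.4f}")
```

Every row of P = AB is a convex combination of the R rows of B. The rank bound therefore holds by construction, which makes this factorization a convenient way to enforce the constraint while the factors are being fitted.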