Approximate Value Equivalence
Authors: Christopher Grimm, André Barreto, Satinder Singh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In contrast to previous works, we show empirically that there are situations where agents with limited capacity should prefer to learn more accurate models with respect to smaller sets of functions over less accurate models with respect to larger sets of functions. ... To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] and learn tabular VE models whose per-action transition matrices are constrained to have rank at most R. ... Figure 2 shows a histogram of the planning performance of such models. Each cell in the figure corresponds to a model associated with specific values of R and D, and the cell's color denotes the value of the model's optimal policy averaged over states and over 10 independent executions. |
| Researcher Affiliation | Collaboration | Christopher Grimm, Computer Science & Engineering, University of Michigan, crgrimm@umich.edu; André Barreto, DeepMind, andrebarreto@deepmind.com; Satinder Singh, DeepMind, baveja@deepmind.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper states in section 3.a of the ethics checklist: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]'. There is no other explicit statement about open-sourcing code for the methodology. |
| Open Datasets | Yes | To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] |
| Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits. The ethics checklist 3.b indicates N/A for training details including data splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. The ethics checklist 3.d indicates N/A for compute resources. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. The ethics checklist 3.d indicates N/A for compute resources, which often includes software details. |
| Experiment Setup | Yes | To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] and learn tabular VE models whose per-action transition matrices are constrained to have rank at most R. We learn these models to be in the VE class M(Π, V), where V is a set of D functions generated by sampling v(s) ∼ Uniform(-10, 10) for each v ∈ V and each s ∈ S. ... Each cell in the figure corresponds to a model associated with specific values of R and D, and the cell's color denotes the value of the model's optimal policy averaged over states and over 10 independent executions. |
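
The setup quoted in the Experiment Setup row can be sketched roughly as below. This is an illustrative NumPy sketch under stated assumptions, not the authors' code (the paper releases none). The function names, the product-of-softmax factorization used to keep each transition matrix at rank at most R, the randomly generated stand-in for the Four Rooms dynamics, and the squared-error loss are all assumptions made for illustration. Rewards are assumed to match between the environment and the model, so only the transition term of the Bellman backup is compared.

```python
import numpy as np


def sample_value_functions(num_states, num_functions, rng, low=-10.0, high=10.0):
    """Sample D functions with v(s) ~ Uniform(low, high); returns shape (D, S)."""
    return rng.uniform(low, high, size=(num_functions, num_states))


def softmax_rows(x):
    """Row-wise softmax, so every row is a probability distribution."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def low_rank_transition(logits_a, logits_b):
    """Row-stochastic (S x S) matrix of rank at most R.

    Assumed parameterization: the product of two row-stochastic factors,
    (S x R) @ (R x S), is row-stochastic and has rank at most R.
    """
    return softmax_rows(logits_a) @ softmax_rows(logits_b)


def ve_loss(true_P, model_P, values):
    """Mean squared gap between P_a v and the model's P_a v.

    true_P, model_P: arrays of shape (A, S, S); values: array of shape (D, S).
    """
    true_backup = np.einsum("aij,dj->adi", true_P, values)
    model_backup = np.einsum("aij,dj->adi", model_P, values)
    return float(np.mean((true_backup - model_backup) ** 2))


# Hypothetical sizes; the real Four Rooms domain is larger.
rng = np.random.default_rng(0)
S, A, R, D = 6, 2, 2, 3

V = sample_value_functions(S, D, rng)

# Random stand-in for the environment's per-action transition matrices.
true_P = np.stack([softmax_rows(rng.normal(size=(S, S))) for _ in range(A)])

# One rank-constrained model matrix per action (untrained, random logits).
model_P = np.stack([
    low_rank_transition(rng.normal(size=(S, R)), rng.normal(size=(R, S)))
    for _ in range(A)
])

loss = ve_loss(true_P, model_P, V)
```

In an actual experiment the logits would be optimized to drive `loss` down for each (R, D) pair, and the resulting model's optimal policy would then be evaluated in the true environment. That training and evaluation loop is omitted here because the source does not specify it.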