Approximate Value Equivalence

Authors: Christopher Grimm, André Barreto, Satinder Singh

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In contrast to previous works, we show empirically that there are situations where agents with limited capacity should prefer to learn more accurate models with respect to smaller sets of functions over less accurate models with respect to larger sets of functions. ... To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] and learn tabular VE models whose per-action transition matrices are constrained to have rank at most R. ... Figure 2 shows a histogram of the planning performance of such models. Each cell in the figure corresponds to a model associated with specific values of R and D, and the cell's color denotes the value of the model's optimal policy averaged over states and over 10 independent executions.
Researcher Affiliation | Collaboration | Christopher Grimm, Computer Science & Engineering, University of Michigan (crgrimm@umich.edu); André Barreto, DeepMind (andrebarreto@deepmind.com); Satinder Singh, DeepMind (baveja@deepmind.com)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper states in section 3.a of the ethics checklist: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]'. There is no other explicit statement about open-sourcing code for the methodology.
Open Datasets | Yes | To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999]
Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits. The ethics checklist 3.b indicates N/A for training details including data splits.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. The ethics checklist 3.d indicates N/A for compute resources.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. The ethics checklist 3.d indicates N/A for compute resources, which often includes software details.
Experiment Setup | Yes | To illustrate situations where this might occur, we consider the tabular Four Rooms domain [Sutton et al., 1999] and learn tabular VE models whose per-action transition matrices are constrained to have rank at most R. We learn these models to be in the VE class M(Π, V), where V is a set of D functions generated by sampling v(s) ∼ Uniform(−10, 10) for each v ∈ V and each s ∈ S. ... Each cell in the figure corresponds to a model associated with specific values of R and D, and the cell's color denotes the value of the model's optimal policy averaged over states and over 10 independent executions.
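
Since the paper provides neither code nor pseudocode, the quoted experiment setup can only be illustrated with a rough sketch. The Python below shows one way to sample a set of D functions with v(s) drawn uniformly from [-10, 10], and to build a transition matrix of rank at most R. All function names are hypothetical. The low-rank construction (a product of two row-stochastic factors) and the squared-error measure for a single action with rewards ignored are assumptions made for illustration, not the authors' method.

```python
import random


def sample_value_functions(num_states, num_functions, rng, low=-10.0, high=10.0):
    """Sample D functions, each assigning v(s) ~ Uniform(low, high) to every state."""
    return [[rng.uniform(low, high) for _ in range(num_states)]
            for _ in range(num_functions)]


def random_distribution(size, rng):
    """Return a random probability vector of the given length."""
    weights = [rng.random() + 1e-12 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def low_rank_transition_matrix(num_states, rank, rng):
    """Assumed construction: product of an S x R and an R x S row-stochastic
    matrix, which is itself row-stochastic and has rank at most `rank`."""
    mix = [random_distribution(rank, rng) for _ in range(num_states)]
    comp = [random_distribution(num_states, rng) for _ in range(rank)]
    return [[sum(mix[s][k] * comp[k][t] for k in range(rank))
             for t in range(num_states)]
            for s in range(num_states)]


def ve_transition_error(p_true, p_model, value_functions):
    """Mean squared gap between expected next-state values under the two
    matrices, over all states and sampled functions (one action, no rewards)."""
    total, count = 0.0, 0
    for v in value_functions:
        for row_true, row_model in zip(p_true, p_model):
            expected_true = sum(p * x for p, x in zip(row_true, v))
            expected_model = sum(p * x for p, x in zip(row_model, v))
            total += (expected_true - expected_model) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    num_states, rank, num_functions = 20, 3, 5  # placeholder sizes, not from the paper
    functions = sample_value_functions(num_states, num_functions, rng)
    p_env = low_rank_transition_matrix(num_states, num_states, rng)
    p_model = low_rank_transition_matrix(num_states, rank, rng)
    print(ve_transition_error(p_env, p_model, functions))
```

In this sketch, raising `num_functions` asks the model to match the environment on more functions, while lowering `rank` restricts its capacity; these are the two quantities (D and R) that the quoted setup varies.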