Incremental Truncated LSTD

Authors: Clement Gehring, Yangchen Pan, Martha White

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that the algorithm effectively balances computational complexity and sample efficiency for policy evaluation in a benchmark task and a high-dimensional energy allocation domain."
Researcher Affiliation | Academia | Clement Gehring, MIT CSAIL, Cambridge, MA 02139, USA (gehring@csail.mit.edu); Yangchen Pan and Martha White, Indiana University, Bloomington, IN 47405, USA ({yangpan, martha}@indiana.edu)
Pseudocode | Yes | Algorithm 1: t-LSTD(λ) using incremental SVD (a minimal sketch of the idea appears after this table)
Open Source Code | No | "Due to space constraints, the detailed pseudo-code for the SVD updates is left out, but detailed code and explanations will be published on-line."
Open Datasets | Yes | "We first investigate the performance of t-LSTD in the Mountain Car benchmark. ... In this section, we demonstrate the performance of the fully incremental algorithm (k = 1) in a large energy allocation domain [Salas and Powell, 2013]."
Dataset Splits | No | The paper does not provide training/validation/test splits with percentages or sample counts. Instead, it evaluates against "true" values estimated via Monte Carlo rollouts, as is typical for reinforcement-learning environments rather than static datasets (a rollout sketch appears after this table).
Hardware Specification | No | The paper mentions "CPU time" in relation to computational costs but does not specify any particular hardware, such as CPU models, GPU models, or memory used for the experiments.
Software Dependencies | No | The paper does not specify any software dependencies or libraries with version numbers (e.g., a language version, machine-learning frameworks, or numerical libraries).
Experiment Setup | Yes | "The tile coding representation has 1000 features, using 10 layers of 10×10 grids. The RBF representation has 1024 features, for a grid of 32×32 RBFs with width equal to 0.12 times the total range of the state space. We set the RBF widths to obtain good performance from LSTD. The other parameters (λ and step-size) are optimized for each algorithm. In the Mountain Car results, we use the mini-batch case where k = r and a discount γ = 0.99. ... To approximate the value function, we use tile coding with 32 tilings where each tiling contains 5 × 5 × 10 × 5 grids, resulting in 40,000 features, and also included a bias unit. We set γ = 0.8. ... We sweep the additional parameters in the other algorithms, including step-sizes for TD and iLSTD and m for iLSTD. We sweep a range of α0 ∈ {2^-11, 2^-10, 2^-9, ..., 2^-1}, and divide by the number of active features (which in this case is 26). Further, because iLSTD is unstable unless α is decayed, we further sweep the decay formula as suggested by Geramifard and Bowling [2006], where N0 is chosen from {10, 100, 1000}. To focus parameter sweeps on the step-size, which had much more effect for iLSTD, we set λ = 0.9 for all other algorithms, except for t-LSTD, for which we set λ = 1.0. We choose r ∈ {5, 20, 40, 60} and m ∈ {10, 20, 30, 40, 50}." (A feature-construction sketch also appears after this table.)
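
Since the paper's Algorithm 1 is not reproduced here, the following is a minimal sketch of the core idea behind t-LSTD(λ): maintain a rank-r truncated SVD of the LSTD(λ) matrix A, fold in each transition's rank-one update with a Brand-style incremental SVD, and recover the weights through the truncated pseudo-inverse. The names (TruncatedLSTD, svd_rank1_update) and the fully incremental k = 1 form are illustrative choices, not the authors' published code; the mini-batch variant (k = r) quoted above amortizes the SVD cost over k transitions.

```python
import numpy as np

def svd_rank1_update(U, S, V, z, d, rank):
    """Brand-style rank-one SVD update: U diag(S) V^T + z d^T, truncated to `rank`."""
    m = U.T @ z
    p = z - U @ m
    p_norm = np.linalg.norm(p)
    n = V.T @ d
    q = d - V @ n
    q_norm = np.linalg.norm(q)
    P = p / p_norm if p_norm > 1e-10 else np.zeros_like(p)
    Q = q / q_norm if q_norm > 1e-10 else np.zeros_like(q)
    # Small (rank+1) x (rank+1) core matrix that carries the update.
    K = np.zeros((len(S) + 1, len(S) + 1))
    K[:-1, :-1] = np.diag(S) + np.outer(m, n)
    K[:-1, -1] = q_norm * m
    K[-1, :-1] = p_norm * n
    K[-1, -1] = p_norm * q_norm
    Uk, Sk, Vkt = np.linalg.svd(K)
    U_new = np.column_stack([U, P]) @ Uk
    V_new = np.column_stack([V, Q]) @ Vkt.T
    return U_new[:, :rank], Sk[:rank], V_new[:, :rank]

class TruncatedLSTD:
    """Sketch of t-LSTD(lambda) with per-step (k = 1) SVD updates."""

    def __init__(self, n_features, rank, gamma, lam):
        self.gamma, self.lam, self.rank = gamma, lam, rank
        self.U = np.zeros((n_features, rank))
        self.S = np.zeros(rank)
        self.V = np.zeros((n_features, rank))
        self.b = np.zeros(n_features)
        self.z = np.zeros(n_features)  # eligibility trace

    def update(self, x, reward, x_next):
        # Standard LSTD(lambda) statistics: A += z (x - gamma x')^T, b += r z.
        self.z = self.gamma * self.lam * self.z + x
        d = x - self.gamma * x_next
        self.U, self.S, self.V = svd_rank1_update(
            self.U, self.S, self.V, self.z, d, self.rank)
        self.b += reward * self.z

    def weights(self, tol=1e-6):
        # theta = V diag(1/S) U^T b, zeroing near-zero singular values.
        cutoff = tol * self.S.max() if self.S.max() > 0 else np.inf
        s_inv = np.where(self.S > cutoff, 1.0 / np.maximum(self.S, 1e-12), 0.0)
        return self.V @ (s_inv * (self.U.T @ self.b))
```

Keeping only the top r singular values is what caps the per-step cost at O(nr + r^3) instead of the O(n^2) of full LSTD, which is the complexity/sample-efficiency trade-off the paper studies.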
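The rollout-based evaluation noted under Dataset Splits is the usual Monte Carlo procedure for obtaining ground-truth targets: from each test state, execute the policy repeatedly and average the discounted returns. Below is a minimal sketch under an assumed simulator interface; reset_to and step are hypothetical stand-ins, not from the paper.

```python
import numpy as np

def rollout_value(env, policy, start_state, gamma, n_rollouts=100, horizon=1000):
    """Monte Carlo estimate of v_pi(start_state) by averaging discounted returns."""
    returns = []
    for _ in range(n_rollouts):
        state = env.reset_to(start_state)  # hypothetical: reset simulator to a chosen state
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            state, reward, done = env.step(policy(state))  # hypothetical step signature
            ret += discount * reward
            discount *= gamma
            if done:
                break
        returns.append(ret)
    return float(np.mean(returns))
```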
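The 32×32 RBF grid with width 0.12 times the state-space range quoted in the Experiment Setup row can be built as follows. This is a sketch of one common Gaussian-RBF construction; the quoted text does not specify the exact kernel or normalization, so treating the width as a per-dimension standard deviation is an assumption.

```python
import numpy as np

def rbf_features(state, low, high, n_per_dim=32, width_frac=0.12):
    """Gaussian RBF features on a regular grid over a box-shaped state space."""
    low, high = np.asarray(low, float), np.asarray(high, float)
    sigma = width_frac * (high - low)  # per-dimension width: 0.12 x total range
    # Centers on a regular n_per_dim x n_per_dim grid (32 x 32 -> 1024 features).
    axes = [np.linspace(l, h, n_per_dim) for l, h in zip(low, high)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, len(low))
    diff = (np.asarray(state, float) - centers) / sigma
    return np.exp(-0.5 * np.sum(diff ** 2, axis=1))
```

For Mountain Car, rbf_features(state, low=[-1.2, -0.07], high=[0.6, 0.07]) yields the 1024-dimensional vector; those position/velocity bounds are the standard Mountain Car ranges, not values quoted from the paper.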