Accelerated Gradient Temporal Difference Learning
Authors: Yangchen Pan, Adam White, Martha White
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate these claims with a proof of convergence in expectation and experiments on several benchmark domains and a large-scale industrial energy allocation domain. |
| Researcher Affiliation | Academia | Yangchen Pan, Adam White, Martha White; Department of Computer Science, Indiana University at Bloomington; {yangpan,adamw,martha}@indiana.edu |
| Pseudocode | No | The paper describes the algorithm's update rule and derivation in text and equations, but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions domains like 'Boyan's chain' and 'Mountain car with tile coding' which are well-known benchmarks, but it does not provide specific citations with authors/year, links, or repositories for these or the 'industrial energy allocation simulator' dataset. |
| Dataset Splits | No | The paper discusses parameter sweeping and evaluation over time steps and episodes but does not explicitly provide details on how the data was split into training, validation, and test sets for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We swept a large range of stepsize parameters, trace decay rates, and regularization parameters, and tested both fixed and decaying stepsize schedules. ... In the decayed stepsize case, where αt = α0 / (n0 + episode#), 18 values of α0 and two values of n0 were tested... We set α0 = 1 and set η to a small fixed value. ... We used a fine grain tile coding of the 2D state, resulting in a 1024 dimensional feature representation with exactly 10 units active on every time step. |
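
The decayed stepsize schedule quoted in the Experiment Setup row, αt = α0 / (n0 + episode#), can be sketched as below. This is a minimal illustration, not the authors' code: the function name `decayed_stepsize` and the values `alpha0=1.0` and `n0=10.0` are assumptions chosen for the example, since the specific swept values are not given in the excerpt.

```python
def decayed_stepsize(alpha0: float, n0: float, episode: int) -> float:
    """Return alpha_t = alpha0 / (n0 + episode) for a given episode index.

    alpha0 is the initial stepsize numerator and n0 an offset that slows
    the early decay; both names follow the symbols in the quoted formula.
    """
    denominator = n0 + episode
    if denominator <= 0:
        raise ValueError("n0 + episode must be positive")
    return alpha0 / denominator


# Illustrative values only (hypothetical, not taken from the paper's sweep).
schedule = [decayed_stepsize(alpha0=1.0, n0=10.0, episode=ep) for ep in range(5)]
for ep, alpha in enumerate(schedule):
    print(f"episode {ep}: stepsize {alpha:.4f}")
```

A larger `n0` flattens the schedule at the start, so the stepsize stays close to `alpha0 / n0` for the first episodes before the 1/episode decay takes over.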