Discerning Temporal Difference Learning
Authors: Jianfei Ma
Venue: AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios. From the Experiments section: 'In this section, we delve into the impact of DTD(λ)'s emphasizing effect, whether the emphasis function is predetermined or adapted during training.' |
| Researcher Affiliation | Academia | School of Mathematics and Statistics, Northwestern Polytechnical University (matrixfeeney@gmail.com) |
| Pseudocode | Yes | Algorithm 1: DTD(λ) (an illustrative sketch of such an update appears after this table) |
| Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper mentions the '5-state random-walk problem' and the '13-state Boyan chain' as benchmark environments, but it provides no concrete access information (link, DOI, repository, or formal dataset citation): it relies on the *definition* of these problems rather than on publicly accessible data (see the environment sketch after this table). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing data. It mentions '50 independent runs, each spanning 5000 environment steps', which describes experimental runs rather than data splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | No | The paper states 'The experiments are carried out over 50 independent runs, each spanning 5000 environment steps. The depicted curves report the best performance with extensive parameter sweeping.' However, it does not provide specific hyperparameter values, optimizer settings, or detailed training configurations in the main text (a harness matching this protocol is sketched after this table). |
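
Neither benchmark above ships as a dataset; both are procedurally generated environments defined in the literature. For concreteness, below is a minimal Python sketch of the classic 5-state random walk (episodes start in the centre state, move left or right with equal probability, and only the right exit pays +1). The class name and the `reset`/`step` interface are our own conventions, not anything specified by the paper.

```python
class RandomWalk:
    """Classic 5-state random walk: episodes start in the centre state,
    move left or right with equal probability, and terminate off either
    end; the reward is +1 on the right exit and 0 everywhere else."""

    def __init__(self, n_states=5):
        self.n_states = n_states

    def reset(self):
        self.s = self.n_states // 2   # centre state
        return self.s

    def step(self, rng):
        """Advance one step; returns (next_state, reward, done)."""
        self.s += 1 if rng.random() < 0.5 else -1
        if self.s < 0:                # fell off the left end
            return self.s, 0.0, True
        if self.s >= self.n_states:   # fell off the right end
            return self.s, 1.0, True
        return self.s, 0.0, False
```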
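The Pseudocode row points at Algorithm 1: DTD(λ), but with no code release the exact update must be read from the paper. The sketch below is ordinary tabular TD(λ) with a state-dependent emphasis weight `kappa(s)` folded into the accumulating trace; how Algorithm 1 actually applies the emphasis may differ, so treat the placement of `kappa` here as an assumption.

```python
import numpy as np

def dtd_lambda_sketch(env, kappa, n_states, alpha=0.1, gamma=1.0,
                      lam=0.9, steps=5000, rng=None):
    """Tabular TD(lambda)-style prediction with a state-dependent
    emphasis weight kappa(s). A sketch only: the exact placement of
    the emphasis in the paper's Algorithm 1 may differ."""
    if rng is None:
        rng = np.random.default_rng()
    v = np.zeros(n_states)            # value estimates
    z = np.zeros(n_states)            # eligibility trace
    s = env.reset()
    for _ in range(steps):            # count environment steps, not episodes
        s_next, r, done = env.step(rng)
        target = r if done else r + gamma * v[s_next]
        delta = target - v[s]         # TD error
        z *= gamma * lam              # decay all traces
        z[s] += kappa(s)              # emphasis-weighted trace (assumption)
        v += alpha * delta * z        # apply the emphasized update
        if done:
            z[:] = 0.0                # traces do not cross episode boundaries
            s = env.reset()
        else:
            s = s_next
    return v
```

With `kappa(s) = 1` everywhere this reduces to plain accumulating-trace TD(λ), which makes a convenient sanity check for the sketch.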
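Finally, the only protocol the paper reports is 50 independent runs of 5000 environment steps each, with curves showing the best performance over an extensive (but unspecified) parameter sweep. A harness matching that shape, reusing `RandomWalk` and `dtd_lambda_sketch` from above, might look like the following; the swept step sizes, the uniform emphasis, and the RMS-error metric are illustrative assumptions, not the paper's settings.

```python
import numpy as np

TRUE_V = np.arange(1, 6) / 6.0        # known true values of the 5-state walk

def sweep(alphas, runs=50, steps=5000, lam=0.9):
    """Mean RMS error vs. the true values for each step size, averaged
    over `runs` independent seeds; mirrors the reported 50-run/5000-step
    protocol. kappa(s) = 1 recovers plain TD(lambda)."""
    results = {}
    for alpha in alphas:
        errs = []
        for seed in range(runs):
            rng = np.random.default_rng(seed)
            v = dtd_lambda_sketch(RandomWalk(), kappa=lambda s: 1.0,
                                  n_states=5, alpha=alpha, lam=lam,
                                  steps=steps, rng=rng)
            errs.append(np.sqrt(np.mean((v - TRUE_V) ** 2)))
        results[alpha] = float(np.mean(errs))
    return results

print(sweep(alphas=[0.05, 0.1, 0.2]))  # report the best setting, as the paper does
```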