Discerning Temporal Difference Learning

Authors: Jianfei Ma

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios." "Experiments: In this section, we delve into the impact of DTD(λ)'s emphasizing effect, whether the emphasis function is predetermined or adapted during training."
Researcher Affiliation | Academia | Northwestern Polytechnical University, School of Mathematics and Statistics (matrixfeeney@gmail.com)
Pseudocode | Yes | Algorithm 1: DTD(λ) (an illustrative sketch of an emphasis-weighted TD(λ) update appears after this table)
Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described.
Open Datasets | No | The paper mentions the '5-state random-walk problem' and the '13-state Boyan chain' as benchmark environments, but it does not provide concrete access information (link, DOI, repository, or formal citation for a dataset) for publicly available or open datasets. It refers to the definition of these problems, not publicly accessible data.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing data. It mentions '50 independent runs, each spanning 5000 environment steps', which describes experimental runs rather than data splits.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers.
Experiment Setup | No | The paper states 'The experiments are carried out over 50 independent runs, each spanning 5000 environment steps. The depicted curves report the best performance with extensive parameter sweeping.' However, it does not provide specific hyperparameter values, optimizer settings, or detailed training configurations in the main text.
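To make the table's references to Algorithm 1 (DTD(λ)), the emphasis function, and the 5-state random-walk benchmark concrete, below is a minimal sketch of emphasis-weighted TD(λ) with linear (here tabular) features. The names `RandomWalk`, `one_hot`, and `emphasis`, and the accumulating-trace update itself, are illustrative assumptions rather than the paper's exact Algorithm 1; in particular, the uniform `emphasis` shown reduces the sketch to ordinary TD(λ), and a non-uniform, possibly learned, choice is what the paper studies.

```python
import numpy as np

class RandomWalk:
    """Standard 5-state random-walk chain: start in the middle state, move
    left or right with probability 1/2; exiting right gives reward +1,
    exiting left gives 0, and either exit ends the episode."""
    def __init__(self, n_states=5, seed=0):
        self.n = n_states
        self.rng = np.random.default_rng(seed)
        self.s = self.n // 2

    def reset(self):
        self.s = self.n // 2
        return self.s

    def step(self):
        s_next = self.s + (1 if self.rng.random() < 0.5 else -1)
        if s_next < 0:                      # exited left: reward 0, terminal
            return s_next, 0.0, True
        if s_next >= self.n:                # exited right: reward +1, terminal
            return s_next, 1.0, True
        self.s = s_next
        return s_next, 0.0, False

def one_hot(s, n):
    phi = np.zeros(n)
    phi[s] = 1.0
    return phi

def emphasis(s):
    # Hypothetical predetermined emphasis function. A constant weight makes
    # this plain TD(lambda); a non-uniform choice re-weights how strongly
    # each visited state enters the eligibility trace.
    return 1.0

def td_lambda_with_emphasis(env, num_steps=5000, alpha=0.05,
                            gamma=1.0, lam=0.9):
    """Accumulating-trace TD(lambda) where each visited state's feature
    vector is added to the trace scaled by emphasis(s)."""
    w = np.zeros(env.n)      # linear (tabular) value weights
    z = np.zeros(env.n)      # eligibility trace
    s = env.reset()
    for _ in range(num_steps):
        s_next, r, done = env.step()
        v = w @ one_hot(s, env.n)
        v_next = 0.0 if done else w @ one_hot(s_next, env.n)
        delta = r + gamma * v_next - v                  # TD error
        z = gamma * lam * z + emphasis(s) * one_hot(s, env.n)
        w += alpha * delta * z
        if done:
            z[:] = 0.0
            s = env.reset()
        else:
            s = s_next
    return w  # true values for this chain are [1/6, 2/6, 3/6, 4/6, 5/6]

if __name__ == "__main__":
    print(td_lambda_with_emphasis(RandomWalk()))
```

A protocol matching the one quoted in the table ("50 independent runs, each spanning 5000 environment steps" with parameter sweeping) would simply wrap the call above in a loop over seeds and step sizes and average a value-error metric across runs.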