Discerning Temporal Difference Learning

Authors: Jianfei Ma

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios." "Experiments: In this section, we delve into the impact of DTD(λ)'s emphasizing effect, whether the emphasis function is predetermined or adapted during training."
Researcher Affiliation | Academia | Northwestern Polytechnical University, School of Mathematics and Statistics (matrixfeeney@gmail.com)
Pseudocode | Yes | Algorithm 1: DTD(λ) (an illustrative sketch of an emphasis-weighted TD(λ) update appears after this table)
Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described.
Open Datasets | No | The paper mentions the '5-state random-walk problem' and the '13-state Boyan chain' as benchmark environments, but it does not provide concrete access information (link, DOI, repository, or formal citation for a dataset) for publicly available or open datasets. It refers to the definition of these problems, not publicly accessible data.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing data. It mentions '50 independent runs, each spanning 5000 environment steps', which describes experimental runs rather than data splits.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers.
Experiment Setup | No | The paper states 'The experiments are carried out over 50 independent runs, each spanning 5000 environment steps. The depicted curves report the best performance with extensive parameter sweeping.' However, it does not provide specific hyperparameter values, optimizer settings, or detailed training configurations in the main text.
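To make the table's references to Algorithm 1 (DTD(λ)), the emphasis function, and the 5-state random-walk benchmark concrete, below is a minimal sketch of emphasis-weighted TD(λ) with linear (here tabular) features. The names `RandomWalk`, `one_hot`, and `emphasis`, and the accumulating-trace update itself, are illustrative assumptions rather than the paper's exact Algorithm 1; in particular, the uniform `emphasis` shown reduces the sketch to ordinary TD(λ), and a non-uniform, possibly learned, choice is what the paper studies.

```python
import numpy as np

class RandomWalk:
    """Standard 5-state random-walk chain: start in the middle state, move
    left or right with probability 1/2; exiting right gives reward +1,
    exiting left gives 0, and either exit ends the episode."""
    def __init__(self, n_states=5, seed=0):
        self.n = n_states
        self.rng = np.random.default_rng(seed)
        self.s = self.n // 2

    def reset(self):
        self.s = self.n // 2
        return self.s

    def step(self):
        s_next = self.s + (1 if self.rng.random() < 0.5 else -1)
        if s_next < 0:                      # exited left: reward 0, terminal
            return s_next, 0.0, True
        if s_next >= self.n:                # exited right: reward +1, terminal
            return s_next, 1.0, True
        self.s = s_next
        return s_next, 0.0, False

def one_hot(s, n):
    phi = np.zeros(n)
    phi[s] = 1.0
    return phi

def emphasis(s):
    # Hypothetical predetermined emphasis function. A constant weight makes
    # this plain TD(lambda); a non-uniform choice re-weights how strongly
    # each visited state enters the eligibility trace.
    return 1.0

def td_lambda_with_emphasis(env, num_steps=5000, alpha=0.05,
                            gamma=1.0, lam=0.9):
    """Accumulating-trace TD(lambda) where each visited state's feature
    vector is added to the trace scaled by emphasis(s)."""
    w = np.zeros(env.n)      # linear (tabular) value weights
    z = np.zeros(env.n)      # eligibility trace
    s = env.reset()
    for _ in range(num_steps):
        s_next, r, done = env.step()
        v = w @ one_hot(s, env.n)
        v_next = 0.0 if done else w @ one_hot(s_next, env.n)
        delta = r + gamma * v_next - v                  # TD error
        z = gamma * lam * z + emphasis(s) * one_hot(s, env.n)
        w += alpha * delta * z
        if done:
            z[:] = 0.0
            s = env.reset()
        else:
            s = s_next
    return w  # true values for this chain are [1/6, 2/6, 3/6, 4/6, 5/6]

if __name__ == "__main__":
    print(td_lambda_with_emphasis(RandomWalk()))
```

A protocol matching the one quoted in the table ("50 independent runs, each spanning 5000 environment steps" with parameter sweeping) would simply wrap the call above in a loop over seeds and step sizes and average a value-error metric across runs.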