Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards
Authors: Semih Cayci, Atilla Eryilmaz
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical results with numerical experiments. ... In this section, we present numerical results for Robust TD learning and its non-robust counterpart. |
| Researcher Affiliation | Academia | Semih Cayci Department of Mathematics RWTH Aachen University Aachen, Germany cayci@mathc.rwth-aachen.de Atilla Eryilmaz Department of Electrical and Computer Engineering The Ohio State University Columbus, OH 43210 eryilmaz.2@osu.edu |
| Pseudocode | Yes | Algorithm 1: Robust TD learning |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | In the first example, we consider a randomly-generated MRP with \|X\| = 256. The transition kernel is randomly generated such that P(x, x') ~ Unif(0, 1) i.i.d., and row-wise normalized to obtain a stochastic matrix. ... In this example, we consider a circular random walk for X = {1, 2, . . . , 256}... |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In order to predict the value function, we use (projected) TD learning (see [4]) with linear function approximation based on Gaussian features of dimension d = 4 and projection radius ρ = 30. The discount factor is γ = 0.9, and the reward is R_t(X_t) = r(X_t) + N_t − E[N_t] with N_t ~ Pareto(1, 1.4) i.i.d. for any t. ... Mean squared error (2) under Robust TD learning and TD learning with the clipping radius b_t = t and diminishing step-size η_t = 1/(λ_min(1 − γ)t) in Theorem 1 and projection radius ρ = 30 are shown in Figure 2. |
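The experiment setup reported above can be sketched in code. Since the paper releases no source, the following is an illustrative reconstruction, not the authors' implementation: it uses the reported values (|X| = 256 random MRP, d = 4 Gaussian features, γ = 0.9, ρ = 30, clipping radius b_t = t, step-size η_t = 1/(λ_min(1 − γ)t), centered Pareto(1, 1.4) noise), while the feature scaling, the base reward r(x), and the value of λ_min are assumptions made here for a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly generated MRP as described: |X| = 256 states, transition kernel
# P(x, x') ~ Unif(0, 1) i.i.d., row-wise normalized to a stochastic matrix.
n_states = 256
P = rng.uniform(size=(n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

d = 4          # Gaussian feature dimension (as reported)
gamma = 0.9    # discount factor (as reported)
rho = 30.0     # projection radius (as reported)
Phi = rng.normal(size=(n_states, d)) / np.sqrt(d)  # Gaussian features (assumed scaling)
r = rng.uniform(size=n_states)                     # base reward r(x) (assumed Unif(0, 1))

def pareto_noise(alpha=1.4, scale=1.0):
    """Heavy-tailed Pareto(scale, alpha) sample, centered by its mean E[N_t]."""
    n = scale * (1.0 + rng.pareto(alpha))
    mean = alpha * scale / (alpha - 1.0)  # finite since alpha = 1.4 > 1
    return n - mean

def robust_td(T=5000, lam_min=0.1):
    """Sketch of Robust TD: clip the TD error at radius b_t, then project theta.

    lam_min is an assumed stand-in for the smallest eigenvalue appearing
    in the reported step-size schedule.
    """
    theta = np.zeros(d)
    x = rng.integers(n_states)
    for t in range(1, T + 1):
        x_next = rng.choice(n_states, p=P[x])
        reward = r[x] + pareto_noise()  # R_t(X_t) = r(X_t) + N_t - E[N_t]
        delta = reward + gamma * Phi[x_next] @ theta - Phi[x] @ theta
        delta = np.clip(delta, -float(t), float(t))   # clipping radius b_t = t
        eta = 1.0 / (lam_min * (1.0 - gamma) * t)     # diminishing step-size
        theta += eta * delta * Phi[x]
        norm = np.linalg.norm(theta)
        if norm > rho:                                # project onto the rho-ball
            theta *= rho / norm
        x = x_next
    return theta
```

The clip-then-project structure is what distinguishes Robust TD from plain projected TD here: clipping the TD error bounds the influence of any single heavy-tailed reward sample, and the projection keeps the iterate in a bounded set as assumed by the analysis.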