Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards
Authors: Semih Cayci, Atilla Eryilmaz
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical results with numerical experiments. ... In this section, we present numerical results for Robust TD learning and its non-robust counterpart. |
| Researcher Affiliation | Academia | Semih Cayci, Department of Mathematics, RWTH Aachen University, Aachen, Germany, EMAIL; Atilla Eryilmaz, Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, EMAIL |
| Pseudocode | Yes | Algorithm 1: Robust TD learning |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | In the first example, we consider a randomly-generated MRP with $\lvert X \rvert = 256$. The transition kernel is randomly generated such that $P(x, x') \overset{\text{iid}}{\sim} \mathrm{Unif}(0, 1)$, and rowwise normalized to obtain a stochastic matrix. ... In this example, we consider a circular random walk for $X = \{1, 2, \ldots, 256\}$... |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In order to predict the value function, we use (projected) TD learning (see [4]) with linear function approximation based on Gaussian features of dimension $d = 4$ and projection radius $\rho = 30$. The discount factor is $\gamma = 0.9$, and the reward is $R_t(X_t) = r(X_t) + N_t - \mathbb{E}[N_t]$ with $N_t \overset{\text{iid}}{\sim} \mathrm{Pareto}(1, 1.4)$ for any $t$. ... Mean squared error (2) under Robust TD learning and TD learning with the clipping radius $b_t = t$ and diminishing step-size $\eta_t = \frac{1}{\lambda_{\min}(1 - \gamma)t}$ in Theorem 1 and projection radius $\rho = 30$ are shown in Figure 2. |
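
Since no source code is available for the paper, the following is a minimal, unofficial sketch of the ingredients quoted in the table: a row-normalized random transition kernel, a centered Pareto noise term, and one clipped, projected TD(0) update with linear function approximation. All function names are hypothetical. Reading Pareto(1, 1.4) as scale 1 and shape 1.4 is an assumption, and so is clipping by rescaling the update onto a ball of radius `clip_radius`; the paper's Algorithm 1 may define the clipping step differently. The feature vectors, step size, and clipping radius are left as arguments because the table does not fully specify them.

```python
import math
import random

GAMMA = 0.9          # discount factor, as quoted in the experiment setup
RHO = 30.0           # projection radius, as quoted in the experiment setup
PARETO_SHAPE = 1.4   # assumed reading of Pareto(1, 1.4): scale 1, shape 1.4


def random_transition_kernel(n, rng):
    """Draw P(x, x') iid Unif(0, 1), then normalize each row to sum to 1."""
    rows = []
    for _ in range(n):
        row = [rng.random() for _ in range(n)]
        total = sum(row)
        rows.append([p / total for p in row])
    return rows


def centered_pareto_noise(rng):
    """Sample N - E[N] for N ~ Pareto(scale 1, shape a); E[N] = a / (a - 1)."""
    mean = PARETO_SHAPE / (PARETO_SHAPE - 1.0)
    return rng.paretovariate(PARETO_SHAPE) - mean


def clip_norm(vec, radius):
    """Rescale vec onto the Euclidean ball of the given radius if it lies outside."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= radius or norm == 0.0:
        return list(vec)
    return [v * radius / norm for v in vec]


def robust_td_step(theta, phi_x, phi_next, reward, step, clip_radius):
    """One TD(0) update: clip the semi-gradient, step, then project onto the RHO-ball."""
    value = sum(t * f for t, f in zip(theta, phi_x))
    value_next = sum(t * f for t, f in zip(theta, phi_next))
    td_error = reward + GAMMA * value_next - value
    # Assumption: clipping rescales the update to norm clip_radius.
    update = clip_norm([td_error * f for f in phi_x], clip_radius)
    theta = [t + step * u for t, u in zip(theta, update)]
    return clip_norm(theta, RHO)
```

As a usage note, `random_transition_kernel(256, random.Random(0))` would produce a kernel of the size quoted above, and a single extreme reward sample can move `theta` by at most `step * clip_radius` in norm, which is the intuition behind clipping under heavy-tailed noise.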