Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Variance-aware decision making with linear function approximation under heavy-tailed rewards

Authors: Xiang Li, Qiang Sun

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this subsection, we conduct a numerical comparison between AdaOFUL and two baseline algorithms: the original OFUL (Abbasi-Yadkori et al., 2011) and TOFU (Shao et al., 2018). ... Experiment results: Figure 1 shows the regret and convergence results across three noise cases."
Researcher Affiliation | Academia | "Xiang Li, School of Mathematical Sciences, Peking University; Qiang Sun, Department of Statistical Sciences, University of Toronto."
Pseudocode | Yes | "Algorithm 1: Adaptive Huber regression based OFUL (AdaOFUL). Algorithm 2: The VARA algorithm (informal). Algorithm 3: The VARA algorithm (formal)."
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodologies is publicly available.
Open Datasets | No | "Experiment setup: We experiment with the following configuration. ... Rewards are generated by y_t = ⟨ϕ_t, θ⟩ + ε_t, with ε_t an independent zero-mean noise. We investigate three noise types: Case (a) is the Gaussian distribution ε_t ∼ N(0, 1), while Cases (b) and (c) correspond to Student's t-distributions ε_t ∼ t(df), with df, the degrees of freedom, varying."
Dataset Splits | No | The paper generates synthetic data for its experiments but does not describe conventional dataset splits (e.g., train/validation/test) for pre-existing datasets. Instead, it defines how the data (rewards and noise) are generated at each step of the online process.
Hardware Specification | No | The paper describes the setup for a numerical study but does not specify the hardware (e.g., GPU or CPU models, or server specifications) used to run the experiments.
Software Dependencies | No | The paper lists hyperparameters and an experiment setup for its numerical study but does not give specific software dependencies or version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | "Experiment setup: We experiment with the following configuration. We set d = 10 and |D_t| = 20. The optimal θ is generated by sampling each coordinate from a uniform distribution U(0, 1) and normalizing the resulting vector to unit length, so that B = 1. ... Rewards are generated by y_t = ⟨ϕ_t, θ⟩ + ε_t, with ε_t an independent zero-mean noise. ... Hyperparameters were chosen based on observations from the first few steps, giving τ_0 = √d and c_0 = c_1 = 1. The experiment runs for T = 1000 steps and is replicated 10 times, with the outcomes averaged."
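The synthetic setup quoted above can be sketched in a few lines of numpy. This is our own illustrative reconstruction from the table, not the authors' code: the context distribution for the arm features and the seed are assumptions, while d, |D_t|, T, the normalized uniform θ, and the three noise cases follow the quoted setup.

```python
import numpy as np

# Illustrative sketch of the paper's synthetic bandit setup.
# Values d, n_arms (|D_t|), T, and the noise cases come from the table above;
# the context distribution and seed are our own assumptions.
rng = np.random.default_rng(0)

d = 10        # feature dimension
n_arms = 20   # |D_t|, actions available per round
T = 1000      # horizon

# Optimal parameter: coordinates ~ U(0, 1), normalized so ||theta|| = B = 1.
theta_star = rng.uniform(0.0, 1.0, size=d)
theta_star /= np.linalg.norm(theta_star)

def noise(case, size, df=2.0):
    """Zero-mean reward noise: Gaussian N(0, 1) in case (a),
    Student's t with `df` degrees of freedom in cases (b)/(c)."""
    if case == "a":
        return rng.normal(0.0, 1.0, size=size)
    return rng.standard_t(df, size=size)

# One round: draw an action set, then observe noisy linear rewards
# y_t = <phi_t, theta> + eps_t for each candidate arm.
phi = rng.normal(size=(n_arms, d))   # assumed context distribution
rewards = phi @ theta_star + noise("a", n_arms)
```

With heavy-tailed cases (b)/(c), `noise("b", size, df)` draws Student-t variates, whose infinite higher moments at small df are what motivate the adaptive Huber regression inside AdaOFUL.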