Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Variance-aware decision making with linear function approximation under heavy-tailed rewards

Authors: Xiang Li, Qiang Sun

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this subsection, we conduct a numerical comparison between AdaOFUL and two baseline algorithms: the original OFUL (Abbasi-Yadkori et al., 2011) and TOFU (Shao et al., 2018). ... Experiment results: Figure 1 shows the regret and convergence results across three noise cases."
Researcher Affiliation | Academia | "Xiang Li, School of Mathematical Sciences, Peking University; Qiang Sun, Department of Statistical Sciences, University of Toronto."
Pseudocode | Yes | "Algorithm 1: Adaptive Huber regression based OFUL (AdaOFUL). Algorithm 2: The VARA algorithm (informal). Algorithm 3: The VARA algorithm (formal)."
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodologies is publicly available.
Open Datasets | No | "Experiment setup: We experiment with the following configuration. ... Rewards are generated by y_t = ⟨ϕ_t, θ⟩ + ε_t, with ε_t an independent zero-mean noise. We investigate three noise types: Case (a) is the Gaussian distribution ε_t ∼ N(0, 1), while Cases (b) and (c) correspond to Student's t-distributions ε_t ∼ t(df), with df, the degrees of freedom, varying."
Dataset Splits | No | The paper generates synthetic data for its experiments but does not describe conventional dataset splits (e.g., train/validation/test) for pre-existing datasets. Instead, it defines how the data (rewards and noise) are generated at each step of the online process.
Hardware Specification | No | The paper describes the setup for a numerical study but does not specify the hardware (e.g., GPU or CPU models, or server specifications) used to run the experiments.
Software Dependencies | No | The paper lists hyperparameters and an experiment setup for its numerical study but does not give specific software dependencies or version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | "Experiment setup: We experiment with the following configuration. We set d = 10 and |D_t| = 20. The optimal θ is generated by sampling each coordinate from a uniform distribution U(0, 1) and normalizing the resulting vector to unit length, so that B = 1. ... Rewards are generated by y_t = ⟨ϕ_t, θ⟩ + ε_t, with ε_t an independent zero-mean noise. ... Hyperparameters were chosen based on observations from the first few steps, giving τ_0 = √d and c_0 = c_1 = 1. The experiment runs for T = 1000 steps and is replicated 10 times, with the outcomes averaged."
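The synthetic setup quoted above can be sketched in a few lines of numpy. This is our own illustrative reconstruction from the table, not the authors' code: the context distribution for the arm features and the seed are assumptions, while d, |D_t|, T, the normalized uniform θ, and the three noise cases follow the quoted setup.

```python
import numpy as np

# Illustrative sketch of the paper's synthetic bandit setup.
# Values d, n_arms (|D_t|), T, and the noise cases come from the table above;
# the context distribution and seed are our own assumptions.
rng = np.random.default_rng(0)

d = 10        # feature dimension
n_arms = 20   # |D_t|, actions available per round
T = 1000      # horizon

# Optimal parameter: coordinates ~ U(0, 1), normalized so ||theta|| = B = 1.
theta_star = rng.uniform(0.0, 1.0, size=d)
theta_star /= np.linalg.norm(theta_star)

def noise(case, size, df=2.0):
    """Zero-mean reward noise: Gaussian N(0, 1) in case (a),
    Student's t with `df` degrees of freedom in cases (b)/(c)."""
    if case == "a":
        return rng.normal(0.0, 1.0, size=size)
    return rng.standard_t(df, size=size)

# One round: draw an action set, then observe noisy linear rewards
# y_t = <phi_t, theta> + eps_t for each candidate arm.
phi = rng.normal(size=(n_arms, d))   # assumed context distribution
rewards = phi @ theta_star + noise("a", n_arms)
```

With heavy-tailed cases (b)/(c), `noise("b", size, df)` draws Student-t variates, whose infinite higher moments at small df are what motivate the adaptive Huber regression inside AdaOFUL.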