reproducibilityindex.ai

Robust Tests in Online Decision-Making

Authors: Gi-Soo Kim, Jane P. Kim, Hyun-Joon Yang
Pages: 10016-10024

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this work, we propose a modified actor-critic algorithm which is robust to critic misspecification and derive a novel testing procedure for the actor parameters in this case. We conduct experiments on synthetic data and real data and show that our testing procedure appropriately assess the significance of the parameters.
Researcher Affiliation	Academia	Gi-Soo Kim (1), Jane P. Kim (2), Hyun-Joon Yang (2). (1) Department of Industrial Engineering & Artificial Intelligence Graduate School, UNIST; (2) Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine
Pseudocode	Yes	Algorithm 2: Actor-Improper Critic algorithm
Open Source Code	No	The paper does not provide any explicit statements about making the source code available or include a link to a code repository.
Open Datasets	No	The paper mentions generating synthetic data and using the 'Recovery Record Dataset' but does not provide access information (link, DOI, formal citation for public access) for either. For the Recovery Record Dataset, it states: 'The Recovery Record Dataset contained patients adherence behaviors to their therapy for eating disorders (daily meal monitoring) and interactions with their linked clinicians on the app.'
Dataset Splits	No	The paper discusses synthetic data and a real-world dataset but does not explicitly specify how these datasets were split into training, validation, or test sets. It mentions '30 bootstrap samples' for evaluation in the data application section, but this does not describe a standard data split.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies	No	The paper mentions implementing algorithms and using methods, but it does not specify any software names with version numbers.
Experiment Setup	Yes	We set N = 2 and d = 4. We generate the context vectors b_{t,i} from a multivariate normal distribution N(0_d, I_{d×d}) and truncate them to have L2-norm 1. We generate the reward from a model nonlinear in b_{t,i}, r_{t,i} = b_{t,i}^T µ max(b_{t,1}^T µ, b_{t,2}^T µ) + η_{t,i}, where µ = (0.577, 0.577, 0.577, 0)^T and η_{t,i} is generated from N(0, 0.012) independently over arms and time. We set the exploration parameter λ in the AC and Proposed algorithms to 0.001. We run the bandit algorithms until time horizon T = 50 with 100 repetitions. We repeated the evaluations on 30 bootstrap samples.
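
The synthetic setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code, which the paper does not release. It assumes that "truncate" means rescaling each context vector to unit L2-norm, that the reward mean is the product b_{t,i}^T µ · max(b_{t,1}^T µ, b_{t,2}^T µ) exactly as the extracted text reads (an operator or sign may have been lost in extraction), and that the second argument of N(0, 0.012) is a variance. All function and constant names are hypothetical.

```python
import math
import random

N_ARMS = 2     # N in the quoted setup
DIM = 4        # d in the quoted setup
HORIZON = 50   # T in the quoted setup
# mu as printed in the extracted text; signs may have been lost in extraction.
MU = (0.577, 0.577, 0.577, 0.0)
# Assumption: the second argument of N(0, 0.012) is read as a variance.
NOISE_VAR = 0.012


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def draw_context(rng):
    """Draw b ~ N(0_d, I_d) and rescale it to unit L2-norm.

    Assumption: 'truncate to have L2-norm 1' is read as normalisation.
    """
    while True:
        b = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        norm = math.sqrt(dot(b, b))
        if norm > 0.0:  # guard against a (probability-zero) all-zero draw
            return [x / norm for x in b]


def simulate_round(rng):
    """Generate one time step: a context and a noisy reward for each arm."""
    contexts = [draw_context(rng) for _ in range(N_ARMS)]
    scores = [dot(b, MU) for b in contexts]
    best = max(scores)
    sd = math.sqrt(NOISE_VAR)
    # Assumption: mean reward is the product of the arm's score and the best score.
    rewards = [s * best + rng.gauss(0.0, sd) for s in scores]
    return contexts, rewards


def simulate_run(seed):
    """Generate one repetition of HORIZON rounds with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_round(rng) for _ in range(HORIZON)]
```

Calling `simulate_run(seed)` for 100 different seeds would mirror the 100 repetitions mentioned in the setup. The bandit algorithms themselves (AC and Proposed) are not sketched here.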