Factored DRO: Factored Distributionally Robust Policies for Contextual Bandits
Authors: Tong Mu, Yash Chandak, Tatsunori B. Hashimoto, Emma Brunskill
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also introduce a practical algorithm and demonstrate promising empirical results in environments based on real-world datasets, such as voting outcomes and scene classification. We test and analyze our algorithm empirically in three settings, including two derived from real data. For all experiments, we use the practical algorithm discussed in Section 5. While our method works for any policy that can be optimized, we consider learning the parameters of a stochastic softmax linear contextual bandit policy of the form: ... We compare against three baselines: ... (a hedged sketch of a softmax linear policy appears after the table). |
| Researcher Affiliation | Academia | Tong Mu, Stanford University, tongm@cs.stanford.edu; Yash Chandak, University of Massachusetts, ychandak@cs.umass.edu; Tatsunori Hashimoto, Stanford University, thashim@stanford.edu; Emma Brunskill, Stanford University, ebrun@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 in the Appendix provides our full algorithm for policy evaluation and optimization, that considers both the discrete binned case (Section 4.2) and the function approximation case just discussed. |
| Open Source Code | Yes | Code: https://github.com/StanfordAI4HI/FactoredDRO |
| Open Datasets | Yes | Scene Setting: We additionally test in a setting derived from the multiclass-supervised learning Scene classification dataset from the LibSVM [Chang and Lin, 2011] repository. Voting Setting: The voting dataset by Gerber et al. [2008] contains data collected from a randomized controlled trial-style study... |
| Dataset Splits | No | Table 1 provides 'n_total, (n_train, n_test)' splits for the datasets (e.g., '40K, (20K, 20K)'), indicating train and test sets, but there is no explicit mention of a separate validation split or of how hyperparameters were tuned. |
| Hardware Specification | No | Experiments were run on an internal cluster running GPUs. The total amount of compute time was about 200 GPU hours. This statement mentions GPUs but does not specify the exact GPU models (e.g., NVIDIA A100, Tesla V100) or other specific hardware components required to precisely replicate the environment. |
| Software Dependencies | No | The paper does not explicitly state the version numbers for any software dependencies, such as programming languages, libraries (e.g., PyTorch, TensorFlow), or other specific tools used for the experiments. |
| Experiment Setup | Yes | For the Adam optimizer we use an initial learning rate of 1e-3, with an exponential decay rate of 0.999 per epoch and a batch size of 100. For the policy we use a softmax temperature parameter of 1.0. For the approximate Taylor Expansion (Section 5.1), we use an order of 5, and use an Adam optimizer with learning rate 1e-4 and batch size 100. (A hedged sketch of this configuration appears after the table.) |
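
The Research Type row quotes the paper's use of a stochastic softmax linear contextual bandit policy but elides its exact expression. As context only, below is a minimal sketch of a generic softmax linear policy; the names `softmax_linear_policy`, `W`, and `context`, and the assumed shapes, are illustrative and not the paper's notation or code.

```python
import numpy as np

def softmax_linear_policy(context, W, temperature=1.0):
    """Action probabilities pi(a | x) for a generic softmax linear policy.

    Assumed shapes (illustrative only): context is a (d,) feature vector and
    W is an (n_actions, d) matrix of learnable policy parameters.
    """
    logits = (W @ context) / temperature
    logits = logits - logits.max()          # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()          # normalize to a probability vector
```

With the reported temperature of 1.0, this is simply a softmax over the linear scores W·x; the paper's actual parameterization may include additional features or bias terms.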
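
The Experiment Setup row lists the reported optimizer hyperparameters. A minimal PyTorch sketch of such a configuration is given below, assuming the linear softmax policy sketched above; the problem sizes and the `taylor_params` placeholder are assumptions, since the paper's released code is not reproduced here.

```python
import torch

# Assumed problem sizes (illustrative, not from the paper).
d, n_actions = 10, 4
temperature = 1.0                      # reported softmax temperature

# Learnable weights of a linear softmax policy.
W = torch.zeros(n_actions, d, requires_grad=True)

# Reported policy optimizer: Adam with initial learning rate 1e-3,
# exponential decay of 0.999 per epoch, and batch size 100.
policy_opt = torch.optim.Adam([W], lr=1e-3)
policy_sched = torch.optim.lr_scheduler.ExponentialLR(policy_opt, gamma=0.999)
batch_size = 100

# Reported optimizer for the order-5 approximate Taylor expansion (Section 5.1):
# Adam with learning rate 1e-4 and batch size 100. `taylor_params` is a
# placeholder for whatever parameters that approximation introduces.
taylor_order = 5
taylor_params = [torch.zeros(taylor_order, requires_grad=True)]
taylor_opt = torch.optim.Adam(taylor_params, lr=1e-4)
```

Calling `policy_sched.step()` once per epoch applies the 0.999 decay to the policy learning rate.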