Tractable Optimality in Episodic Latent MABs

Authors: Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments, this significantly outperforms the worst-case guarantees, as well as existing practical methods." "Figure 1: Per time-step rewards for increasing lengths of episodes with history-dependent policies returned after the exploration phase."
Researcher Affiliation | Collaboration | Jeongyeol Kwon (University of Wisconsin-Madison, jeongyeol.kwon@wisc.edu); Yonathan Efroni (Meta, New York, jonathan.efroni@gmail.com); Constantine Caramanis (The University of Texas at Austin, constantine@utexas.edu); Shie Mannor (Technion, NVIDIA, shie@ee.technion.ac.il, smannor@nvidia.com)
Pseudocode | Yes | Algorithm 1
Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | No | The paper describes a theoretical framework and experiments that appear to be conducted in a simulated environment based on Gaussian reward distributions, but it does not specify a named, publicly available dataset with concrete access information (e.g., link, DOI, or citation) used for training.
Dataset Splits | No | The paper states that training details are in the supplementary materials, but the main text does not specify dataset splits (e.g., percentages or sample counts for training, validation, or testing).
Hardware Specification | No | "Did you include the amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]"
Software Dependencies | No | The main text does not name specific software with version numbers needed for reproducibility.
Experiment Setup | No | "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]: in Supplementary Materials." The main text does not contain specific hyperparameters or training configurations.