Contextual Information-Directed Sampling

Authors: Botao Hao, Tor Lattimore, Chao Qin

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further propose a computationally-efficient version of contextual IDS based on Actor-Critic and evaluate it empirically on a neural network contextual bandit.
Researcher Affiliation | Collaboration | DeepMind; Columbia University.
Pseudocode | No | The paper describes algorithms and derivations in prose and mathematical notation but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | At each round t, the environment independently generates an observation in the form of a d-dimensional contextual vector x_t from some distribution. [...] The contextual vector x_t ∈ R^10 is sampled from N(0, I_10) and the noise is sampled from a standard Gaussian distribution.
Dataset Splits | No | The experiment is conducted in a simulated neural network contextual bandit environment that continuously generates data, so traditional train/validation/test dataset splits are not applicable or specified.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | We set the generative model fθ to be a 2-hidden-layer ReLU MLP with 10 hidden neurons. [...] Both the policy network and the value network are 2-hidden-layer ReLU MLPs with 20 hidden neurons, optimized by Adam with learning rate 0.001.
Experiment Setup | Yes | We set the generative model fθ to be a 2-hidden-layer ReLU MLP with 10 hidden neurons. The number of actions is 5. The contextual vector x_t ∈ R^10 is sampled from N(0, I_10) and the noise is sampled from a standard Gaussian distribution. [...] we use 10 ensembles in our experiment. With 200 posterior samples, we use the same approach described by Lu et al. (2021) to approximate the one-step regret and information gain for both conditional IDS and contextual IDS. We sample 20 independent contextual vectors at each round. Both the policy network and the value network are 2-hidden-layer ReLU MLPs with 20 hidden neurons, optimized by Adam with learning rate 0.001. (Hedged sketches of this environment and of the actor-critic networks follow the table.)
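
To make the reported setup concrete, here is a minimal sketch of the simulated neural network contextual bandit environment, assuming PyTorch. The paper releases no code, so the names (GenerativeMLP, sample_round) and the context-in/reward-per-action-out interface are illustrative assumptions; the ensemble posterior approximation (10 ensembles, 200 posterior samples, following Lu et al. (2021)) is omitted.

```python
import torch
import torch.nn as nn

D_CONTEXT, N_ACTIONS, HIDDEN = 10, 5, 10  # dimensions stated in the paper's setup


class GenerativeMLP(nn.Module):
    """2-hidden-layer ReLU MLP f_theta with 10 hidden neurons.

    Assumed here to map a context to one mean reward per action."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(D_CONTEXT, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)


def sample_round(f_theta, action):
    """One environment round: draw x_t ~ N(0, I_10), return a noisy reward."""
    x_t = torch.randn(D_CONTEXT)                      # context from N(0, I_10)
    mean_rewards = f_theta(x_t)                       # mean reward of each of the 5 actions
    reward = mean_rewards[action] + torch.randn(())   # standard Gaussian noise
    return x_t, reward
```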
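
Likewise, a minimal sketch of the policy and value networks for the computationally-efficient Actor-Critic variant, under the stated configuration (2-hidden-layer ReLU MLPs with 20 hidden neurons, Adam with learning rate 0.001). The input/output dimensions and the softmax-policy action sampling are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


def two_layer_mlp(in_dim, out_dim, hidden=20):
    # 2-hidden-layer ReLU MLP with 20 hidden neurons, as stated in the paper.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


policy_net = two_layer_mlp(10, 5)   # context -> action logits (assumed interface)
value_net = two_layer_mlp(10, 1)    # context -> scalar value (assumed interface)

# Both networks optimized by Adam with learning rate 0.001, per the paper.
policy_opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
value_opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)


def act(x_t):
    # Draw an action from the softmax policy over the 5 actions.
    probs = torch.softmax(policy_net(x_t), dim=-1)
    return torch.multinomial(probs, num_samples=1).item()
```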