Deep Learning with Logged Bandit Feedback

Authors: Thorsten Joachims, Adith Swaminathan, Maarten de Rijke

ICLR 2018

Reproducibility variables, with the result and the supporting LLM response for each:
Research Type: Experimental
LLM Response: "We empirically demonstrate the effectiveness of the method by showing how deep networks, ResNets in particular, can be trained for object recognition without conventionally labeled images. The empirical evaluation is designed to address three key questions. First, it verifies that deep models can indeed be trained effectively using our approach. Second, we will compare how the same deep neural network architecture performs under different types of data and training objectives, in particular conventional cross-entropy training using full-information data. Third, we explore the effectiveness and fidelity of the approximate SNIPS objective. For the following BanditNet experiments, we adapted the ResNet20 architecture (He et al., 2016) by replacing the conventional cross-entropy objective with our counterfactual risk minimization objective. We evaluate the performance of this Bandit-ResNet on the CIFAR-10 (Krizhevsky & Hinton, 2009) dataset..."
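For concreteness, here is a minimal sketch of the counterfactual risk minimization objective quoted above: a translated inverse-propensity-scored (IPS) loss with a Lagrangian baseline λ. The function name, tensor layout, and choice of PyTorch are assumptions for illustration; the authors' own implementation is in CNTK.

```python
import torch
import torch.nn.functional as F

def banditnet_loss(logits, actions, losses, log_propensities, lam):
    """Translated IPS training objective (hypothetical helper, not the
    authors' CNTK code): mean of (delta_i - lam) * pi_w(y_i|x_i) / pi_0(y_i|x_i).

    logits:           (n, k) network outputs for each context x_i
    actions:          (n,)   logged actions y_i (int64 class indices)
    losses:           (n,)   observed bandit losses delta_i
    log_propensities: (n,)   log pi_0(y_i | x_i) under the logging policy
    lam:              scalar Lagrange multiplier / baseline
    """
    log_pi = F.log_softmax(logits, dim=1)                         # log pi_w(y | x)
    log_pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)  # log pi_w(y_i | x_i)
    weights = torch.exp(log_pi_a - log_propensities)              # importance weights
    return ((losses - lam) * weights).mean()
```

Minimizing this batch mean, instead of cross-entropy against true labels, is what lets the same ResNet20 architecture learn from logged bandit feedback alone.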
Researcher Affiliation: Collaboration
LLM Response: "Thorsten Joachims, Cornell University, tj@cs.cornell.edu; Adith Swaminathan, Microsoft Research, adswamin@microsoft.com; Maarten de Rijke, University of Amsterdam, derijke@uva.nl"
Pseudocode: No
LLM Response: "The paper describes the algorithm mathematically and in text but does not include structured pseudocode or algorithm blocks."
Open Source Code: Yes
LLM Response: "To easily enable experimentation on other applications, we share an implementation of BanditNet."
Open Datasets: Yes
LLM Response: "We evaluate the performance of this Bandit-ResNet on the CIFAR-10 (Krizhevsky & Hinton, 2009) dataset..."
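Since the experiments train "without conventionally labeled images", the CIFAR-10 labels are used only to derive logged bandit feedback. The sketch below shows one common way to simulate such feedback from a supervised dataset; the conversion routine is an assumption for illustration, not a procedure quoted in this report, and `log_bandit_feedback` / `logging_model` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def log_bandit_feedback(logging_model, inputs, labels):
    """Simulate logged bandit feedback from a supervised dataset
    (illustrative assumption): sample an action from the logging policy
    pi_0, record the 0/1 loss against the true label, and keep the
    log-propensity needed later for importance weighting.
    """
    with torch.no_grad():
        log_p0 = F.log_softmax(logging_model(inputs), dim=1)       # pi_0(y | x)
        actions = torch.multinomial(log_p0.exp(), num_samples=1).squeeze(1)
        losses = (actions != labels).float()                       # delta_i: 0/1 loss
        log_p0_a = log_p0.gather(1, actions.unsqueeze(1)).squeeze(1)
    return actions, losses, log_p0_a
```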
Dataset Splits: No
LLM Response: The paper states: "Since CIFAR10 does not come with a validation set for tuning the variance-regularization constant γ, we do not use variance regularization for Bandit-ResNet." and "The Lagrange multiplier λ ∈ {0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05} is selected on the training set via Eq. (13)." No explicit validation-split percentages or counts are provided.
Hardware Specification: No
LLM Response: "The paper does not provide specific details on the hardware used for experiments."
Software Dependencies: No
LLM Response: "The paper mentions the 'CNTK implementation of ResNet20' but does not specify version numbers for CNTK or other software dependencies."
Experiment Setup: Yes
LLM Response: "Both the conventional full-information ResNet as well as the Bandit-ResNet use the same network architecture, the same hyperparameters, the same data augmentation scheme, and the same optimization method that were set in the CNTK implementation of ResNet20... The only parameter we adjusted for Bandit-ResNet is lowering the learning rate to 0.1 and slowing down the learning rate schedule... we report test performance after 1000 training epochs... The Lagrange multiplier λ ∈ {0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05} is selected on the training set via Eq. (13)."
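The quoted λ-selection rule amounts to training one model per λ in the grid and keeping the one with the best self-normalized IPS (SNIPS) risk estimate on the training data. A minimal sketch of that outer loop follows, assuming SNIPS is the Eq. (13) criterion; `select_lambda` and `train_fn` are hypothetical stand-ins for the paper's training runs.

```python
import torch
import torch.nn.functional as F

def snips_estimate(model, inputs, actions, losses, log_p0):
    """Self-normalized IPS (SNIPS) risk estimate: sum(delta_i * w_i) / sum(w_i)."""
    with torch.no_grad():
        log_pi = F.log_softmax(model(inputs), dim=1)
        log_pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
        w = torch.exp(log_pi_a - log_p0)            # importance weights
        return ((losses * w).sum() / w.sum()).item()

def select_lambda(train_fn, data, lam_grid):
    """Train one model per lambda, score each by SNIPS, return the best.

    train_fn(lam) -> trained model; a hypothetical stand-in for a full
    BanditNet training run at the given lambda.
    """
    inputs, actions, losses, log_p0 = data
    scored = [(snips_estimate(train_fn(lam), inputs, actions, losses, log_p0), lam)
              for lam in lam_grid]
    return min(scored)  # (best SNIPS risk, best lambda)

# Grid quoted from the paper:
lam_grid = [0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05]
```

Scoring λ on importance-weighted training data is consistent with the Dataset Splits entry above, which notes that no validation split is used.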