Deep Learning with Logged Bandit Feedback

Authors: Thorsten Joachims, Adith Swaminathan, Maarten de Rijke

ICLR 2018

Reproducibility variables, with the result and the supporting LLM response for each:
Research Type: Experimental
LLM Response: "We empirically demonstrate the effectiveness of the method by showing how deep networks, ResNets in particular, can be trained for object recognition without conventionally labeled images. The empirical evaluation is designed to address three key questions. First, it verifies that deep models can indeed be trained effectively using our approach. Second, we will compare how the same deep neural network architecture performs under different types of data and training objectives, in particular conventional cross-entropy training using full-information data. Third, we explore the effectiveness and fidelity of the approximate SNIPS objective. For the following BanditNet experiments, we adapted the ResNet20 architecture (He et al., 2016) by replacing the conventional cross-entropy objective with our counterfactual risk minimization objective. We evaluate the performance of this Bandit-ResNet on the CIFAR-10 (Krizhevsky & Hinton, 2009) dataset..."
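For concreteness, here is a minimal sketch of the counterfactual risk minimization objective quoted above: a translated inverse-propensity-scored (IPS) loss with a Lagrangian baseline λ. The function name, tensor layout, and choice of PyTorch are assumptions for illustration; the authors' own implementation is in CNTK.

```python
import torch
import torch.nn.functional as F

def banditnet_loss(logits, actions, losses, log_propensities, lam):
    """Translated IPS training objective (hypothetical helper, not the
    authors' CNTK code): mean of (delta_i - lam) * pi_w(y_i|x_i) / pi_0(y_i|x_i).

    logits:           (n, k) network outputs for each context x_i
    actions:          (n,)   logged actions y_i (int64 class indices)
    losses:           (n,)   observed bandit losses delta_i
    log_propensities: (n,)   log pi_0(y_i | x_i) under the logging policy
    lam:              scalar Lagrange multiplier / baseline
    """
    log_pi = F.log_softmax(logits, dim=1)                         # log pi_w(y | x)
    log_pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)  # log pi_w(y_i | x_i)
    weights = torch.exp(log_pi_a - log_propensities)              # importance weights
    return ((losses - lam) * weights).mean()
```

Minimizing this batch mean, instead of cross-entropy against true labels, is what lets the same ResNet20 architecture learn from logged bandit feedback alone.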
Researcher Affiliation: Collaboration
LLM Response: "Thorsten Joachims, Cornell University, tj@cs.cornell.edu; Adith Swaminathan, Microsoft Research, adswamin@microsoft.com; Maarten de Rijke, University of Amsterdam, derijke@uva.nl"
Pseudocode: No
LLM Response: "The paper describes the algorithm mathematically and in text but does not include structured pseudocode or algorithm blocks."
Open Source Code: Yes
LLM Response: "To easily enable experimentation on other applications, we share an implementation of BanditNet."
Open Datasets: Yes
LLM Response: "We evaluate the performance of this Bandit-ResNet on the CIFAR-10 (Krizhevsky & Hinton, 2009) dataset..."
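Since the experiments train "without conventionally labeled images", the CIFAR-10 labels are used only to derive logged bandit feedback. The sketch below shows one common way to simulate such feedback from a supervised dataset; the conversion routine is an assumption for illustration, not a procedure quoted in this report, and `log_bandit_feedback` / `logging_model` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def log_bandit_feedback(logging_model, inputs, labels):
    """Simulate logged bandit feedback from a supervised dataset
    (illustrative assumption): sample an action from the logging policy
    pi_0, record the 0/1 loss against the true label, and keep the
    log-propensity needed later for importance weighting.
    """
    with torch.no_grad():
        log_p0 = F.log_softmax(logging_model(inputs), dim=1)       # pi_0(y | x)
        actions = torch.multinomial(log_p0.exp(), num_samples=1).squeeze(1)
        losses = (actions != labels).float()                       # delta_i: 0/1 loss
        log_p0_a = log_p0.gather(1, actions.unsqueeze(1)).squeeze(1)
    return actions, losses, log_p0_a
```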
Dataset Splits: No
LLM Response: The paper states: "Since CIFAR10 does not come with a validation set for tuning the variance-regularization constant γ, we do not use variance regularization for Bandit-ResNet." and "The Lagrange multiplier λ ∈ {0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05} is selected on the training set via Eq. (13)." No explicit validation-split percentages or counts are provided.
Hardware Specification: No
LLM Response: "The paper does not provide specific details on the hardware used for experiments."
Software Dependencies: No
LLM Response: "The paper mentions the 'CNTK implementation of ResNet20' but does not specify version numbers for CNTK or other software dependencies."
Experiment Setup: Yes
LLM Response: "Both the conventional full-information ResNet as well as the Bandit-ResNet use the same network architecture, the same hyperparameters, the same data augmentation scheme, and the same optimization method that were set in the CNTK implementation of ResNet20... The only parameter we adjusted for Bandit-ResNet is lowering the learning rate to 0.1 and slowing down the learning rate schedule... we report test performance after 1000 training epochs... The Lagrange multiplier λ ∈ {0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05} is selected on the training set via Eq. (13)."
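The quoted λ-selection rule amounts to training one model per λ in the grid and keeping the one with the best self-normalized IPS (SNIPS) risk estimate on the training data. A minimal sketch of that outer loop follows, assuming SNIPS is the Eq. (13) criterion; `select_lambda` and `train_fn` are hypothetical stand-ins for the paper's training runs.

```python
import torch
import torch.nn.functional as F

def snips_estimate(model, inputs, actions, losses, log_p0):
    """Self-normalized IPS (SNIPS) risk estimate: sum(delta_i * w_i) / sum(w_i)."""
    with torch.no_grad():
        log_pi = F.log_softmax(model(inputs), dim=1)
        log_pi_a = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
        w = torch.exp(log_pi_a - log_p0)            # importance weights
        return ((losses * w).sum() / w.sum()).item()

def select_lambda(train_fn, data, lam_grid):
    """Train one model per lambda, score each by SNIPS, return the best.

    train_fn(lam) -> trained model; a hypothetical stand-in for a full
    BanditNet training run at the given lambda.
    """
    inputs, actions, losses, log_p0 = data
    scored = [(snips_estimate(train_fn(lam), inputs, actions, losses, log_p0), lam)
              for lam in lam_grid]
    return min(scored)  # (best SNIPS risk, best lambda)

# Grid quoted from the paper:
lam_grid = [0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05]
```

Scoring λ on importance-weighted training data is consistent with the Dataset Splits entry above, which notes that no validation split is used.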