Off-Policy Confidence Sequences

Authors: Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 7. Experiments: Code to reproduce all experiment results is available at https://github.com/n17s/mope
Researcher Affiliation | Collaboration | (1) Microsoft Azure AI, (2) Microsoft Research, (3) Carnegie Mellon University.
Pseudocode | Yes | Algorithm 1: Solve $\lambda = \operatorname{argmax}_{\lambda \in C} \psi \lambda^\top A \lambda + \lambda^\top b$ ... Algorithm 2: MOPE: Martingale Off-Policy Evaluation
Open Source Code | Yes | Code to reproduce all experiment results is available at https://github.com/n17s/mope
Open Datasets | Yes | We use the first 1 million samples from the mnist8m dataset
Dataset Splits | No | The paper describes using and processing data from the mnist8m dataset, but it does not explicitly specify dataset splits (e.g., percentages or counts for training, validation, or test sets).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions training methods such as linear multinomial logistic regression (MLR), but it does not name the software libraries used or their version numbers.
Experiment Setup | Yes | We use the first 1 million samples from the mnist8m dataset which has 10 classes and train the following functions: $h$ using linear multinomial logistic regression (MLR), $\pi$ again using MLR but now on 1000 random Fourier features (RFF) (Rahimi and Recht, 2007) that approximate a Gaussian kernel machine, and finally $q$ which uses the same RFF representation as $\pi$ but instead its $i$-th output is independently trained to predict whether the input is the $i$-th class using 10 binary logistic regressions. We used the rest of the data with the following protocol: for each input/label pair $(x_i, y_i)$, we sample action $a_i$ with probability $0.9\,h(a_i; x_i) + 0.01$ (so that we can safely set $w_{\max} = 100$), we set $r_i = 1$ if $a_i = y_i$, otherwise $r_i = 0$, and record $w_i$ and $c_i$.
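
The logging protocol quoted in the Experiment Setup entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code (which lives in the repository linked above). The names `logging_probs` and `log_sample` are hypothetical, and `h_probs` and `pi_probs` stand in for the class probabilities produced by the trained h and π. The sketch assumes that w_i is the importance weight, i.e. π(a_i; x_i) divided by the logging probability of a_i. The summary does not define c_i, so it is left out. With 10 classes, the logging probabilities 0.9 h + 0.01 sum to 0.9 + 10 × 0.01 = 1, and each action has probability at least 0.01, so under this assumption the weight cannot exceed 1 / 0.01 = 100.

```python
import random

N_ACTIONS = 10   # mnist8m has 10 classes, so there are 10 actions
W_MAX = 100.0    # bound quoted in the setup: 1 / 0.01


def logging_probs(h_probs):
    """Logging distribution: 0.9 * h(a; x) + 0.01 for every action a."""
    return [0.9 * p + 0.01 for p in h_probs]


def log_sample(rng, h_probs, pi_probs, label):
    """Draw one logged interaction for a single input/label pair.

    Assumption: w is the importance weight pi(a; x) / mu(a; x), where mu is
    the logging distribution. The quantity c_i from the paper is not
    computed because the summary does not define it.
    """
    mu = logging_probs(h_probs)
    # Sample the action a_i from the logging distribution.
    action = rng.choices(range(N_ACTIONS), weights=mu, k=1)[0]
    # Reward is 1 when the sampled action matches the true label, else 0.
    reward = 1.0 if action == label else 0.0
    # Assumed importance weight of the target policy against the logger.
    weight = pi_probs[action] / mu[action]
    return action, reward, weight


if __name__ == "__main__":
    rng = random.Random(0)
    h = [0.1] * N_ACTIONS            # placeholder output of h
    pi = [0.0] * 9 + [1.0]           # placeholder output of pi
    print(log_sample(rng, h, pi, label=9))
```

Mixing h with a small uniform component is what keeps every weight bounded: even if h assigns zero probability to an action that π prefers, the logger still picks it with probability 0.01.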