reproducibilityindex.ai

Off-Policy Confidence Sequences

Authors: Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	7. Experiments: Code to reproduce all experiment results is available at https://github.com/n17s/mope
Researcher Affiliation	Collaboration	¹Microsoft Azure AI, ²Microsoft Research, ³Carnegie Mellon University.
Pseudocode	Yes	Algorithm 1: Solve λ = argmax_{λ ∈ C} ψ λᵀAλ + λᵀb ... Algorithm 2: MOPE: Martingale Off-Policy Evaluation
Open Source Code	Yes	Code to reproduce all experiment results is available at https://github.com/n17s/mope
Open Datasets	Yes	We use the first 1 million samples from the mnist8m dataset
Dataset Splits	No	The paper describes using data from the 'mnist8m dataset' and processing it, but it does not explicitly specify dataset splits (e.g., percentages or counts for training, validation, or testing sets).
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper mentions training functions like 'linear multinomial logistic regression (MLR)' but does not specify any software libraries or their version numbers that were used.
Experiment Setup	Yes	We use the first 1 million samples from the mnist8m dataset which has 10 classes and train the following functions: h using linear multinomial logistic regression (MLR), π again using MLR but now on 1000 random Fourier features (RFF) (Rahimi and Recht, 2007) that approximate a Gaussian kernel machine, and finally q which uses the same RFF representation as π but instead its i-th output is independently trained to predict whether the input is the i-th class using 10 binary logistic regressions. We used the rest of the data with the following protocol: for each input/label pair (x_i, y_i), we sample action a_i with probability 0.9h(a_i; x_i) + 0.01 (so that we can safely set w_max = 100), we set r_i = 1 if a_i = y_i, otherwise r_i = 0, and record w_i and c_i.
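
The logging protocol quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the code from the linked repository. The function names `logging_probs` and `log_interactions` are hypothetical, and w_i is assumed to be the usual importance weight π(a_i; x_i) divided by the logging probability, which the excerpt does not define. The control variate c_i is also not defined in the excerpt, so it is left out. With 10 classes the logging distribution sums to 0.9 + 10 × 0.01 = 1, and because every action has probability at least 0.01, an importance weight can be at most 1 / 0.01 = 100, which is consistent with w_max = 100.

```python
import numpy as np


def logging_probs(h_probs, mix=0.9, floor=0.01):
    """Logging distribution mix * h(a; x) + floor for every action.

    h_probs: (n, k) array whose rows are the class probabilities from h.
    With k = 10, mix = 0.9 and floor = 0.01 each row still sums to 1.
    """
    return mix * h_probs + floor


def log_interactions(h_probs, pi_probs, labels, rng):
    """Sample one action per example and record reward and weight.

    ASSUMPTION: the weight is pi(a; x) / mu(a; x), where mu is the
    logging distribution. The excerpt does not define w_i explicitly.
    """
    n, k = h_probs.shape
    mu = logging_probs(h_probs)

    # Inverse-CDF sampling, one draw per row.
    cdf = np.cumsum(mu, axis=1)
    u = rng.random(n)
    actions = (u[:, None] > cdf).sum(axis=1)
    # Guard against floating-point rounding in the last CDF entry.
    actions = np.minimum(actions, k - 1)

    # Reward is 1 when the sampled action matches the label, else 0.
    rewards = (actions == labels).astype(float)

    rows = np.arange(n)
    weights = pi_probs[rows, actions] / mu[rows, actions]
    return actions, rewards, weights
```

To try it, pass arrays of predicted class probabilities for h and π (for instance, rows drawn from `rng.dirichlet(np.ones(10), size=n)` as stand-ins for trained models) together with integer labels in 0..9 and a `np.random.default_rng` generator.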