Off-Policy Confidence Sequences
Authors: Nikos Karampatziakis, Paul Mineiro, Aaditya Ramdas
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7. Experiments: Code to reproduce all experiment results is available at https://github.com/n17s/mope |
| Researcher Affiliation | Collaboration | ¹Microsoft Azure AI; ²Microsoft Research; ³Carnegie Mellon University. |
| Pseudocode | Yes | Algorithm 1: Solve λ = argmax_{λ∈C} ψ λᵀAλ + λᵀb ... Algorithm 2: MOPE: Martingale Off-Policy Evaluation |
| Open Source Code | Yes | Code to reproduce all experiment results is available at https://github.com/n17s/mope |
| Open Datasets | Yes | We use the first 1 million samples from the mnist8m dataset |
| Dataset Splits | No | The paper describes using data from the 'mnist8m dataset' and processing it, but it does not explicitly specify dataset splits (e.g., percentages or counts for training, validation, or testing sets). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions training models such as linear multinomial logistic regression (MLR) but does not specify the software libraries or version numbers used. |
| Experiment Setup | Yes | We use the first 1 million samples from the mnist8m dataset, which has 10 classes, and train the following functions: h, using linear multinomial logistic regression (MLR); π, again using MLR but now on 1000 random Fourier features (RFF) (Rahimi and Recht, 2007) that approximate a Gaussian kernel machine; and finally q, which uses the same RFF representation as π but whose i-th output is independently trained to predict whether the input belongs to the i-th class, using 10 binary logistic regressions. We used the rest of the data with the following protocol: for each input/label pair (x_i, y_i), we sample action a_i with probability 0.9h(a_i; x_i) + 0.01 (so that we can safely set w_max = 100), we set r_i = 1 if a_i = y_i and r_i = 0 otherwise, and record w_i and c_i. |
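The logging protocol in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not code from the mope repository. The names `logging_probs` and `simulate_logging` are hypothetical, h and π are assumed to be given as (n, 10) arrays of per-example class probabilities, and w_i is assumed to be the standard importance weight π(a_i | x_i) / μ(a_i | x_i). Mixing 0.9h with 0.01 per action yields a valid distribution (0.9 + 10 × 0.01 = 1) whose smallest probability is 0.01. Because π(a | x) ≤ 1, the weight can therefore never exceed 1 / 0.01 = 100, which is where w_max = 100 comes from.

```python
import numpy as np

N_CLASSES = 10        # mnist8m has 10 classes, one action per class
MIX = 0.9             # weight placed on the trained policy h
FLOOR = 0.01          # probability mass added to every action
W_MAX = 1.0 / FLOOR   # bound on pi / mu, since pi <= 1 and mu >= FLOOR


def logging_probs(h_probs):
    """Logging distribution mu(a | x) = 0.9 * h(a; x) + 0.01 for all 10 actions.

    h_probs has shape (n, N_CLASSES) with rows summing to 1, so every row of
    the result sums to 0.9 + 10 * 0.01 = 1.
    """
    return MIX * h_probs + FLOOR


def simulate_logging(h_probs, pi_probs, labels, rng):
    """Return sampled actions a_i, rewards r_i and importance weights w_i.

    Hypothetical helper: c_i is not computed because its definition is not
    given in this row.
    """
    mu = logging_probs(h_probs)
    # Draw one action per example from its logging distribution.
    actions = np.array([rng.choice(N_CLASSES, p=row) for row in mu])
    # The reward is 1 exactly when the sampled action equals the true label.
    rewards = (actions == labels).astype(float)
    rows = np.arange(len(actions))
    # Importance weight pi(a_i | x_i) / mu(a_i | x_i), bounded by W_MAX.
    weights = pi_probs[rows, actions] / mu[rows, actions]
    return actions, rewards, weights
```

For example, `simulate_logging(h, pi, y, np.random.default_rng(0))` returns the logged (a_i, r_i, w_i) triples for a batch of examples. Placing a floor under every action's probability trades a little logging-policy quality for a hard bound on the weights.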
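The random Fourier features used for π and q follow Rahimi and Recht (2007). The sketch below shows the standard construction for a Gaussian kernel: z(x) = sqrt(2/D) cos(Wᵀx + b), where the columns of W are drawn from N(0, σ⁻²I) and b is uniform on [0, 2π). Under this construction, z(x)ᵀz(x') approximates exp(-‖x - x'‖² / (2σ²)). D = 1000 matches the text. The bandwidth σ, the seed, and the name `make_rff` are assumptions, because the row does not state them, and this is not taken from the mope code.

```python
import numpy as np


def make_rff(input_dim, n_features=1000, sigma=1.0, seed=0):
    """Return a map from (n, input_dim) inputs to (n, n_features) RFF.

    Hypothetical helper approximating the Gaussian kernel
    k(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)); sigma is an assumed value.
    """
    rng = np.random.default_rng(seed)
    # Frequencies sampled from the kernel's spectral density: N(0, sigma^-2 I).
    W = rng.normal(scale=1.0 / sigma, size=(input_dim, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    scale = np.sqrt(2.0 / n_features)

    def features(X):
        # Inner products of these features approximate the Gaussian kernel.
        return scale * np.cos(X @ W + b)

    return features
```

Once W and b are sampled, the feature map is fixed. The same `features` function can therefore feed both the multinomial model for π and the 10 binary logistic regressions for q, matching the row's statement that q uses the same RFF representation as π.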