Learning Bellman Complete Representations for Offline Policy Evaluation

Authors: Jonathan Chang, Kaiwen Wang, Nathan Kallus, Wen Sun

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the DeepMind Control Suite.
Researcher Affiliation | Academia | 1) Computer Science, Cornell University, Ithaca, NY, USA; 2) Operations Research and Information Engineering, Cornell Tech, New York, NY, USA.
Pseudocode | Yes | Algorithm 1: Least Squares Policy Evaluation (LSPE); Algorithm 2: OPE with Bellman Complete and exploratory Representation Learning (BCRL); Algorithm 3: Practical Instantiation of BCRL. A minimal LSPE sketch appears below the table.
Open Source Code | Yes | Code available at https://github.com/CausalML/bcrl.
Open Datasets | Yes | DeepMind Control Suite benchmark (Tassa et al., 2018).
Dataset Splits | No | Algorithm 2 states 'Randomly split D into two sets D1, D2 of size N', but the paper does not give explicit training/validation/test splits or percentages for reproducing the overall experiments, nor does it cite predefined standard splits.
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions using the DrQ-v2 and SAC-AE implementations and other libraries, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | Table 3, hyperparameters used for BCRL: feature dimension 512; weight initialization orthogonal; optimizer Adam; learning rate 1e-5; batch size 2048; training epochs 200; τ (target) 0.005; λ_Design 5e-6. A configuration sketch appears below.
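
As a companion to the Pseudocode row above, here is a minimal sketch of least squares policy evaluation (Algorithm 1) with fixed linear features. The function name, array layout, and ridge regularization term are illustrative assumptions, not details taken from the paper's implementation.

```python
import numpy as np

def lspe(phi_sa, rewards, phi_next_pi, gamma=0.99, iters=100, reg=1e-6):
    """Least squares policy evaluation with fixed linear features.

    phi_sa      : (N, d) features phi(s, a) of the logged state-action pairs
    rewards     : (N,)   observed rewards
    phi_next_pi : (N, d) features phi(s', pi(s')) of next states paired with
                  the target policy's action
    Returns w such that Q^pi(s, a) is approximated by phi(s, a) @ w.
    """
    n, d = phi_sa.shape
    # Ridge-regularized least squares projection onto the feature space.
    proj = np.linalg.solve(phi_sa.T @ phi_sa + reg * np.eye(d), phi_sa.T)
    w = np.zeros(d)
    for _ in range(iters):
        # Regress onto the one-step Bellman backup of the current estimate.
        targets = rewards + gamma * (phi_next_pi @ w)
        w = proj @ targets
    return w
```

The off-policy value estimate is then the average of phi(s0, pi(s0)) @ w over states drawn from the initial-state distribution.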
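
The Table 3 settings in the Experiment Setup row translate directly into a training configuration. Below is a minimal sketch of how they might be wired up in PyTorch; the placeholder encoder and variable names are assumptions, and only the numerical values come from the paper.

```python
import torch
import torch.nn as nn

# Values from Table 3 of the paper.
FEATURE_DIM = 512
LEARNING_RATE = 1e-5
BATCH_SIZE = 2048
TRAINING_EPOCHS = 200
TAU = 0.005            # target-network Polyak averaging coefficient
LAMBDA_DESIGN = 5e-6   # weight on the exploratory (design) regularizer

# Placeholder encoder standing in for the paper's image encoder (assumption).
encoder = nn.Linear(64, FEATURE_DIM)
nn.init.orthogonal_(encoder.weight)  # orthogonal weight initialization

optimizer = torch.optim.Adam(encoder.parameters(), lr=LEARNING_RATE)
```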