Learning Bellman Complete Representations for Offline Policy Evaluation
Authors: Jonathan Chang, Kaiwen Wang, Nathan Kallus, Wen Sun
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the DeepMind Control Suite. |
| Researcher Affiliation | Academia | ¹Computer Science, Cornell University, Ithaca, NY, USA; ²Operations Research and Information Engineering, Cornell Tech, New York, NY, USA. |
| Pseudocode | Yes | Algorithm 1 Least Squares Policy Evaluation (LSPE), Algorithm 2 OPE with Bellman Complete and exploratory Representation Learning (BCRL), Algorithm 3 Practical Instantiation of BCRL (illustrative sketches of LSPE and a BCRL-style loss follow the table) |
| Open Source Code | Yes | Code available at https://github.com/CausalML/bcrl. |
| Open Datasets | Yes | DeepMind Control Suite benchmark (Tassa et al., 2018) |
| Dataset Splits | No | The paper's Algorithm 2 states 'Randomly split D into two sets D1, D2 of size N', but it does not give explicit training, validation, and test splits or percentages for reproducing the overall experiments, nor does it cite predefined standard splits (a minimal sketch of the Algorithm 2 split appears below the table). |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions building on the DrQ-v2 and SAC-AE implementations and other libraries, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | Table 3. Hyperparameters used for BCRL: Feature Dimension: 512; Weight Initialization: orthogonal init.; Optimizer: Adam; Learning Rate: 1×10⁻⁵; Batch Size: 2048; Training Epochs: 200; τ (target): 0.005; λ_Design: 5×10⁻⁶ |
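
To make the notes above concrete, the sketches that follow reconstruct pieces of the pipeline in Python. They are illustrative readings of the paper's pseudocode and Table 3, not the authors' released code (see the GitHub link above). First, Algorithm 2's split step, "Randomly split D into two sets D1, D2 of size N", assuming `D` is a list of logged transitions; the helper name `split_dataset` and the seed are hypothetical.

```python
import numpy as np

def split_dataset(D, seed=0):
    """Randomly split the logged dataset D into two equal halves D1, D2
    (the split step of Algorithm 2). The seed is an assumption; the paper
    does not specify one."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(D))
    half = len(D) // 2
    return [D[i] for i in idx[:half]], [D[i] for i in idx[half:]]
```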
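Next, a minimal sketch of a BCRL-style representation loss, assuming the objective combines a linear reward head and a linear transition operator fit against target-network features (so that Bellman backups of linear Q-functions stay linear), plus a log-determinant "design" bonus weighted by λ_Design from Table 3. All tensor names and the exact combination of terms are assumptions; Algorithms 2–3 in the paper are authoritative.

```python
import torch
import torch.nn.functional as F

def bcrl_style_loss(phi, phi_next_tgt, rewards, W, theta, lam_design=5e-6):
    """Hedged sketch of a BCRL-style objective (not the exact Algorithm 3 loss).

    phi:          (B, d) learned features phi(s, a)
    phi_next_tgt: (B, d) target-network features phi(s', pi(s'))
    rewards:      (B,)   logged rewards
    W:            (d, d) learned linear transition operator
    theta:        (d,)   learned linear reward head
    """
    # Reward should be (approximately) linear in the features.
    reward_loss = F.mse_loss(phi @ theta, rewards)
    # Expected next features should be a linear map of current features,
    # so the Bellman backup of any linear Q-function remains linear.
    transition_loss = F.mse_loss(phi @ W.T, phi_next_tgt)
    # Exploratory "design" bonus: encourage a well-conditioned feature covariance.
    d = phi.shape[1]
    cov = phi.T @ phi / phi.shape[0]
    design_bonus = torch.logdet(cov + 1e-6 * torch.eye(d))
    return reward_loss + transition_loss - lam_design * design_bonus
```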
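Once the features are frozen, Algorithm 1's LSPE step has a standard closed form: iterate ridge regressions onto one-step Bellman backups. A minimal NumPy sketch with hypothetical argument names and placeholder values for `gamma`, `lam`, and `n_iters`:

```python
import numpy as np

def lspe(phi, phi_next, rewards, gamma=0.99, lam=1e-3, n_iters=100):
    """Least Squares Policy Evaluation with a frozen feature map.

    phi:      (N, d) features phi(s, a) on logged transitions
    phi_next: (N, d) features phi(s', pi(s')) under the target policy
    rewards:  (N,)   logged rewards
    Returns w such that Q(s, a) ~= phi(s, a) @ w.
    """
    n, d = phi.shape
    # Precompute the ridge-regularized least-squares projector once.
    proj = np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T)
    w = np.zeros(d)
    for _ in range(n_iters):
        # Regress features onto one-step Bellman backups of the current Q.
        w = proj @ (rewards + gamma * (phi_next @ w))
    return w
```

The OPE estimate then follows by averaging `phi(s0, pi(s0)) @ w` over initial states.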
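Finally, the Table 3 settings translate directly into a training setup. The sketch below wires the reported values (Adam, learning rate 1×10⁻⁵, batch size 2048, 200 epochs, τ = 0.005, orthogonal initialization) around a hypothetical feature encoder, including the Polyak target update implied by the τ (target) entry; `encoder` and the function names are placeholders.

```python
import torch

def init_weights(m):
    """Orthogonal weight initialization, as reported in Table 3."""
    if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d)):
        torch.nn.init.orthogonal_(m.weight)

def make_training_setup(encoder: torch.nn.Module):
    """Instantiate the Table 3 hyperparameters around a placeholder encoder
    (the paper builds on DrQ-v2-style pixel encoders)."""
    encoder.apply(init_weights)
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-5)   # Table 3
    batch_size, epochs, tau, lam_design = 2048, 200, 0.005, 5e-6  # Table 3
    return optimizer, batch_size, epochs, tau, lam_design

def soft_update(target_net, online_net, tau=0.005):
    """Polyak-average the target feature network (tau from Table 3)."""
    with torch.no_grad():
        for tp, p in zip(target_net.parameters(), online_net.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)
```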