Principled Offline RL in the Presence of Rich Exogenous Information
Authors: Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Rajiv Didolkar, Dipendra Misra, Xin Li, Harm Van Seijen, Remi Tachet Des Combes, John Langford
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section provides extensive analysis of representation learning from visual offline data under rich exogenous information (Figure 3). Our experiments aim to understand the effect of exogenous information and whether ACRO can truly learn the agent-centric state and thus improve performance in visual offline RL. To this end, we evaluate ACRO against several state-of-the-art representation learning baselines across two axes of added exogenous information: Temporal Correlation and Diversity, hence characterizing the level of difficulty systematically. |
| Researcher Affiliation | Collaboration | Riashat Islam *1,2,3, Manan Tomar *4,2, Alex Lamb 3, Yonathan Efroni 5, Hongyu Zang 6, Aniket Didolkar 7,3, Dipendra Misra 3, Xin Li 6, Harm Van Seijen 2, Remi Tachet Des Combes 2, John Langford 3. Affiliations: 1 McGill University, Quebec AI Institute; 2 Microsoft Research, Montreal; 3 Microsoft Research, New York; 4 University of Alberta; 5 Meta, New York; 6 Beijing Institute of Technology, Beijing; 7 University of Montreal, Quebec AI Institute. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper mentions building off an open source code base for a benchmark, but does not provide concrete access or an explicit statement about releasing the source code for ACRO itself. 'For our experiments, we build off from the open source code base accompanying the v-d4rl benchmark (Lu et al., 2022b).' |
| Open Datasets | Yes | We provide details of each EXOGENOUS DATASETS in Appendix E.1.1, along with descriptions for the data collection process in Appendix E.2. Following Fu et al. (2020); Lu et al. (2022a), we release these datasets for future use by the RL community. |
| Dataset Splits | No | The paper mentions using TD3+BC, an algorithm that typically uses validation, but does not provide specific details on the dataset splits (percentages or counts) used for validation in their experiments. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types with specifications) used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions software components like 'TD3 + BC' and building off 'v-d4rl benchmark' but does not specify version numbers for these or any other ancillary software libraries or programming languages. |
| Experiment Setup | Yes | We pre-train the representations for 100K pre-training steps. Given pixel-based visual offline data, we use a simple CNN+MLP architecture for encoding observations and predicting the ACRO actions. Specifically, the ACRO encoder uses 4 layers of convolutions, each with a kernel size of 3 and 32 channels. The original observation is of size 84 × 84 × 9, corresponding to a 3-channel observation and a frame stacking of 3. The final encoder layer is an MLP which maps the convolutional output to a representation dimension of 256, giving the output ϕ(x). This is followed by a 2-layer MLP (hidden dim 256) that is used to predict the action given a 512-dimensional input corresponding to the concatenated s_t and s_{t+k} representations. For ACRO, we sample k from 1 to 15 uniformly. We use ReLU non-linearity and Adam for optimization throughout. |
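The experiment-setup cell above fully specifies the layer sizes of the ACRO pre-training network, so it can be sketched directly. Below is a minimal PyTorch sketch, assuming the layer counts and dimensions quoted from the paper (4 conv layers, kernel 3, 32 channels, 84 × 84 × 9 input, 256-dim representation, 2-layer MLP head on the concatenated 512-dim input); the conv strides and the action dimension are not stated in the excerpt and are illustrative assumptions, as are the class names.

```python
import torch
import torch.nn as nn

class AcroEncoder(nn.Module):
    """CNN+MLP encoder phi(x): 4 conv layers (kernel 3, 32 channels) -> 256-dim.
    Strides are an assumption; the paper excerpt does not specify them."""
    def __init__(self, repr_dim=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(9, 32, kernel_size=3, stride=2), nn.ReLU(),   # 9 = 3 channels x 3 stacked frames
            nn.Conv2d(32, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Infer the flattened conv-output size for an 84x84 input.
        with torch.no_grad():
            n_flat = self.convs(torch.zeros(1, 9, 84, 84)).flatten(1).shape[1]
        self.fc = nn.Linear(n_flat, repr_dim)

    def forward(self, x):
        return self.fc(self.convs(x).flatten(1))

class AcroActionPredictor(nn.Module):
    """2-layer MLP (hidden dim 256) predicting a_t from the concatenated
    phi(s_t) and phi(s_{t+k}) (512-dim input). action_dim is an assumption."""
    def __init__(self, repr_dim=256, action_dim=6):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * repr_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, z_t, z_tk):
        return self.mlp(torch.cat([z_t, z_tk], dim=-1))
```

During pre-training, k would be drawn uniformly from {1, ..., 15} for each sampled transition pair, and both modules optimized jointly with Adam against the logged action, per the setup quoted above.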