Conditional Mutual Information for Disentangled Representations in Reinforcement Learning
Authors: Mhairi Dunion, Trevor McInroe, Kevin Luck, Josiah Hanna, Stefano Albrecht
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features. |
| Researcher Affiliation | Academia | Mhairi Dunion (University of Edinburgh, mhairi.dunion@ed.ac.uk); Trevor McInroe (University of Edinburgh, t.mcinroe@ed.ac.uk); Kevin Sebastian Luck (Vrije Universiteit Amsterdam, k.s.luck@vu.nl); Josiah P. Hanna (University of Wisconsin-Madison, jphanna@cs.wisc.edu); Stefano V. Albrecht (University of Edinburgh, s.albrecht@ed.ac.uk) |
| Pseudocode | Yes | The architecture for CMID is shown in Figure 2, and the pseudocode is provided in Algorithm 1. (A hedged sketch of the adversarial objective follows the table.) |
| Open Source Code | Yes | A public and open-source implementation of CMID is available at github.com/uoe-agents/cmid. |
| Open Datasets | Yes | We evaluate our approach on continuous control tasks with image observations from the DeepMind Control Suite (DMC) (Tunyasuvunakool et al., 2020), where we add correlations between object colour and properties impacting dynamics (e.g. joint positions). (An environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly describe a separate validation split for hyperparameter tuning. |
| Hardware Specification | Yes | For each experiment run we use a single NVIDIA Volta V100 GPU with 32GB memory and a single CPU. |
| Software Dependencies | No | The paper mentions PyTorch and the Captum library but does not provide version numbers for these or other software dependencies. (A version-recording snippet follows the table.) |
| Experiment Setup | Yes | Table 2 (Hyperparameter values for both SVEA and SVEA-CMID) lists detailed settings: replay buffer capacity 100000, batch size 128, discount factor 0.99, Adam optimizer, learning rate 1e-3 for the actor, critic and encoder, etc. (These values are collected in the config sketch after the table.) |
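
For the Pseudocode row, here is a minimal, hypothetical sketch of a discriminator-based adversarial conditional-independence penalty in the spirit of CMID. All class, function, and variable names are illustrative assumptions, not the authors' code; the actual procedure is Algorithm 1 in the paper and the uoe-agents/cmid repository.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores whether a (latent, conditioning) pair comes from the joint
    distribution (label 1) or a conditionally permuted one (label 0)."""
    def __init__(self, latent_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, cond], dim=-1)).squeeze(-1)

@torch.no_grad()
def knn_conditional_permute(z: torch.Tensor, cond: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Approximate conditionally independent samples by shuffling each latent
    dimension among the k nearest neighbours of the conditioning variable
    (assumes batch size > k)."""
    batch = z.shape[0]
    knn = torch.cdist(cond, cond).topk(k + 1, largest=False).indices[:, 1:]  # drop self-match
    z_perm = z.clone()
    for j in range(z.shape[1]):  # permute one latent dimension at a time
        pick = knn[torch.arange(batch), torch.randint(0, k, (batch,))]
        z_perm[:, j] = z[pick, j]
    return z_perm

def cmid_losses(z: torch.Tensor, cond: torch.Tensor, disc: Discriminator):
    """The discriminator maximises joint-vs-permuted classification accuracy;
    the encoder is trained adversarially against it, pushing latent features
    towards pairwise conditional independence given `cond`."""
    bce = nn.BCEWithLogitsLoss()
    z_perm = knn_conditional_permute(z, cond)
    logits_joint = disc(z.detach(), cond.detach())
    logits_perm = disc(z_perm, cond.detach())
    disc_loss = (bce(logits_joint, torch.ones_like(logits_joint))
                 + bce(logits_perm, torch.zeros_like(logits_perm)))
    logits_enc = disc(z, cond)
    enc_loss = bce(logits_enc, torch.zeros_like(logits_enc))  # fool the discriminator
    return disc_loss, enc_loss
```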
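
For the Open Datasets row, a minimal sketch of loading a DMC task with pixel observations via the dm_control package. The paper's added correlations between object colour and dynamics sit on top of such an environment and are not reproduced here; the choice of domain and task below is illustrative.

```python
from dm_control import suite

# Load an illustrative DMC continuous-control task.
env = suite.load(domain_name="cheetah", task_name="run")
timestep = env.reset()

# Take one placeholder step and render an image observation.
action = env.action_spec().generate_value()  # a valid in-bounds action
timestep = env.step(action)
frame = env.physics.render(height=84, width=84, camera_id=0)  # (84, 84, 3) RGB array
```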
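
For the Software Dependencies row, a trivial snippet for recording the versions the paper leaves unpinned; logging these alongside results would close the gap the row flags.

```python
import torch
import captum

print("torch", torch.__version__)
print("captum", captum.__version__)
```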
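
Finally, for the Experiment Setup row, the Table 2 values quoted above gathered into a plain config dict. The key names are illustrative; only the values come from the paper.

```python
# Table 2 hyperparameters for SVEA and SVEA-CMID (key names are assumptions).
HYPERPARAMS = {
    "replay_buffer_capacity": 100_000,
    "batch_size": 128,
    "discount": 0.99,
    "optimizer": "Adam",
    "lr_actor_critic_encoder": 1e-3,
}
```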