Conditional Mutual Information for Disentangled Representations in Reinforcement Learning

Authors: Mhairi Dunion, Trevor McInroe, Kevin Luck, Josiah Hanna, Stefano Albrecht

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.
Researcher Affiliation | Academia | Mhairi Dunion (University of Edinburgh, mhairi.dunion@ed.ac.uk); Trevor McInroe (University of Edinburgh, t.mcinroe@ed.ac.uk); Kevin Sebastian Luck (Vrije Universiteit Amsterdam, k.s.luck@vu.nl); Josiah P. Hanna (University of Wisconsin-Madison, jphanna@cs.wisc.edu); Stefano V. Albrecht (University of Edinburgh, s.albrecht@ed.ac.uk)
Pseudocode | Yes | The architecture for CMID is shown in Figure 2, and the pseudocode is provided in Algorithm 1.
Open Source Code | Yes | A public and open-source implementation of CMID is available at github.com/uoe-agents/cmid.
Open Datasets | Yes | We evaluate our approach on continuous control tasks with image observations from the DeepMind Control Suite (DMC) (Tunyasuvunakool et al., 2020) where we add correlations between object colour and properties impacting dynamics (e.g. joint positions).
Dataset Splits | No | The paper mentions training and testing but does not explicitly describe a separate validation split for hyperparameter tuning.
Hardware Specification | Yes | For each experiment run we use a single NVIDIA Volta V100 GPU with 32GB memory and a single CPU.
Software Dependencies | No | The paper mentions PyTorch and the Captum library but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Table 2 ("Hyperparameter values for both SVEA and SVEA-CMID") provides detailed settings such as replay buffer capacity 100000, batch size 128, discount factor 0.99, optimizer Adam, and learning rate (actor, critic and encoder) 1e-3.
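
The settings quoted from Table 2 could be collected in a small configuration object when attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name SveaCmidConfig and its field names are hypothetical, and only the five values listed in the excerpt above are included. The remaining Table 2 entries are not reproduced here and would need to be taken from the paper.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SveaCmidConfig:
    """Hypothetical container; class and field names are illustrative assumptions.

    Only the values quoted from Table 2 in the summary above are filled in.
    """

    replay_buffer_capacity: int = 100_000
    batch_size: int = 128
    discount: float = 0.99
    optimizer: str = "Adam"
    learning_rate: float = 1e-3  # shared by actor, critic and encoder


config = SveaCmidConfig()
print(asdict(config))
```

Freezing the dataclass keeps the quoted values from being changed by accident during a run; any additional hyperparameters would be added as further fields once confirmed against the paper.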