Domain-Robust Visual Imitation Learning with Mutual Information Constraints

Authors: Edoardo Cetin, Oya Celiktutan

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, our algorithm is able to efficiently imitate in a diverse range of control problems including balancing, manipulation and locomotive tasks, while being robust to various domain differences in terms of both environment appearance and agent embodiment."
Researcher Affiliation | Academia | "Edoardo Cetin & Oya Celiktutan, Centre for Robotics Research, Department of Engineering, King's College London, {edoardo.cetin,oya.celiktutan}@kcl.ac.uk"
Pseudocode | Yes | "A formal summary of DisentanGAIL is reported below in Algorithm 1."
Open Source Code | Yes | "To facilitate future efforts, we share the code for our algorithms and environments: https://github.com/Aladoro/domain-robust-visual-il."
Open Datasets | Yes | "To evaluate our algorithm, we design six different environment realms, simulated with Mujoco (Todorov et al., 2012), extending the environments from Brockman et al. (2016): Inverted Pendulum, Reacher, Hopper, Half-Cheetah, 7DOF-Pusher and 7DOF-Striker. ... We refer to the environments in these realms as high dimensional since their state and action spaces are significantly larger than the state and action spaces of the environments explored in prior work making use of the domain confusion loss (Stadie et al., 2017; Okumura et al., 2020; Choi et al., 2020)."
Dataset Splits | No | The paper does not specify traditional training/validation/test dataset splits with percentages or absolute counts. It mentions using BE (expert demonstrations) and Bπ (agent observations, acting as a replay buffer) for learning, but no distinct validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software such as Mujoco, OpenAI Gym, the Adam optimizer, and the Soft Actor-Critic (SAC) algorithm, but does not specify their version numbers, which are critical for reproducibility.
Experiment Setup | Yes | "We provide the utilized environment-specific hyper-parameters in Table 4, where we specify the buffer sizes in terms of total/maximum number of observations. ... for all optimizations, we set the batch size |b| = 128. ... we utilize the same 2 hidden-layer fully-connected policy and Q-networks with 256 units and ReLU nonlinearities. ... We train each model through the Adam optimizer (Kingma & Ba, 2014) with a unique learning rate α = 0.001 and momentum parameters β1 = 0.9, β2 = 0.999."
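The architecture and optimizer settings quoted in the Experiment Setup row can be sketched as follows. This is a minimal NumPy illustration only, with hypothetical state/action dimensions; the authors' actual implementation lives in the linked repository and may differ in detail:

```python
import numpy as np

# Hyper-parameters quoted in the Experiment Setup row
BATCH_SIZE = 128
LEARNING_RATE = 1e-3       # Adam learning rate alpha
BETA1, BETA2 = 0.9, 0.999  # Adam momentum parameters

# Hypothetical dimensions, chosen purely for illustration
STATE_DIM, ACTION_DIM = 17, 6

def init_mlp(in_dim, out_dim, hidden=256, seed=0):
    """Two fully-connected hidden layers of 256 units, matching the
    quoted policy/Q-network architecture."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    return [(rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
             np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Forward pass with ReLU nonlinearities on the hidden layers."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:  # no nonlinearity on the output layer
            x = np.maximum(x, 0.0)
    return x

# A Q-network maps (state, action) pairs to scalar values.
q_params = init_mlp(STATE_DIM + ACTION_DIM, 1)
batch = np.ones((BATCH_SIZE, STATE_DIM + ACTION_DIM))
q_values = forward(q_params, batch)
print(q_values.shape)  # (128, 1)
```

In practice these networks would be trained with SAC and the Adam settings above; the constants are shown here only to make the quoted configuration concrete.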