Domain-Robust Visual Imitation Learning with Mutual Information Constraints

Authors: Edoardo Cetin, Oya Celiktutan

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, our algorithm is able to efficiently imitate in a diverse range of control problems including balancing, manipulation and locomotive tasks, while being robust to various domain differences in terms of both environment appearance and agent embodiment."
Researcher Affiliation | Academia | "Edoardo Cetin & Oya Celiktutan, Centre for Robotics Research, Department of Engineering, King's College London, {edoardo.cetin,oya.celiktutan}@kcl.ac.uk"
Pseudocode | Yes | "A formal summary of DisentanGAIL is reported below in Algorithm 1."
Open Source Code | Yes | "To facilitate future efforts, we share the code for our algorithms and environments: https://github.com/Aladoro/domain-robust-visual-il."
Open Datasets | Yes | "To evaluate our algorithm, we design six different environment realms, simulated with Mujoco (Todorov et al., 2012), extending the environments from Brockman et al. (2016): Inverted Pendulum, Reacher, Hopper, Half-Cheetah, 7DOF-Pusher and 7DOF-Striker. ... We refer to the environments in these realms as high dimensional since their state and action spaces are significantly larger than the state and action spaces of the environments explored in prior work making use of the domain confusion loss (Stadie et al., 2017; Okumura et al., 2020; Choi et al., 2020)."
Dataset Splits | No | The paper does not specify traditional training/validation/test dataset splits with percentages or absolute counts. It mentions using BE (expert demonstrations) and Bπ (agent observations, acting as a replay buffer) for learning, but no distinct validation set.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions software such as Mujoco, OpenAI Gym, the Adam optimizer, and the Soft Actor-Critic (SAC) algorithm, but does not specify their version numbers, which are critical for reproducibility.
Experiment Setup | Yes | "We provide the utilized environment-specific hyper-parameters in Table 4, where we specify the buffer sizes in terms of total/maximum number of observations. ... for all optimizations, we set the batch size |b| = 128. ... we utilize the same 2 hidden-layer fully-connected policy and Q-networks with 256 units and ReLU nonlinearities. ... We train each model through the Adam optimizer (Kingma & Ba, 2014) with a unique learning rate α = 0.001 and momentum parameters β1 = 0.9, β2 = 0.999."
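The architecture and optimizer settings quoted in the Experiment Setup row can be sketched as follows. This is a minimal NumPy illustration only, with hypothetical state/action dimensions; the authors' actual implementation lives in the linked repository and may differ in detail:

```python
import numpy as np

# Hyper-parameters quoted in the Experiment Setup row
BATCH_SIZE = 128
LEARNING_RATE = 1e-3       # Adam learning rate alpha
BETA1, BETA2 = 0.9, 0.999  # Adam momentum parameters

# Hypothetical dimensions, chosen purely for illustration
STATE_DIM, ACTION_DIM = 17, 6

def init_mlp(in_dim, out_dim, hidden=256, seed=0):
    """Two fully-connected hidden layers of 256 units, matching the
    quoted policy/Q-network architecture."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    return [(rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
             np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Forward pass with ReLU nonlinearities on the hidden layers."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:  # no nonlinearity on the output layer
            x = np.maximum(x, 0.0)
    return x

# A Q-network maps (state, action) pairs to scalar values.
q_params = init_mlp(STATE_DIM + ACTION_DIM, 1)
batch = np.ones((BATCH_SIZE, STATE_DIM + ACTION_DIM))
q_values = forward(q_params, batch)
print(q_values.shape)  # (128, 1)
```

In practice these networks would be trained with SAC and the Adam settings above; the constants are shown here only to make the quoted configuration concrete.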