Coherent Soft Imitation Learning

Authors: Joe Watson, Sandy Huang, Nicolas Heess

NeurIPS 2023

Reproducibility assessment. Each entry below gives the variable, its assessed result, and the supporting LLM response.
Research Type: Experimental
LLM Response: "We evaluate CSIL against baseline methods on tabular and continuous state-action environments. The baselines are popular entropy-regularized imitation learning methods discussed in Section 2. Moreover, ablation studies are provided in Appendix N for the experiments in Sections 5.2 and 5.3." (Section 5, Experimental Results)
Researcher Affiliation: Collaboration
LLM Response: Joe Watson (TU Darmstadt, Darmstadt, Germany; Systems AI for Robot Learning, German Research Center for AI, dfki.de; joe@robot-learning.de); Sandy H. Huang and Nicolas Heess (Google DeepMind, London, United Kingdom; {shhuang,heess}@google.com)
Pseudocode: Yes
LLM Response: "Algorithm 1: Coherent soft imitation learning (CSIL)"
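To make the pseudocode concrete, here is a minimal sketch of the reward construction at the heart of Algorithm 1 as we read it: a behaviourally cloned policy defines a shaped reward through its log-ratio against a reference prior, under which the clone is already soft-optimal, and soft RL then refines it. The helper names (coherent_reward, bc_policy) and the Gaussian-policy/uniform-prior modelling choices are illustrative assumptions, not the authors' code.

    import numpy as np

    def gaussian_logpdf(a, mean, std):
        """Log-density of a diagonal Gaussian policy evaluated at action a."""
        return np.sum(-0.5 * ((a - mean) / std) ** 2
                      - np.log(std) - 0.5 * np.log(2.0 * np.pi), axis=-1)

    def coherent_reward(obs, act, bc_policy, alpha=1.0, act_low=-1.0, act_high=1.0):
        """Shaped reward r(s, a) = alpha * (log pi_bc(a|s) - log pi_0(a|s))."""
        mean, std = bc_policy(obs)                  # cloned policy's Gaussian head
        log_pi_bc = gaussian_logpdf(act, mean, std)
        # Uniform reference prior over a bounded action box (an assumption here).
        log_pi_0 = -act.shape[-1] * np.log(act_high - act_low)
        return alpha * (log_pi_bc - log_pi_0)

    # Toy usage on a 1-D action space: demonstration-like actions score higher.
    bc_policy = lambda obs: (np.tanh(obs), np.full_like(obs, 0.1))
    obs = np.array([[0.3], [-0.7]])
    act = np.array([[0.29], [0.1]])
    print(coherent_reward(obs, act, bc_policy))

A SAC-style soft policy iteration learner, warm-started from the cloned policy, would then be run with this fixed reward.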
Open Source Code: Yes
LLM Response: "For the open-source implementation and simulation results, see joemwatson.github.io/csil."
Open Datasets: Yes
LLM Response: "A standard benchmark of deep imitation learning is learning MuJoCo [72] Gym [73] and Adroit [74] tasks from agent demonstrations."
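The paper does not name a specific dataset distribution in this quote, but as one common way to obtain agent demonstrations for these Gym and Adroit tasks, the D4RL suite would look like the sketch below (the task name, and the use of D4RL at all, are assumptions on our part, not a claim about the authors' pipeline).

    import gym
    import d4rl  # importing registers the offline MuJoCo and Adroit datasets

    # Hypothetical demonstration source; the paper may have generated its own.
    env = gym.make("hopper-expert-v2")      # MuJoCo Gym locomotion expert data
    data = env.get_dataset()                # dict of numpy arrays
    demos = {k: data[k] for k in ("observations", "actions", "terminals")}
    print({k: v.shape for k, v in demos.items()})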
Dataset Splits: No
LLM Response: The paper describes using various datasets (e.g., Gym, Adroit, robomimic) and discusses combining demonstration data with online/offline data sources. However, it does not provide specific numerical details (percentages or counts) for train, validation, or test dataset splits.
Hardware Specification: Yes
LLM Response: "Our learner (policy evaluation and improvement) runs on a single TPU v2. We ran four actors to interact with the environment. Depending on the algorithm, there were also one or more evaluators. For vision-based tasks, we used A100 GPUs for the policies."
Software Dependencies: No
LLM Response: The paper mentions using the "jax automatic differentiation and linear algebra library", "acme", and implementations based on "PyTorch" (in references), but it does not specify concrete version numbers for these software components (e.g., JAX version X.Y, Acme version A.B) to ensure reproducibility.
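When reproducing a setup like this, one lightweight mitigation for the missing versions is to log the installed ones at run time. A small sketch (dm-acme is the PyPI distribution name for acme; the package list itself is our assumption):

    from importlib.metadata import version

    # Record the versions actually installed, since the paper does not pin any.
    for pkg in ("jax", "dm-acme"):
        print(pkg, version(pkg))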
Experiment Setup: Yes
LLM Response: "The policy and critic networks consisted of two layers with 256 units and ELU activations. Learning rates were 3e-4, the batch size was 256, and the target network smoothing coefficient was 0.005."
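These details are enough to reconstruct the network skeleton. A minimal sketch in Flax/Optax, assuming they sit on top of the paper's JAX stack (the module code itself is ours, not the authors'):

    import flax.linen as nn
    import optax

    class MLP(nn.Module):
        """Two hidden layers of 256 units with ELU activations, as reported."""
        out_dim: int

        @nn.compact
        def __call__(self, x):
            x = nn.elu(nn.Dense(256)(x))
            x = nn.elu(nn.Dense(256)(x))
            return nn.Dense(self.out_dim)(x)

    optimizer = optax.adam(3e-4)   # reported learning rate
    batch_size = 256               # reported batch size
    tau = 0.005                    # target-network smoothing coefficient
    # The Polyak target update inside the critic step would then be, e.g.:
    # target_params = optax.incremental_update(online_params, target_params, tau)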