A Coupled Flow Approach to Imitation Learning

Authors: Gideon Joseph Freund, Elad Sarafian, Sarit Kraus

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate CFIL on the standard Mujoco benchmarks (Todorov et al., 2012), first comparing it to state-of-the-art imitation methods, including Value DICE (Kostrikov et al., 2019) and their optimized implementation of DAC (Kostrikov et al., 2018), along with a customary behavioral cloning (BC) baseline.
Researcher Affiliation Academia 1Department of Computer Science, Bar-Ilan University, Israel. Correspondence to: Gideon Freund <gideonfreund@gmail.com>.
Pseudocode Yes Our resulting algorithm, Coupled Flow Imitation Learning (CFIL), is summarized in Algorithm 1.
Open Source Code Yes Code for reproducibility of CFIL, including a detailed description for reproducing our environment, is available at https://github.com/gfreund123/cfil.
Open Datasets Yes We use Value DICE's original expert demonstrations, with the exception of the Humanoid environment, for which we train our own expert, since they did not originally evaluate on it. We use Value DICE's open-source implementation to comfortably run all three baselines. NDI (Kim et al., 2021b) would be the ideal candidate for comparison, given the similarities; however, no code was available.
Dataset Splits No The paper specifies training details and evaluation metrics (e.g., "evaluating over 10 episodes after each") but does not explicitly mention distinct training/validation/test splits with percentages or counts, as is typical in supervised learning. For RL, evaluation episodes on the environment serve a purpose similar to testing, but a dedicated validation split is not specified.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies No The paper mentions software like "Spinning Up's (Achiam, 2018) SAC (Haarnoja et al., 2018)" and the "Adam optimizer (Kingma & Ba, 2014)" but does not provide specific version numbers for these libraries or frameworks. It also refers to an "open-source implementation (Bliznashki, 2019)" for MAF, but this is a citation, not a version number for the software dependency itself.
Experiment Setup Yes Our density update rate is 10 batches of 100, every 1000 timesteps. We use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001. For squashing we use σ(x) = 6 tanh(x/15), while the smoothing and regularization coefficients are 0.5 and 1 respectively. For all algorithms, we run 80 epochs, each consisting of 4000 timesteps, evaluating over 10 episodes after each. We do this across 5 random seeds and plot means and standard deviations.
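The squashing function from the reported setup can be sketched directly. This is a minimal illustration, not the authors' implementation: the function name `squash` and the assumption that σ is applied elementwise to a scalar estimate are ours; only the constants 6 and 15 come from the paper.

```python
import math

def squash(x: float, scale: float = 6.0, temp: float = 15.0) -> float:
    """Bounded squashing sigma(x) = scale * tanh(x / temp).

    With the paper's reported constants (scale=6, temp=15), outputs are
    confined to (-6, 6), keeping the signal bounded regardless of how
    large the raw input grows.
    """
    return scale * math.tanh(x / temp)
```

Note that the function is odd (squash(0) = 0) and saturates smoothly: inputs far beyond ±15 map close to ±6, which damps extreme estimates without clipping them abruptly.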