A Coupled Flow Approach to Imitation Learning
Authors: Gideon Joseph Freund, Elad Sarafian, Sarit Kraus
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CFIL on the standard MuJoCo benchmarks (Todorov et al., 2012), first comparing it to state-of-the-art imitation methods, including Value DICE (Kostrikov et al., 2019) and their optimized implementation of DAC (Kostrikov et al., 2018), along with a customary behavioral cloning (BC) baseline. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Bar-Ilan University, Israel. Correspondence to: Gideon Freund <gideonfreund@gmail.com>. |
| Pseudocode | Yes | Our resulting algorithm, Coupled Flow Imitation Learning (CFIL), is summarized in Algorithm 1. |
| Open Source Code | Yes | Code for reproducibility of CFIL, including a detailed description for reproducing our environment, is available at https://github.com/gfreund123/cfil. |
| Open Datasets | Yes | We use Value DICE's original expert demonstrations, with exception to the Humanoid environment, for which we train our own expert, since they did not originally evaluate on it. We use Value DICE's open-source implementation to comfortably run all three baselines. NDI (Kim et al., 2021b) would be the ideal candidate for comparison, given the similarities, however no code was available. |
| Dataset Splits | No | The paper specifies training details and evaluation metrics (e.g., "evaluating over 10 episodes after each") but does not explicitly describe distinct training/validation/test splits with percentages or counts, as is typical in supervised learning. For RL, evaluation episodes on the environment serve a purpose similar to testing, but no dedicated validation split is specified. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software like "Spinning Up's (Achiam, 2018) SAC (Haarnoja et al., 2018)" and the "Adam optimizer (Kingma & Ba, 2014)" but does not provide specific version numbers for these libraries or frameworks. It also refers to an "open-source implementation (Bliznashki, 2019)" for MAF, but this is a citation, not a version number for the software dependency itself. |
| Experiment Setup | Yes | Our density update rate is 10 batches of 100, every 1000 timesteps. We use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001. For squashing we use σ = 6 tanh(x/15), while the smoothing and regularization coefficients are 0.5 and 1 respectively. For all algorithms, we run 80 epochs, each consisting of 4000 timesteps, evaluating over 10 episodes after each. We do this across 5 random seeds and plot means and standard deviations. |
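The experiment-setup row quoted above can be collected into code for anyone attempting a reproduction. This is a minimal sketch, not taken from the authors' repository: the config layout and the `squash` function name are illustrative, while the numeric values and the squashing form σ = 6 tanh(x/15) come from the quoted text.

```python
import math

# Hyperparameters as quoted in the paper's experiment setup (values from the
# text; the dictionary structure itself is an assumption for illustration).
CFIL_SETUP = {
    "density_update": {"batches": 10, "batch_size": 100, "every_timesteps": 1000},
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "smoothing_coef": 0.5,
    "regularization_coef": 1.0,
    "epochs": 80,
    "timesteps_per_epoch": 4000,
    "eval_episodes_per_epoch": 10,
    "random_seeds": 5,
}

def squash(x: float) -> float:
    """Squashing sigma(x) = 6 * tanh(x / 15), bounding outputs to (-6, 6)."""
    return 6.0 * math.tanh(x / 15.0)
```

The tanh squashing keeps the (log-density-ratio) reward bounded regardless of how extreme the flow estimates become, which is consistent with the smoothing and regularization coefficients being reported alongside it as stabilization measures.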