Imitation with Neural Density Models
Authors: Kuno Kim, Akshat Jindal, Yang Song, Jiaming Song, Yanan Sui, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We present a practical IL algorithm, Neural Density Imitation (NDI), which obtains state-of-the-art demonstration efficiency on benchmark control tasks." and, from Section 6 (Experiments): "Environment: Following prior work, we run experiments on benchmark Mujoco (Brockman et al., 2016; Todorov et al., 2012) tasks..." |
| Researcher Affiliation | Collaboration | Kuno Kim1, Akshat Jindal1, Yang Song1, Jiaming Song1, Yanan Sui2, Stefano Ermon1 1Department of Computer Science, Stanford University 2NELN, School of Aerospace Engineering, Tsinghua University |
| Pseudocode | Yes | Algorithm 1: Neural Density Imitation (NDI) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | "Environment: Following prior work, we run experiments on benchmark Mujoco (Brockman et al., 2016; Todorov et al., 2012) tasks: Hopper (11, 3), Half Cheetah (17, 6), Walker (17, 6), Ant (111, 8), and Humanoid (376, 17), where the (observation, action) dimensions are noted in parentheses." |
| Dataset Splits | No | The paper mentions using Mujoco tasks and sampling trajectories, but does not provide specific details on how the dataset was split into training, validation, or test sets, nor does it reference predefined splits with citations for this purpose. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware specifications (e.g., exact GPU/CPU models, memory, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) but does not provide specific version numbers for any software dependencies or libraries needed to replicate the experiments. |
| Experiment Setup | Yes | "Across all experiments, our density model qφ is a two-layer MLP with 256 hidden units. For hyperparameters related to the Max Occ Ent RL step, λ = 0.2 is fixed and for λf see Section 6.3."; "We train expert policies using SAC (Haarnoja et al., 2018). All of our results are averaged across five random seeds where for each seed we randomly sample a trajectory from an expert, perform density estimation, and then Max Occ Ent RL."; "In this work, we use Soft Actor-Critic (SAC) (Haarnoja et al., 2018)."; "To estimate the expectations with respect to qt, qt+1 in Eq. 8, we simply take samples of previously visited states at time t, t + 1 from the replay buffer." |
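
The Experiment Setup row describes the density model qφ only at the architecture level: a two-layer MLP with 256 hidden units. A minimal NumPy sketch of such a network is given below, purely as an illustration of the stated architecture. The function names, the interpretation of "two-layer" as two hidden layers, the ReLU nonlinearity, the He-style weight initialization, and the scalar log-density output head are all assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def init_mlp(obs_dim, hidden=256, seed=0):
    """Initialize an MLP with two hidden layers of 256 units, matching the
    density-model size stated in the Experiment Setup row. The weight
    scaling and scalar output head are illustrative choices."""
    rng = np.random.default_rng(seed)
    sizes = [obs_dim, hidden, hidden, 1]
    params = []
    for m, n in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
        b = np.zeros(n)
        params.append((W, b))
    return params

def log_density(params, s):
    """Forward pass: batch of states -> unnormalized log-density estimates."""
    h = s
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
    W, b = params[-1]
    return (h @ W + b).squeeze(-1)

# Example: Hopper observations are 11-dimensional (see the Open Datasets row).
params = init_mlp(obs_dim=11)
states = np.zeros((5, 11))
print(log_density(params, states).shape)  # (5,)
```

A batch of five 11-dimensional states yields five scalar log-density values; in an actual NDI run these outputs would be fit to the demonstration state distribution before the Max Occ Ent RL step.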