Maximum Causal Tsallis Entropy Imitation Learning
Authors: Kyungjae Lee, Sungjoon Choi, Songhwai Oh
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effectiveness of the proposed method, we conduct two simulation studies. In the first simulation study, we verify that MCTEIL with a sparse MDN can successfully learn multimodal behaviors from expert's demonstrations. The second simulation study is conducted using four continuous control problems in MuJoCo [10]. MCTEIL outperforms existing methods in terms of the average cumulative return. |
| Researcher Affiliation | Collaboration | Kyungjae Lee1, Sungjoon Choi2, and Songhwai Oh1 Dep. of Electrical and Computer Engineering and ASRI, Seoul National University1 Kakao Brain2 |
| Pseudocode | Yes | Algorithm 1 Maximum Causal Tsallis Entropy Imitation Learning |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the availability of the source code for the described methodology. |
| Open Datasets | Yes | The second simulation study is conducted using four continuous control problems in MuJoCo [10]. [10] E. Todorov, T. Erez, and Y. Tassa, MuJoCo: A physics engine for model-based control, in Proceedings of the International Conference on Intelligent Robots and Systems, October 2012, pp. 5026–5033. |
| Dataset Splits | No | The paper mentions generating demonstrations (e.g., '300 demonstrations from the expert’s policy', '50 demonstrations from the expert policy') and using varying numbers of demonstrations for training. However, it does not specify explicit training/validation/test splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'MuJoCo' as a physics engine, but it does not specify any software versions for libraries, frameworks, or programming languages (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For tested methods, 500 episodes are sampled at each iteration. We first train the optimal policy using [3] and generate 300 demonstrations from the expert's policy. We run algorithms with varying numbers of demonstrations, 4, 11, 18, and 25, and all experiments have been repeated three times with different random seeds. For methods using an MDN, we select the best number of mixtures via a brute-force search. |
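The paper's maximum causal Tsallis entropy objective is known to induce a sparsemax-style policy distribution, in which low-scoring actions receive exactly zero probability (the property behind the "sparse MDN" mentioned above). As a point of orientation only, here is a minimal NumPy sketch of the standard sparsemax projection (Martins & Astudillo, 2016), not the authors' own implementation:

```python
import numpy as np

def sparsemax(z):
    """Project a score vector z onto the probability simplex,
    producing a sparse distribution (many exact zeros)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in descending order
    cssv = np.cumsum(z_sorted)           # cumulative sums of sorted scores
    k = np.arange(1, z.size + 1)
    # support: indices where the sorted score still clears the threshold
    support = 1 + k * z_sorted > cssv
    k_star = k[support][-1]              # size of the support set
    tau = (cssv[support][-1] - 1.0) / k_star
    return np.maximum(z - tau, 0.0)      # thresholded scores; sums to 1
```

For example, `sparsemax([2.0, 1.0, 0.1])` yields `[1., 0., 0.]`: the two low-scoring actions are assigned exactly zero probability, unlike softmax, which would keep all three strictly positive.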