A teacher-student framework to distill future trajectories
Authors: Alexander Neitz, Giambattista Parascandolo, Bernhard Schölkopf
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach performs well on tasks that are difficult for model-free and model-based methods, and we study the role of every component through ablation studies. |
| Researcher Affiliation | Academia | 1MPI for Intelligent Systems, Tübingen, 2ETH, Zürich, equal contribution |
| Pseudocode | Yes | Algorithm 1 Teacher update |
| Open Source Code | No | The paper does not provide an explicit statement or a link to its own open-source code for the described methodology. |
| Open Datasets | No | The datasets for the MuJoCo reward prediction task are generated as follows. We collect a dataset of size 4000 for each of the environments Swimmer-v2, Walker2d-v2, Hopper-v2, and HalfCheetah-v2. The paper mentions using public environments (MuJoCo, OpenAI Gym, Game of Life) to *generate* its datasets, but does not provide access information (link, DOI, or specific citation) for the *specific datasets generated and used in the experiments*. |
| Dataset Splits | Yes | At the beginning of training, we split the dataset into a training and a validation set. This split is kept for the entire duration of training. The validation split ratio is a hyperparameter, searched over {0.3, 0.5, 0.7}. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | We implemented LDT in PyTorch (Paszke et al., 2019) using higher by Grefenstette et al. (2019). While PyTorch and higher are cited with their respective publication years, specific version numbers (e.g., PyTorch 1.x) are not provided, which are required for full reproducibility. |
| Experiment Setup | Yes | Hyperparameters for each method are optimized independently (see Appendix for ranges) for each method and task. The paper includes detailed tables of hyperparameters (e.g., Table 1, 2, 3, 4, 5, 6) in the appendix. |
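The fixed train/validation split described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not code from the paper: the function name, seeding, and shuffling strategy are assumptions; only the once-at-start split, the fixed split thereafter, and the ratio-as-hyperparameter behavior come from the paper's description.

```python
import random

def split_dataset(data, val_ratio, seed=0):
    """Split the dataset once into train/validation sets.

    The split is performed at the beginning of training and kept fixed
    thereafter; val_ratio is treated as a hyperparameter.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_val = int(len(data) * val_ratio)
    val_idx = set(indices[:n_val])
    train = [x for i, x in enumerate(data) if i not in val_idx]
    val = [x for i, x in enumerate(data) if i in val_idx]
    return train, val

# e.g. a dataset of size 4000, with val_ratio drawn from the paper's
# reported search space {0.3, 0.5, 0.7}
train, val = split_dataset(list(range(4000)), val_ratio=0.3)
```

With a dataset of 4000 examples and `val_ratio=0.3`, this yields 1200 validation and 2800 training examples; calling the function again with the same seed reproduces the identical split.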