A teacher-student framework to distill future trajectories

Authors: Alexander Neitz, Giambattista Parascandolo, Bernhard Schölkopf

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Our approach performs well on tasks that are difficult for model-free and model-based methods, and we study the role of every component through ablation studies." |
| Researcher Affiliation | Academia | 1 MPI for Intelligent Systems, Tübingen; 2 ETH Zürich; equal contribution |
| Pseudocode | Yes | Algorithm 1: Teacher update |
| Open Source Code | No | The paper does not provide an explicit statement or a link to its own open-source code for the described methodology. |
| Open Datasets | No | "The datasets for the MuJoCo reward prediction task are generated as follows. We collect a dataset of size 4000 for each of the environments Swimmer-v2, Walker2d-v2, Hopper-v2, and HalfCheetah-v2." The paper mentions using public environments (MuJoCo, OpenAI Gym, Game of Life) to *generate* its datasets, but does not provide access information (link, DOI, or specific citation) for the specific datasets generated and used in the experiments. |
| Dataset Splits | Yes | "At the beginning of training, we split the dataset into a training and a validation set. This split is kept during the entire duration of training. The split ratio is a hyperparameter." Validation split ∈ {0.3, 0.5, 0.7}. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | "We implemented LDT in PyTorch (Paszke et al., 2019) using higher by Grefenstette et al. (2019)." While PyTorch and higher are cited with their publication years, specific version numbers (e.g., PyTorch 1.x) are not provided, which would be required for full reproducibility. |
| Experiment Setup | Yes | "Hyperparameters for each method are optimized independently (see Appendix for ranges) for each method and task." The appendix includes detailed hyperparameter tables (Tables 1–6). |
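The Dataset Splits row describes a split made once at the start of training and kept fixed thereafter, with the validation ratio treated as a hyperparameter swept over {0.3, 0.5, 0.7}. A minimal sketch of that procedure is below; the function name, seed handling, and list-based dataset are illustrative assumptions, not details from the paper:

```python
import random

def fixed_split(dataset, val_ratio, seed=0):
    """Split the dataset once into train/validation sets.

    The split is made at the beginning of training and kept for its
    entire duration; val_ratio is a hyperparameter (the paper sweeps
    {0.3, 0.5, 0.7}). Seeding is an assumption for reproducibility.
    """
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    n_val = int(len(dataset) * val_ratio)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [dataset[i] for i in train_idx], [dataset[i] for i in val_idx]

# Example: 4000 samples, matching the per-environment dataset size
# reported for the MuJoCo reward prediction task.
data = list(range(4000))
train, val = fixed_split(data, val_ratio=0.3)
```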