TGRL: An Algorithm for Teacher Guided Reinforcement Learning
Authors: Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, Pulkit Agrawal
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method, Teacher Guided Reinforcement Learning (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning. We perform four sets of experiments. |
| Researcher Affiliation | Academia | ¹Improbable AI Lab, Massachusetts Institute of Technology, Cambridge, USA; ²Technion — Israel Institute of Technology, Haifa, Israel. Correspondence to: Idan Shenfeld <idanshen@mit.edu>. |
| Pseudocode | Yes | Algorithm 1: Teacher Guided Reinforcement Learning (TGRL). (A hedged sketch of this kind of update appears after this table.) |
| Open Source Code | No | Our implementation is based on the code released by (Ni et al., 2022). The paper does not explicitly state that its own source code is available or provide a link. |
| Open Datasets | Yes | Tiger Door: a robot must navigate to the goal cell (green) without touching the failure cell (blue)... (Littman et al., 1995; Warrington et al., 2021). Light-Dark Ant: a MuJoCo Ant environment... inspired by a popular POMDP benchmark (Platt Jr et al., 2010). The experiments also include three classic POMDP environments from (Ni et al., 2022), which are versions of the MuJoCo Hopper, Walker2D, and Half Cheetah environments... (see the observation-masking sketch below the table). |
| Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits with specific percentages, sample counts, or a detailed splitting methodology. |
| Hardware Specification | No | The paper mentions 'MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources' but does not specify any exact hardware details like GPU/CPU models or specific processor types. |
| Software Dependencies | No | The paper mentions using DQN and SAC algorithms, and that 'Our implementation is based on the code released by (Ni et al., 2022),' but it does not specify version numbers for any software dependencies like programming languages or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | Figure 7 (hyperparameters): Optimizer Adam; Learning rate 3e-4; Discount factor (γ) 0.9. The remaining rows give three per-domain values each: Batch size 32 / 128 / 128; LSTM hidden size 128 / 256 / 128; Obs. embedding 16 / 32 / 128; Actions embedding 16 / 32 / 16; Hidden layers after LSTM [128,128] / [512,256] / [512,256,128]. (Restated as a config sketch below.) |
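
The "Pseudocode" row notes that the paper presents Algorithm 1 (TGRL). As a rough illustration of the kind of update such a teacher-guided method performs, here is a minimal PyTorch sketch of a policy-gradient step that mixes task return with a student-to-teacher KL term under a balancing coefficient λ. The network shapes, the λ update rule, and all hyperparameters are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import torch

obs_dim, n_actions = 8, 4
student = torch.nn.Linear(obs_dim, n_actions)  # student policy logits (toy)
teacher = torch.nn.Linear(obs_dim, n_actions)  # frozen teacher policy (toy)
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=3e-4)
lam = 1.0  # balancing coefficient between task reward and teacher guidance

def tgrl_update(obs, act, ret, ref_return, lam_lr=1e-2):
    """One update mixing a policy-gradient surrogate with a teacher KL term."""
    global lam
    logp = torch.log_softmax(student(obs), dim=-1)
    teacher_logp = torch.log_softmax(teacher(obs), dim=-1)

    # Per-sample KL(student || teacher): penalizes straying from the teacher.
    kl = (logp.exp() * (logp - teacher_logp)).sum(-1)

    # Surrogate loss: maximize weighted log-prob of taken actions,
    # stay close to the teacher in proportion to lam.
    taken_logp = logp.gather(-1, act.unsqueeze(-1)).squeeze(-1)
    loss = -(taken_logp * ret).mean() + lam * kl.mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Dual-style adjustment (an assumption, not the paper's rule): fade out
    # the teacher as the student's return overtakes a reference return.
    lam = max(0.0, lam + lam_lr * (ref_return - ret.mean().item()))

# Usage with random placeholder data.
obs = torch.randn(32, obs_dim)
act = torch.randint(0, n_actions, (32,))
ret = torch.randn(32)
tgrl_update(obs, act, ret, ref_return=0.0)
```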
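The "Open Datasets" row lists occluded versions of the MuJoCo Hopper, Walker2D, and Half Cheetah tasks from (Ni et al., 2022). A common way to build such POMDP variants is to mask part of the state vector. The sketch below does this with a Gymnasium wrapper; the environment ID and the masked indices (here, Hopper's six velocity components) are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import gymnasium as gym
import numpy as np

class MaskObservation(gym.ObservationWrapper):
    """Zero out selected observation dimensions to induce partial
    observability; the observation shape is unchanged."""

    def __init__(self, env, hidden_idx):
        super().__init__(env)
        self.hidden_idx = np.asarray(hidden_idx)

    def observation(self, obs):
        obs = obs.copy()
        obs[self.hidden_idx] = 0.0  # hide e.g. velocity components
        return obs

# Hopper-v4 observations are 11-dimensional; indices 5-10 are velocities.
env = MaskObservation(gym.make("Hopper-v4"), hidden_idx=range(5, 11))
```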
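For convenience, the flattened Figure 7 values from the "Experiment Setup" row are restated below as a per-domain configuration. The shared values are quoted directly from the table; the `domain_*` labels are placeholders, since the row does not name which experiment family each of the three columns belongs to.

```python
# Shared settings, quoted from the paper's Figure 7.
shared = {"optimizer": "Adam", "lr": 3e-4, "gamma": 0.9}

# Per-column settings; keys are illustrative labels for the three columns.
per_domain = {
    "domain_1": {"batch_size": 32,  "lstm_hidden": 128, "obs_emb": 16,
                 "act_emb": 16, "post_lstm_layers": [128, 128]},
    "domain_2": {"batch_size": 128, "lstm_hidden": 256, "obs_emb": 32,
                 "act_emb": 32, "post_lstm_layers": [512, 256]},
    "domain_3": {"batch_size": 128, "lstm_hidden": 128, "obs_emb": 128,
                 "act_emb": 16, "post_lstm_layers": [512, 256, 128]},
}
```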