An Enhanced Advising Model in Teacher-Student Framework using State Categorization
Authors: Daksh Anand, Vaibhav Gupta, Praveen Paruchuri, Balaraman Ravindran
AAAI 2021, pp. 6653-6660
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the robustness of our approach by showcasing our experiments on multiple Atari 2600 games using a fixed set of hyper-parameters. |
| Researcher Affiliation | Academia | 1 Machine Learning Lab, IIIT Hyderabad; 2 Robert Bosch Center for Data Science and AI, IIT Madras |
| Pseudocode | Yes | The complete algorithm is given in Algorithm 1. The routine getAction(s) returns the action to be taken by the student in the state s. (A hedged sketch of a generic advising routine appears below the table.) |
| Open Source Code | No | No explicit statement or link for open-source code release is provided in the paper. |
| Open Datasets | Yes | We demonstrate the performance of our approach on three domains from the Arcade Learning Environment (Bellemare et al. 2013), namely Qbert, Boxing and Seaquest. |
| Dataset Splits | No | The paper describes training and testing epochs for evaluating performance but does not specify explicit training/validation/test dataset splits as commonly found in supervised learning. |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or cloud instance types) used for experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions algorithms and architectures like Double-DQN and DQN, but does not specify software packages with version numbers (e.g., PyTorch, TensorFlow, or specific libraries). |
| Experiment Setup | Yes | The values of advice ratio α and the batch size were fixed to 0.01 and 8, respectively, for this experiment. For all the games, we fix γ to 0.99. All the agents were trained for 30 million steps with the size of each training epoch being 40k steps. (See the configuration sketch below the table.) |
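The Pseudocode row refers to a getAction(s) routine. Since no code is released, the following is only a minimal sketch of a generic budgeted teacher-student action selector in the style of importance advising (Torrey and Taylor 2013), not the paper's Algorithm 1, which additionally categorizes states. The names `teacher_q`, `student_q`, `advice_budget`, `importance_threshold`, and `epsilon` are hypothetical stand-ins.

```python
import random

import numpy as np


class BudgetedAdvisingStudent:
    """Sketch of budgeted teacher-student action selection.

    Not the paper's Algorithm 1: the enhanced model additionally
    categorizes states before deciding whether to advise. All names
    here are illustrative.
    """

    def __init__(self, teacher_q, student_q, advice_budget,
                 importance_threshold, epsilon=0.05):
        self.teacher_q = teacher_q         # state -> array of teacher Q-values
        self.student_q = student_q         # state -> array of student Q-values
        self.budget = advice_budget        # remaining pieces of advice
        self.threshold = importance_threshold
        self.epsilon = epsilon

    def importance(self, state):
        # Importance-advising heuristic (Torrey and Taylor 2013):
        # the spread of the teacher's Q-values in this state.
        q = self.teacher_q(state)
        return float(np.max(q) - np.min(q))

    def get_action(self, state):
        """Return the action the student takes in state `state`."""
        if self.budget > 0 and self.importance(state) >= self.threshold:
            self.budget -= 1               # spend one unit of advice
            return int(np.argmax(self.teacher_q(state)))
        q = self.student_q(state)
        if random.random() < self.epsilon:  # epsilon-greedy exploration
            return random.randrange(len(q))
        return int(np.argmax(q))
```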
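For convenience, the hyper-parameters quoted in the Experiment Setup row can be collected into one configuration block. The key names below are ours, chosen for readability; the paper does not release code, so they match no official implementation.

```python
# Hyper-parameters quoted in the Experiment Setup row; key names are
# illustrative, since no official code accompanies the paper.
EXPERIMENT_CONFIG = {
    "advice_ratio_alpha": 0.01,          # advice ratio α
    "batch_size": 8,
    "gamma": 0.99,                       # discount factor, fixed for all games
    "total_training_steps": 30_000_000,  # 30 million steps per agent
    "steps_per_training_epoch": 40_000,  # 40k steps per training epoch
    "games": ["Qbert", "Boxing", "Seaquest"],  # ALE domains (Bellemare et al. 2013)
}
```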