An Enhanced Advising Model in Teacher-Student Framework using State Categorization

Authors: Daksh Anand, Vaibhav Gupta, Praveen Paruchuri, Balaraman Ravindran (pp. 6653-6660)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the robustness of our approach by showcasing our experiments on multiple Atari 2600 games using a fixed set of hyper-parameters."
Researcher Affiliation | Academia | "1 Machine Learning Lab, IIIT Hyderabad; 2 Robert Bosch Center for Data Science and AI, IIT Madras"
Pseudocode | Yes | "The complete algorithm is given in Algorithm 1. The routine getAction(s) returns the action to be taken by the student in the state s."
Open Source Code | No | No explicit statement or link for an open-source code release is provided in the paper.
Open Datasets | Yes | "We demonstrate the performance of our approach on three domains from the Arcade Learning Environment (Bellemare et al. 2013), namely Qbert, Boxing and Seaquest."
Dataset Splits | No | The paper describes training and testing epochs for evaluating performance but does not specify explicit training/validation/test splits as commonly found in supervised learning.
Hardware Specification | No | No specific hardware details (e.g., CPU or GPU models, or cloud instance types) used for the experiments are provided in the paper.
Software Dependencies | No | The paper mentions algorithms and architectures such as DQN and Double-DQN, but does not specify software packages with version numbers (e.g., PyTorch, TensorFlow, or specific libraries).
Experiment Setup | Yes | "The values of advice ratio α and the batch size were fixed to 0.01 and 8 respectively for this experiment. For all the games, we fix γ to 0.99. All the agents were trained for 30 million steps with the size of each training epoch being 40k steps."
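The pseudocode and setup rows above mention a `getAction(s)` routine and an advice ratio α fixed at 0.01. A minimal sketch of a generic budgeted advising step is shown below. This is not the paper's state-categorization scheme (the paper's code is not released); `student_policy` and `teacher_policy` are hypothetical stand-ins, and advising on a random fraction of steps is an illustrative simplification.

```python
import random

# Constants quoted in the paper's experiment setup.
ADVICE_RATIO = 0.01   # alpha: fraction of steps on which teacher advice is used
GAMMA = 0.99          # discount factor fixed for all games

def get_action(state, student_policy, teacher_policy):
    """Return the action taken by the student in `state`.

    On a small, fixed fraction of steps (the advice ratio alpha) the
    teacher's action is used instead of the student's. The published
    method decides *which* states to advise via state categorization;
    here we substitute a uniform-random choice purely for illustration.
    """
    if random.random() < ADVICE_RATIO:
        return teacher_policy(state)   # follow the teacher's advice
    return student_policy(state)       # act on the student's own policy
```

In the published algorithm the decision to consult the teacher depends on how the student's state is categorized, so the `random.random()` test above would be replaced by that categorization check.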