HogRider: Champion Agent of Microsoft Malmo Collaborative AI Challenge

Authors: Yanhai Xiong, Haipeng Chen, Mengchen Zhao, Bo An

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to evaluate HogRider. We demonstrate that, by considering uncertainty in agent action observation, HogRider outperforms existing agent type belief update methods (a generic sketch of such a noisy belief update appears after this table). We also show improved performance of HogRider compared with traditional Q-learning methods. Challenge results show that HogRider outperforms all the other teams by a significant margin, in terms of both optimality and stability.
Researcher Affiliation | Academia | Yanhai Xiong, Haipeng Chen, Mengchen Zhao, Bo An; Nanyang Technological University; {yxiong003,chen0939,zhao0204,boan}@ntu.edu.sg
Pseudocode | Yes | Algorithm 1 (Training HogRider) and Algorithm 2 (HogRider) are presented in the paper, providing structured pseudocode.
Open Source Code | Yes | Xiong, Y.; Chen, H.; Zhao, M.; and An, B. 2017. Hogrider appendix. http://AAAImalmo.weebly.com. This link is referenced in footnote 3, which states: 'while we refer to our online appendix (Xiong et al. 2017) for results of per episode scores, as well as figures showing the detailed curves of the results that cannot be presented with tables.'
Open Datasets | No | The paper uses the Microsoft Malmo Collaborative AI Challenge, a Minecraft mini-game environment, for training and evaluation. It describes the game setup and how agents interact to generate 'episodes' for learning, rather than using a static, publicly accessible dataset that can be downloaded or cited with specific access information.
Dataset Splits | No | The paper presents 'Learning episodes' and 'Validation curve' figures, indicating the use of a validation process during reinforcement learning. However, it does not specify a fixed train/validation/test split for a static dataset with explicit percentages or sample counts, as the data is generated through interaction with a simulated environment.
Hardware Specification | No | The paper states that 'The simulations are performed on the Minecraft platform' and notes that it 'usually takes several seconds for one episode of game', highlighting the time-consuming nature of training. However, it does not provide any specific details about the hardware (e.g., CPU type, GPU models, memory) used for running these simulations or training the agent.
Software Dependencies | No | The paper mentions several algorithms and concepts, such as 'Q-learning', 'A* search algorithm', 'Bayes theorem', and 'DQN', but it does not specify any software names with version numbers (e.g., 'Python 3.x', 'PyTorch 1.x') that were used for implementation.
Experiment Setup | Yes | In active-ε-greedy, we still select the optimal action based on the Q-function with probability 1 − ε. Different from ε-greedy, we explore sub-optimal actions with probability ε, when the Q-value of that action is no less than a certain percentage (hand-tuned as 50% in practice) of the highest Q-value in the same state. ... The entire outer loop terminates after M (e.g., 3,000) episodes. ... where α is the learning rate.
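
To make the quoted exploration rule concrete, the following Python snippet is a minimal sketch of active-ε-greedy action selection together with a standard tabular Q-learning step. It is our own illustration under assumptions, not the authors' released code: the tabular Q-function (`q_table`), the non-negative-Q assumption, and the discount factor `gamma` are ours; only the 1 − ε / ε split, the 50% threshold, and the learning rate α come from the quoted setup.

```python
import random


def active_epsilon_greedy(q_table, state, actions, epsilon=0.1, threshold=0.5):
    """Active-epsilon-greedy selection (sketch).

    With probability 1 - epsilon, pick the action with the highest Q-value.
    With probability epsilon, explore, but only among sub-optimal actions
    whose Q-value is at least `threshold` (hand-tuned as 50% in the paper)
    of the best Q-value in the current state. Assumes non-negative Q-values.
    """
    q_values = {a: q_table.get((state, a), 0.0) for a in actions}
    best_action = max(q_values, key=q_values.get)
    if random.random() >= epsilon:
        return best_action  # exploit the current Q-function
    best_q = q_values[best_action]
    candidates = [a for a in actions
                  if a != best_action and q_values[a] >= threshold * best_q]
    return random.choice(candidates) if candidates else best_action


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """One tabular Q-learning step, where alpha is the learning rate.

    The discount factor gamma is an assumed default, not taken from the excerpt.
    """
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

Restricting exploration to actions within 50% of the best Q-value avoids spending the ε exploration budget on actions already known to be poor, which matches the motivation of the quoted setup.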
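
The Research Type row also refers to updating a belief over the other agent's type while accounting for uncertainty in observing its actions. The snippet below is a generic Bayesian belief-update sketch under an assumed observation-noise model; the names `action_model` and `obs_noise` and the specific factorization are our assumptions, not the paper's exact formulation.

```python
def update_type_belief(belief, observed_action, action_model, obs_noise):
    """One noisy Bayesian update of a belief over agent types (generic sketch).

    belief:       {type: P(type)} -- current belief over the other agent's type
    action_model: {type: {action: P(action | type)}} -- assumed behaviour model
    obs_noise:    {(observed, true): P(observed | true)} -- observation noise
    All types are assumed to share the same action set.
    """
    actions = set(next(iter(action_model.values())))
    posterior = {}
    for agent_type, prior in belief.items():
        # Marginalize over the unobserved true action to obtain
        # P(observed_action | type) under observation noise.
        likelihood = sum(
            obs_noise.get((observed_action, a), 0.0) * action_model[agent_type][a]
            for a in actions
        )
        posterior[agent_type] = prior * likelihood
    z = sum(posterior.values())
    if z == 0.0:  # guard against an observation the model deems impossible
        return dict(belief)
    return {t: p / z for t, p in posterior.items()}
```

Dropping the noise term (i.e., treating the observed action as the true action) reduces this to a standard type-belief update, which appears to be the kind of baseline the quoted comparison refers to.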