Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions

Authors: Aijun Bai, Stuart Russell

IJCAI 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We conduct experiments on the benchmark Taxi domain [Dietterich, 1999] and a much more complex RoboCup Keepaway domain [Stone et al., 2005]." |
| Researcher Affiliation | Academia | Aijun Bai, UC Berkeley (aijunbai@berkeley.edu); Stuart Russell, UC Berkeley (russell@cs.berkeley.edu) |
| Pseudocode | Yes | "Algorithm 1 gives the pseudo-code for running a HAM, where the Execute function executes an action in the environment and returns the next environment state, and the Choose function picks the next machine state given the updated stack z, the current environment state s... and Algorithm 3 gives the pseudo-code of the HAMQ-INT algorithm." |
| Open Source Code | No | The paper does not explicitly state that source code is released, and provides no link to a code repository for the described methodology. |
| Open Datasets | Yes | "We conduct experiments on the benchmark Taxi domain [Dietterich, 1999] and a much more complex RoboCup Keepaway domain [Stone et al., 2005]." |
| Dataset Splits | No | The paper mentions general learning parameters such as the learning rate and exploration policy, but does not specify dataset splits (e.g., train/validation/test percentages or sample counts) or cross-validation details. |
| Hardware Specification | No | The paper gives no hardware details such as GPU/CPU models, memory amounts, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using the "SARSA learning rule with a linear function approximator" and refers to "ALisp", but provides no version numbers for any software dependencies. |
| Experiment Setup | Yes | "For all learning algorithms, the learning rate is set to be 0.125; an ε-greedy policy which selects a random action with probability 0.01 is used to balance between exploration and exploitation." |
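The quoted experiment setup (SARSA with a linear function approximator, learning rate 0.125, ε-greedy exploration with ε = 0.01) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature map, action set, and discount factor below are assumptions introduced for the example.

```python
import random

ALPHA = 0.125   # learning rate, as stated in the paper
EPSILON = 0.01  # epsilon-greedy exploration probability, as stated in the paper
GAMMA = 1.0     # discount factor (assumed; not given in the quoted excerpt)


def q_value(weights, features):
    """Linear function approximation: Q(s, a) = w . phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))


def epsilon_greedy(weights, state, actions, feature_fn):
    """Pick a random action with probability EPSILON, otherwise the greedy one."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_value(weights, feature_fn(state, a)))


def sarsa_update(weights, s, a, reward, s_next, a_next, feature_fn):
    """One on-policy SARSA update of the linear weights."""
    phi = feature_fn(s, a)
    td_target = reward + GAMMA * q_value(weights, feature_fn(s_next, a_next))
    td_error = td_target - q_value(weights, phi)
    return [w + ALPHA * td_error * f for w, f in zip(weights, phi)]
```

For instance, with a one-hot feature map and all-zero initial weights, a reward of 1.0 moves the corresponding weight to 0.125 after a single update, matching the stated learning rate.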