Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions
Authors: Aijun Bai, Stuart Russell
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on the benchmark Taxi domain [Dietterich, 1999] and a much more complex RoboCup Keepaway domain [Stone et al., 2005]. |
| Researcher Affiliation | Academia | Aijun Bai (UC Berkeley, EMAIL); Stuart Russell (UC Berkeley, EMAIL) |
| Pseudocode | Yes | Algorithm 1 gives the pseudo-code for running a HAM, where the Execute function executes an action in the environment and returns the next environment state, and the Choose function picks the next machine state given the updated stack z, the current environment state s... and Algorithm 3 gives the pseudo-code of the HAMQ-INT algorithm. |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code or links to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct experiments on the benchmark Taxi domain [Dietterich, 1999] and a much more complex RoboCup Keepaway domain [Stone et al., 2005]. |
| Dataset Splits | No | The paper mentions general learning parameters like learning rate and exploration policy, but does not specify dataset splits (e.g., train/validation/test percentages or sample counts) or cross-validation details. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, memory amounts, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions using "SARSA learning rule with a linear function approximator" and refers to "ALisp" but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For all learning algorithms, the learning rate is set to be 0.125; an ϵ-greedy policy which selects a random action with probability 0.01 is used to balance between exploration and exploitation. |
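For concreteness, the setup described in the last row can be sketched as a SARSA update with a linear function approximator and ϵ-greedy exploration, using the hyperparameters the paper reports (learning rate 0.125, ϵ = 0.01). This is a minimal illustrative sketch, not the authors' code: the feature function, discount factor, and all function names here are assumptions.

```python
import random

ALPHA = 0.125   # learning rate reported in the paper
EPSILON = 0.01  # exploration probability reported in the paper
GAMMA = 1.0     # assumed discount factor (not stated in the excerpt)

def q_value(weights, features):
    """Linear approximation: Q(s, a) = w . phi(s, a)."""
    return sum(w * f for w, f in zip(weights, features))

def epsilon_greedy(weights, feature_fn, state, actions, rng=random):
    """Pick a random action with probability EPSILON, else the greedy one."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_value(weights, feature_fn(state, a)))

def sarsa_update(weights, feature_fn, s, a, reward, s2, a2):
    """One on-policy SARSA step: w += alpha * delta * phi(s, a)."""
    phi = feature_fn(s, a)
    target = reward + GAMMA * q_value(weights, feature_fn(s2, a2))
    delta = target - q_value(weights, phi)
    return [w + ALPHA * delta * f for w, f in zip(weights, phi)]
```

In HAMQ-style learning the "actions" at a choice point would be machine states offered by the Choose function rather than primitive environment actions, but the update rule has the same shape.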