Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction

Authors: Baiting Luo, Ava Pettet, Aron Laszka, Abhishek Dubey, Ayan Mukhopadhyay

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation of L-MAP consists of three sets of tasks from D4RL (Fu et al., 2020): gym locomotion control, Ant Maze, and Adroit. We compare L-MAP to a range of prior offline RL algorithms, including both model-free actor-critic methods (Kumar et al., 2020; Kostrikov et al., 2022) and model-based approaches (Rigter et al., 2023; Jiang et al., 2023; Janner et al., 2021). Our work is conceptually most related to the Trajectory Transformer (TT; Janner et al., 2021) and the Trajectory Autoencoding Planner (TAP; Jiang et al., 2023), which are model-based planning methods that predict and plan in continuous state and action spaces.
Researcher Affiliation | Collaboration | Baiting Luo (1), Ava Pettet (3), Aron Laszka (2), Abhishek Dubey (1), Ayan Mukhopadhyay (1); (1) Vanderbilt University, (2) Pennsylvania State University, (3) Nissan Advanced Technology Center
Pseudocode | No | The paper describes the MCTS process in detail in Section 3.2 and illustrates it in Figure 4, but does not present it as a formal pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | The empirical evaluation of L-MAP consists of three sets of tasks from D4RL (Fu et al., 2020): gym locomotion control, Ant Maze, and Adroit.
Dataset Splits | No | The paper states, 'For each task, we conduct experiments with 3 different training seeds, and each seed is evaluated for 20 episodes.' However, it does not explicitly detail training, validation, or test dataset splits (e.g., percentages, sample counts, or specific predefined splits with citations).
Hardware Specification | No | The paper mentions in the acknowledgements that 'Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation.' However, it does not provide specific details such as GPU or CPU models, memory amounts, or other hardware specifications used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific libraries and solvers) that would be needed to replicate the experiments.
Experiment Setup | Yes | As for the L-MAP-specific hyperparameters, we set our macro action length to 3. The planning horizon in the raw action space is set to 9 for gym locomotion tasks and 15 for Adroit tasks. ... For all environments, we utilize the following hyperparameters for sampling during the search process: α = 0.1 and ϵ = 1, which determine the exploration rate of progressive widening; and set the number of Monte Carlo Tree Search (MCTS) iterations to 100. Detailed parameters for each environment are presented in Table 7.
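The quoted setup controls tree growth with progressive widening (α = 0.1, ϵ = 1). As a minimal sketch only: one common formulation of the widening test permits sampling a new child action at a node while the child count stays below ϵ·N(s)^α. The function name and exact threshold form below are illustrative assumptions, not the rule reproduced from the paper.

```python
def should_widen(num_children: int, num_visits: int,
                 epsilon: float = 1.0, alpha: float = 0.1) -> bool:
    """Common progressive-widening test (illustrative, not L-MAP's exact
    rule): allow a new child action while |children| < epsilon * N^alpha."""
    return num_children < epsilon * max(num_visits, 1) ** alpha

# With alpha = 0.1, the permitted branching factor grows very slowly with
# the visit count, keeping the tree narrow in continuous action spaces.
print(should_widen(1, 1))    # 1 < 1^0.1 = 1, so no new child yet
print(should_widen(1, 100))  # 1 < 100^0.1 ≈ 1.58, so widening is allowed
```

Under these quoted values, a node must accumulate many visits before a second or third candidate action is admitted, which is the usual motivation for progressive widening when planning over continuous or very large action sets.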