reproducibilityindex.ai

Structure Learning for Safe Policy Improvement

Authors: Thiago D. Simão, Matthijs T. J. Spaan

IJCAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 5, Empirical Analysis: We evaluate the Structure Learning Πb-SPIBB framework combined with the two structure learning algorithms presented before (SL and k-meteorologists) in three domains. Figures 1 and 2 present the results. In every plot the x-axis shows the number of trials in the batch collected with the behavior policy.
Researcher Affiliation	Academia	Thiago D. Simão and Matthijs T. J. Spaan, Delft University of Technology, The Netherlands {t.diassimao, m.t.j.spaan}@tudelft.nl
Pseudocode	Yes	Algorithm 1 Policy-based SPIBB (Πb-SPIBB), Algorithm 2 Factored Πb-SPIBB, Algorithm 3 Structure Learning Πb-SPIBB
Open Source Code	No	The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository or mention code in supplementary materials.
Open Datasets	Yes	The problems used are: (i) the Taxi domain with a horizon of 200 steps [Dietterich, 1998], (ii) the SysAdmin domain with 9 machines in a bidirectional ring topology and a horizon of 40 steps [Guestrin et al., 2003], and (iii) the Stock-Trading domain with 3 sectors and 2 stocks per sector with a horizon of 40 steps [Strehl et al., 2007].
Dataset Splits	No	The paper refers to a 'batch D of previous experiences' and 'estimating the performance' but does not specify explicit training, validation, or test dataset splits with percentages, sample counts, or defined methodologies for partitioning the data.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used to conduct the experiments.
Software Dependencies	No	The paper does not specify any software dependencies with their version numbers that would be required to replicate the experiments (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup	Yes	Table 1 reports the parameters used by each algorithm. These values were chosen in order to reduce the number of samples required to improve the policy, while keeping a safe behavior. All algorithms use a flat estimate of the transition function and a flat Value Iteration algorithm with a discount factor of 0.99. The softmax temperature is set to 2 for the Taxi and Stock-Trading domains and to 3 for the SysAdmin domain.
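
The Experiment Setup row names two concrete ingredients: flat Value Iteration with a discount factor of 0.99, and a softmax with a temperature of 2 or 3. The sketch below shows one way these could fit together on a toy tabular MDP. It is not the authors' code. The data layout (`P`, `R`), the function names, the toy MDP, and the assumption that the softmax is taken over Q-values to produce a stochastic policy are all illustrative choices that the table does not confirm.

```python
import math

GAMMA = 0.99  # discount factor quoted in the Experiment Setup row


def value_iteration(P, R, gamma=GAMMA, tol=1e-8):
    """Flat (tabular) value iteration; returns the Q-table.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    expected immediate reward. This layout is a hypothetical choice.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        # One Bellman backup for every state-action pair.
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        new_V = [max(q_row) for q_row in Q]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return Q
        V = new_V


def softmax_policy(q_values, temperature):
    """Action probabilities proportional to exp(Q / temperature) (assumed form)."""
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy MDP (hypothetical): state 0 can stay (reward 0) or move to the
# absorbing state 1 (reward 1); state 1 loops on itself with reward 0.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(1, 1.0)]],
]
R = [
    [0.0, 1.0],
    [0.0, 0.0],
]

Q = value_iteration(P, R)
pi_taxi_like = softmax_policy(Q[0], temperature=2.0)      # temperature quoted for Taxi and Stock-Trading
pi_sysadmin_like = softmax_policy(Q[0], temperature=3.0)  # temperature quoted for SysAdmin
```

Under this assumed form, a higher temperature flattens the distribution, so the temperature-3 policy is closer to uniform than the temperature-2 policy on the same Q-values.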