Scalable Initial State Interdiction for Factored MDPs

Authors: Swetasudha Panda, Yevgeniy Vorobeychik

IJCAI 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments demonstrate the effectiveness of our approaches." and "We evaluate our MDP interdiction algorithms on several instances of three problem domains from the international planning competition (IPC 2014)..." |
| Researcher Affiliation | Academia | "Swetasudha Panda and Yevgeniy Vorobeychik, Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN {swetasudha.panda,yevgeniy.vorobeychik}@vanderbilt.edu" |
| Pseudocode | Yes | Algorithm 1 (Interdiction using Linear Action-Value Function Learning), Algorithm 2 (Non-Linear Value Function Learning and Greedy Local Search), and Algorithm 3 (Interdiction with Local Linear Approximation); see the sketch after the table. |
| Open Source Code | No | The paper contains no unambiguous statement that the authors release the code for this work, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | "We evaluate our MDP interdiction algorithms on several instances of three problem domains from the international planning competition (IPC 2014): a) sysadmin b) academic advising and c) wildfire." |
| Dataset Splits | No | The paper mentions 'train', 'validation', and 'test' only in the general machine-learning sense (e.g., "To incorporate generalization, linear and non-linear function approximation are commonly used") and reports a batch size for learning, but it gives no split percentages, sample counts, or predefined splits for the experiments on the sysadmin, academic advising, and wildfire domains. |
| Hardware Specification | Yes | "The experiments were run on a 2.4GHz hyperthreaded 8-core Ubuntu Linux machine with 16 GB RAM." |
| Software Dependencies | Yes | "CPLEX version 12.51 for MILP instances and TensorFlow for learning algorithms [Abadi et al., 2016]." |
| Experiment Setup | Yes | "We train the learning algorithms with ϵ₀ = 1, η = 0.01 and the RMSProp optimizer for the neural networks. The batch size \|D̂\| increases from 40 to 400 with problem size." See the training-configuration sketch below. |
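For readers who want a concrete picture of the greedy local search named in Algorithm 2, here is a minimal sketch. The interdiction encoding (a budget-limited set of flips to binary initial-state variables) and the `value_fn` callback standing in for the learned value function are illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch of greedy local search over initial-state
# interdictions, in the spirit of Algorithm 2. The encoding of an
# interdiction as budget-limited bit flips and the value_fn callback
# are assumptions for illustration only.

def greedy_interdiction(initial_state, budget, value_fn):
    """Greedily flip binary state variables to minimize the (learned)
    value of the resulting initial state, up to `budget` modifications."""
    state = list(initial_state)          # binary state variables
    chosen = []                          # indices already interdicted
    for _ in range(budget):
        best_i, best_val = None, value_fn(state)
        for i in range(len(state)):
            if i in chosen:
                continue
            state[i] ^= 1                # tentatively flip variable i
            val = value_fn(state)
            state[i] ^= 1                # undo the flip
            if val < best_val:           # the attacker minimizes defender value
                best_i, best_val = i, val
        if best_i is None:               # local optimum: no improving flip
            break
        state[best_i] ^= 1               # commit the best flip
        chosen.append(best_i)
    return state, chosen
```

The loop terminates early at a local optimum, which is the usual stopping criterion for greedy local search; whether the paper uses exactly this criterion is not stated in the excerpts above.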
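The reported training configuration (RMSProp, η = 0.01, batch size |D̂| from 40 to 400) can be expressed as a short TensorFlow sketch. The network architecture, the placeholder data, and the reading of ϵ₀ = 1 as the initial exploration rate of an ϵ-greedy policy are assumptions, not details taken from the paper.

```python
# Minimal sketch of the reported training configuration: RMSProp with
# learning rate eta = 0.01 and a batch size between 40 and 400.
# Network shape and data are placeholders, not from the paper.
import numpy as np
import tensorflow as tf

n_features, batch_size = 32, 40          # batch size |D| ranged 40..400
epsilon = 1.0                            # assumed: initial exploration rate (eps_0 = 1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),            # scalar value estimate
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
              loss="mse")

# Placeholder batch standing in for sampled (state, target-value) pairs.
X = np.random.rand(batch_size, n_features).astype("float32")
y = np.random.rand(batch_size, 1).astype("float32")
model.fit(X, y, batch_size=batch_size, epochs=1, verbose=0)
```

Scaling `batch_size` from 40 toward 400 with problem size, as the paper reports, would only change the constant above; everything else in the sketch stays the same.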