Online Symbolic Gradient-Based Optimization for Factored Action MDPs

Authors: Hao Cui, Roni Khardon

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An experimental evaluation on benchmark problems shows that the algorithm is competitive with the state of the art across problem sizes and that it provides significant improvements for large factored action spaces. "The top two rows in Figure 2 show a comparison of SOGBOFA to the baselines in all 6 domains showing a significant advantage on the problems with large action spaces."
Researcher Affiliation | Academia | Hao Cui and Roni Khardon, Department of Computer Science, Tufts University, Medford, Massachusetts, USA (Hao.Cui@tufts.edu, roni@cs.tufts.edu)
Pseudocode | Yes | Section 3, "Gradient Based Optimization", presents a numbered list of steps for the SOGBOFA algorithm, formatted like pseudocode: "1 Qf ← BuildQf(S, timeAllowed); 2 As = {}; 3 while time remaining; 4 do A ← RandomRestart(); 5 while time remaining and not converged; 6 do D ← CalculateGradient(Qf); 7 A ← MakeUpdates(D); 8 A ← Projection(A); 9 As.add(SampleConcreteAct(A)); 10 action ← Best(As)". A runnable sketch of this loop appears after the table.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | The evaluation is done over 6 domains: elevators from the RDDL distribution, two from the International Probabilistic Planning Competition (IPPC) 2011, and three from IPPC 2014, where domains amenable to expansion of the action space were chosen.
Dataset Splits | No | The paper describes using problem instances from RDDL and IPPC competitions and running experiments on them (e.g., "20 runs in each problem instance where each instance is run for 40 time steps"). However, it does not specify explicit training/validation/test dataset splits in terms of percentages or sample counts for these problem instances, nor does it refer to standard splits with citations.
Hardware Specification | Yes | The reported results represent averages from 20 runs in each problem instance where each instance is run for 40 time steps. In these runs each algorithm is given 60 seconds per step (on a cluster node having an Intel Xeon X5675 @ 3GHz CPU and 24GB memory).
Software Dependencies | No | The paper mentions using the RDDL language ("we use the RDDL language [Sanner, 2010] to specify the domain") for domain specification, but does not provide specific version numbers for any software libraries, frameworks, or tools used in the implementation or experimentation.
Experiment Setup | Yes | The algorithm limits depth when the expected number of gradient updates is below a pre-fixed threshold ... (where k = 200 in our experiments). ... "Our implementation stops the gradient search if the max change in probability is less than δ = 0.1, that is, ||A_new − A_old||_∞ ≤ 0.1. We then perform a linear search using 10 evenly spaced values a0, a1, ..., a9 in that region to find the best, where quality is measured by the value of the Q function DAG." See the line-search sketch after the table.
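
For concreteness, here is a minimal, self-contained sketch of the SOGBOFA loop quoted in the Pseudocode row. It is not the authors' implementation: the symbolic Q-function DAG and its symbolic gradient are replaced by an arbitrary callable q over action marginals and a finite-difference gradient, and all names (numeric_gradient, project, sogbofa_action) are illustrative stand-ins.

```python
import random
import time

def numeric_gradient(q, a, eps=1e-4):
    # Finite-difference stand-in for the paper's symbolic DAG gradient.
    grad = []
    for i in range(len(a)):
        hi = a[:i] + [min(1.0, a[i] + eps)] + a[i + 1:]
        lo = a[:i] + [max(0.0, a[i] - eps)] + a[i + 1:]
        denom = hi[i] - lo[i]
        grad.append((q(hi) - q(lo)) / denom if denom > 0 else 0.0)
    return grad

def project(a):
    # Simplistic projection: clip each marginal back into [0, 1].
    return [min(1.0, max(0.0, x)) for x in a]

def sogbofa_action(q, n, time_allowed, delta=0.1, step=0.05):
    # Random restarts + projected gradient ascent over n action marginals;
    # concrete 0/1 actions are sampled from the optimized marginals and
    # the best one seen before the deadline is returned (anytime behavior).
    deadline = time.time() + time_allowed
    best_act, best_val = None, float("-inf")
    while best_act is None or time.time() < deadline:
        a = [random.random() for _ in range(n)]          # random restart
        converged = False
        while not converged and time.time() < deadline:
            g = numeric_gradient(q, a)
            a_new = project([x + step * gx for x, gx in zip(a, g)])
            # stop when the max change in any marginal drops below delta = 0.1
            converged = max(abs(x - y) for x, y in zip(a_new, a)) < delta
            a = a_new
        act = [1.0 if random.random() < p else 0.0 for p in a]  # sample concrete act
        val = q(act)
        if val > best_val:
            best_act, best_val = act, val
    return best_act
```

As a toy usage example, q = lambda a: -sum((x - 0.7) ** 2 for x in a) makes sogbofa_action(q, 3, 0.1) return 0/1 actions sampled from marginals driven toward 0.7.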
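The line search described in the Experiment Setup row can be sketched the same way: 10 evenly spaced step sizes along the gradient direction are scored by the Q value, and the best point is kept. Again a hedged sketch: the default max_step is an assumption standing in for the paper's "region" (the largest feasible step, which the paper derives from its projection constraints), and q is any callable over marginals.

```python
def line_search(q, a, grad, max_step=1.0, num_points=10):
    # Probe num_points evenly spaced step sizes along the gradient
    # direction; score each clipped candidate by its Q value and
    # return the best point found.
    best_a, best_val = a, q(a)
    for i in range(1, num_points + 1):
        alpha = max_step * i / num_points
        cand = [min(1.0, max(0.0, x + alpha * g)) for x, g in zip(a, grad)]
        val = q(cand)
        if val > best_val:
            best_a, best_val = cand, val
    return best_a
```

In the loop sketched above, this would replace the fixed-step update (a_new = project([x + step * gx ...])) with a_new = line_search(q, a, g).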