Neural Stochastic Dual Dynamic Programming
Authors: Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An empirical investigation demonstrates that ν-SDDP can significantly reduce problem-solving cost without sacrificing solution quality over competitors such as SDDP and reinforcement learning algorithms, across a range of synthetic and real-world process optimization problems. |
| Researcher Affiliation | Industry | Google Research, Brain Team; Google Cloud AI; {hadai, yuanxue, zsyed, schuurmans, bodai}@google.com |
| Pseudocode | Yes | Algorithm 1 SDDP({V_t^0}_{t=1}^T, ξ_1, n), Algorithm 2 Fast-Inference({u_t}_{t=1}^T, f, ψ, ξ_1), Algorithm 3 ν-SDDP |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release, or mention of code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper states 'demand forecasts are synthetically generated from a normal distribution' and 'We use an autoregressive process of order 2 to learn the price forecast model based on the real daily stock prices in the past 5 years.' It does not provide concrete access information (link, DOI, formal citation) to the processed or generated datasets used in the experiments. |
| Dataset Splits | No | The paper states 'We split the task instances into train, validation and test splits' and 'The model used in the evaluation is selected based on the best mean return over the 50 trajectories from the validation environment, based on which the hyperparameters are also tuned.' However, it does not provide the split percentages or sample counts for training, validation, and testing. |
| Hardware Specification | Yes | For RL based approaches and our ν-SDDP, we train using a single V100 GPU for each hyperparameter configuration for at most 1 day or till convergence. |
| Software Dependencies | No | The paper mentions using 'the Tensorflow TF-Agents library' and implementations of 'DQN', 'DDPG', 'PPO', and 'SAC' from TF-Agents, but does not provide specific version numbers for TensorFlow, TF-Agents, or any other software dependencies. |
| Experiment Setup | Yes | We report the selected hyperparameters for each algorithm in Table 5. We use an MLP with 3 layers as the Q-network for DQN, and as the actor network and the critic/value network for SAC, PPO and DDPG. All networks have the same learning rate and a dropout parameter of 0.001. (A sketch of this network setup follows the table.) |
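
To make the reported baseline setup concrete, the following is a minimal sketch of how a 3-layer MLP Q-network with a dropout parameter of 0.001 could be assembled with the TF-Agents library the paper mentions. The environment (CartPole), layer widths, and learning rate are illustrative assumptions; the paper's actual environments are custom inventory and portfolio optimization tasks, and its layer widths are not reported.

```python
# Sketch only: a 3-layer MLP Q-network for DQN via TF-Agents, mirroring the
# reported "MLP with 3 layers" and dropout parameter of 0.001. Environment,
# layer widths, and learning rate below are assumptions, not from the paper.
import tensorflow as tf
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.agents.dqn import dqn_agent

# Stand-in environment; the paper uses custom process-optimization tasks.
env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

# 3-layer MLP Q-network; widths (64, 64, 64) are assumed.
q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(64, 64, 64),
    dropout_layer_params=(0.001, 0.001, 0.001),  # reported dropout value
)

# DQN agent wiring; the learning rate is an assumed placeholder.
agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
)
agent.initialize()
```

The actor and critic/value networks for SAC, PPO and DDPG would follow the same pattern (3-layer MLPs with the shared learning rate and dropout), using the corresponding TF-Agents agent classes.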