Neural Stochastic Dual Dynamic Programming

Authors: Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical investigation demonstrates that ν-SDDP can significantly reduce problem solving cost without sacrificing solution quality over competitors such as SDDP and reinforcement learning algorithms, across a range of synthetic and real-world process optimization problems.
Researcher Affiliation | Industry | Google Research, Brain Team; Google Cloud AI. {hadai, yuanxue, zsyed, schuurmans, bodai}@google.com
Pseudocode | Yes | Algorithm 1 SDDP({V_t^0}_{t=1}^T, ξ_1, n), Algorithm 2 Fast-Inference({u_t}_{t=1}^T, f, ψ, ξ_1), Algorithm 3 ν-SDDP. (An illustrative cutting-plane sketch follows the table.)
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release, or mention of code in supplementary materials) for the methodology described.
Open Datasets | No | The paper states 'demand forecasts are synthetically generated from a normal distribution' and 'We use an autoregressive process of order 2 to learn the price forecast model based on the real daily stock prices in the past 5 years.' It does not provide concrete access information (link, DOI, formal citation) to the processed or generated datasets used in the experiments. (A hedged sketch of these two data sources follows the table.)
Dataset Splits | No | The paper states 'We split the task instances into train, validation and test splits' and 'We report the selected hyperparameters for each algorithm in Table. 5. We use MLP with 3 layers as the Q-network for DQN, as the actor network and the critic/value network for SAC, PPO and DDPG. All networks have the same learning rate and with a dropout parameter as 0.001. The model used in the evaluation is selected based on the best mean return over the 50 trajectories from the validation environment, based on which the hyperparameters are also tuned.' However, it does not provide specific details on the dataset split percentages or sample counts for training, validation, and testing. (A sketch of the described model-selection rule follows the table.)
Hardware Specification | Yes | For RL based approaches and our ν-SDDP, we train using a single V100 GPU for each hyperparameter configuration for at most 1 day or till convergence.
Software Dependencies | No | The paper mentions using 'the Tensorflow TF-Agents library' and implementations of 'DQN', 'DDPG', 'PPO', and 'SAC' from TF-Agents, but does not provide specific version numbers for TensorFlow, TF-Agents, or any other software dependencies.
Experiment Setup | Yes | We report the selected hyperparameters for each algorithm in Table 5. We use MLP with 3 layers as the Q-network for DQN, as the actor network and the critic/value network for SAC, PPO and DDPG. All networks have the same learning rate and a dropout parameter of 0.001. (A hedged TF-Agents sketch of this setup follows the table.)
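
Illustrative sketch for the Pseudocode row: SDDP-style algorithms such as Algorithm 1 maintain each stage value function V_t as a lower envelope of linear cuts, V_t(x) ≈ max_k (α_k + β_k·x), which the backward pass tightens over iterations. The class and method names below are hypothetical; this is a minimal sketch of the cutting-plane representation, not the authors' implementation.

import numpy as np

class PiecewiseLinearValue:
    """Hypothetical container for the cutting-plane approximation of one
    stage value function V_t maintained by SDDP-style algorithms."""

    def __init__(self):
        self.alphas = []   # cut intercepts alpha_k
        self.betas = []    # cut gradients beta_k (subgradients of V_t)

    def add_cut(self, alpha, beta):
        # A backward pass would add one cut per visited state.
        self.alphas.append(float(alpha))
        self.betas.append(np.asarray(beta, dtype=float))

    def value(self, x):
        # Lower bound on V_t(x): maximum over all stored cuts; returns 0
        # when no cuts exist yet (a trivial initial approximation V_t^0).
        if not self.alphas:
            return 0.0
        x = np.asarray(x, dtype=float)
        return max(a + b @ x for a, b in zip(self.alphas, self.betas))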
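
For the Open Datasets row: the experiments rely on synthetic demand forecasts drawn from a normal distribution and an AR(2) price model fit to five years of daily stock prices. The sketch below is hedged; the distribution parameters, array shapes, and the stand-in price series are placeholder assumptions, since the actual data are not released.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic demand forecasts from a normal distribution
# (mean, std, and shape are illustrative placeholders).
demand_forecasts = rng.normal(loc=100.0, scale=15.0, size=(1000, 10))

# Fit an autoregressive model of order 2, p_t ≈ c + a1*p_{t-1} + a2*p_{t-2},
# by ordinary least squares; `prices` is a stand-in for the real daily
# stock prices, which are not provided with the paper.
prices = 50.0 + np.cumsum(rng.normal(size=5 * 252))   # ~5 years of trading days
X = np.column_stack([np.ones(len(prices) - 2), prices[1:-1], prices[:-2]])
y = prices[2:]
c, a1, a2 = np.linalg.lstsq(X, y, rcond=None)[0]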
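
For the Dataset Splits row: the split sizes are not reported, but the selection rule (best mean return over 50 validation trajectories) is. A minimal sketch of that rule, with a hypothetical evaluate_policy callback standing in for a single rollout in the validation environment:

def select_best_model(candidates, evaluate_policy, num_trajectories=50):
    """Pick the candidate with the best mean return over validation rollouts.

    `candidates` and `evaluate_policy(candidate)` are hypothetical stand-ins
    for trained checkpoints and one evaluation trajectory in the validation
    environment; they are not names from the paper.
    """
    best, best_return = None, float("-inf")
    for candidate in candidates:
        returns = [evaluate_policy(candidate) for _ in range(num_trajectories)]
        mean_return = sum(returns) / num_trajectories
        if mean_return > best_return:
            best, best_return = candidate, mean_return
    return best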
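
For the Experiment Setup row: a hedged TF-Agents sketch of the DQN baseline configuration described above (3-layer MLP Q-network, dropout 0.001). TF-Agents and TensorFlow versions are unspecified in the paper, and the environment, layer widths, and learning rate value here are illustrative assumptions rather than values reported by the authors.

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.utils import common

# Placeholder environment; the paper's process-optimization environments
# are not publicly available.
env = tf_py_environment.TFPyEnvironment(suite_gym.load("CartPole-v0"))

# "MLP with 3 layers as the Q-network for DQN", dropout parameter 0.001;
# the layer widths (64, 64, 64) are an assumption.
q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(64, 64, 64),
    dropout_layer_params=(0.001, 0.001, 0.001),
)

agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # rate not reported
    td_errors_loss_fn=common.element_wise_squared_loss,
)
agent.initialize()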