Neural Stochastic Dual Dynamic Programming
Authors: Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An empirical investigation demonstrates that ν-SDDP can significantly reduce problem-solving cost without sacrificing solution quality over competitors such as SDDP and reinforcement learning algorithms, across a range of synthetic and real-world process optimization problems. |
| Researcher Affiliation | Industry | Google Research, Brain Team; Google Cloud AI; {hadai, yuanxue, zsyed, schuurmans, bodai}@google.com |
| Pseudocode | Yes | Algorithm 1 SDDP({V_t^0}_{t=1}^T, ξ_1, n), Algorithm 2 Fast-Inference({u_t}_{t=1}^T, f, ψ, ξ_1), Algorithm 3 ν-SDDP |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release, or mention of code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper states 'demand forecasts are synthetically generated from a normal distribution' and 'We use an autoregressive process of order 2 to learn the price forecast model based on the real daily stock prices in the past 5 years.' It does not provide concrete access information (link, DOI, formal citation) to the processed or generated datasets used in the experiments. |
| Dataset Splits | No | The paper states 'We split the task instances into train, validation and test splits' and 'The model used in the evaluation is selected based on the best mean return over the 50 trajectories from the validation environment, based on which the hyperparameters are also tuned.' However, it does not provide the split percentages or sample counts for training, validation, and testing. |
| Hardware Specification | Yes | For RL based approaches and our ν-SDDP, we train using a single V100 GPU for each hyperparameter configuration for at most 1 day or till convergence. |
| Software Dependencies | No | The paper mentions using 'the Tensorflow TF-Agents library' and implementations of 'DQN', 'DDPG', 'PPO', and 'SAC' from TF-Agents, but does not provide specific version numbers for TensorFlow, TF-Agents, or any other software dependencies. |
| Experiment Setup | Yes | We report the selected hyperparameters for each algorithm in Table 5. We use an MLP with 3 layers as the Q-network for DQN, and as the actor network and the critic/value network for SAC, PPO and DDPG. All networks have the same learning rate and a dropout parameter of 0.001. (A sketch of this network setup follows the table.) |
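
To make the reported baseline setup concrete, the following is a minimal sketch of how a 3-layer MLP Q-network with a dropout parameter of 0.001 could be assembled with the TF-Agents library the paper mentions. The environment (CartPole), layer widths, and learning rate are illustrative assumptions; the paper's actual environments are custom inventory and portfolio optimization tasks, and its layer widths are not reported.

```python
# Sketch only: a 3-layer MLP Q-network for DQN via TF-Agents, mirroring the
# reported "MLP with 3 layers" and dropout parameter of 0.001. Environment,
# layer widths, and learning rate below are assumptions, not from the paper.
import tensorflow as tf
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network
from tf_agents.agents.dqn import dqn_agent

# Stand-in environment; the paper uses custom process-optimization tasks.
env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v0'))

# 3-layer MLP Q-network; widths (64, 64, 64) are assumed.
q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(64, 64, 64),
    dropout_layer_params=(0.001, 0.001, 0.001),  # reported dropout value
)

# DQN agent wiring; the learning rate is an assumed placeholder.
agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
)
agent.initialize()
```

The actor and critic/value networks for SAC, PPO and DDPG would follow the same pattern (3-layer MLPs with the shared learning rate and dropout), using the corresponding TF-Agents agent classes.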