Warm-Starting Nested Rollout Policy Adaptation with Optimal Stopping
Authors: Chen Dang, Cristina Bazgan, Tristan Cazenave, Morgan Chopin, Pierre-Henri Wuillemin
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The fourth section gives experimental results for the Minimum Congestion Shortest Path Routing problem, the Traveling Salesman Problem with Time Windows and the Snake-in-the-Box problem. |
| Researcher Affiliation | Collaboration | 1 Orange Labs, Châtillon, France 2 Université Paris-Dauphine, PSL Research University, CNRS, UMR 7243, LAMSADE, F-75016 Paris, France 3 Sorbonne Université, CNRS, UMR 7606, LIP6, F-75005 Paris, France |
| Pseudocode | Yes | Algorithm 1: The playout algorithm, Algorithm 2: The adapt algorithm, Algorithm 3: The NRPA algorithm, Algorithm 4: Meta-NRPA with one item, Algorithm 5: Meta-NRPA with α% items |
| Open Source Code | No | The paper does not provide any link or explicit statement about releasing the source code for the described methodology. |
| Open Datasets | Yes | We test our algorithms on rc204.1, which is the most difficult instance in the Solomon-Potwin-Bengio TSPTW benchmark. |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset splits. The problems addressed are combinatorial optimization problems on specific instances, not dataset splits for machine learning tasks. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | We use NRPA with a level of 2 and 50 iterations. Each method executes 20 independent runs on each graph, the results are normalized according to the lower bound calculated by Fleischer's approximation scheme with ϵ = 0.1 (Fleischer 2000). Graphs having more than 400 nodes are executed for 2 hours, others for 30 minutes. ... We use NRPA of level 4 and 100 iterations. ... The learning rate of NRPA α is set to 0.01... We use NRPA with level 4, 100 iterations, Meta-NRPA with 10% items, and 5% for ϵ-greedy, 0.01 for learning rate α. |
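
Since no source code is released, the sketch below illustrates how the level, iteration count, and learning rate α quoted in the Experiment Setup row fit into the playout, adapt, and NRPA routines named in the Pseudocode row. It follows the commonly described form of plain NRPA and is not the authors' implementation: it omits the warm-start, optimal stopping, and Meta-NRPA components, and the toy problem (`toy_moves`, `toy_score`, a length-8 binary sequence scored by its number of ones) is a hypothetical stand-in for the paper's benchmarks. The level and iteration count are borrowed from the first quoted setting and α = 0.01 from a later one, purely for illustration.

```python
import math
import random


def playout(policy, legal_moves, horizon, score_fn, rng):
    """Sample one sequence, choosing each move by softmax over policy weights."""
    seq = []
    for step in range(horizon):
        moves = legal_moves(seq)
        weights = [math.exp(policy.get((step, m), 0.0)) for m in moves]
        seq.append(rng.choices(moves, weights=weights)[0])
    return score_fn(seq), seq


def adapt(policy, seq, legal_moves, alpha):
    """Return a new policy shifted toward the moves of `seq` (gradient-style step)."""
    new_policy = dict(policy)
    prefix = []
    for step, best in enumerate(seq):
        moves = legal_moves(prefix)
        z = sum(math.exp(policy.get((step, m), 0.0)) for m in moves)
        for m in moves:
            prob = math.exp(policy.get((step, m), 0.0)) / z
            new_policy[(step, m)] = new_policy.get((step, m), 0.0) - alpha * prob
        new_policy[(step, best)] = new_policy.get((step, best), 0.0) + alpha
        prefix.append(best)
    return new_policy


def nrpa(level, policy, iterations, alpha, legal_moves, horizon, score_fn, rng):
    """Nested search: level 0 is a playout; higher levels adapt toward their best sequence."""
    if level == 0:
        return playout(policy, legal_moves, horizon, score_fn, rng)
    best_score, best_seq = -math.inf, []
    for _ in range(iterations):
        # The lower level starts from the current policy; `adapt` never mutates it.
        score, seq = nrpa(level - 1, policy, iterations, alpha,
                          legal_moves, horizon, score_fn, rng)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq, legal_moves, alpha)
    return best_score, best_seq


# Hypothetical toy problem (an assumption, not from the paper).
def toy_moves(prefix):
    return [0, 1]


def toy_score(seq):
    return sum(seq)


best_score, best_seq = nrpa(level=2, policy={}, iterations=50, alpha=0.01,
                            legal_moves=toy_moves, horizon=8,
                            score_fn=toy_score, rng=random.Random(0))
print(best_score, best_seq)
```

Keying the policy on `(step, move)` is one simple coding choice for this toy; the paper's problems would need their own move encoding, which the summary above does not specify.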