Warm-Starting Nested Rollout Policy Adaptation with Optimal Stopping

Authors: Chen Dang, Cristina Bazgan, Tristan Cazenave, Morgan Chopin, Pierre-Henri Wuillemin

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The fourth section gives experimental results for the Minimum Congestion Shortest Path Routing problem, the Traveling Salesman Problem with Time Windows, and the Snake-in-the-Box problem."
Researcher Affiliation | Collaboration | "1 Orange Labs, Châtillon, France; 2 Université Paris-Dauphine, PSL Research University, CNRS, UMR 7243, LAMSADE, F-75016 Paris, France; 3 Sorbonne Université, CNRS, UMR 7606, LIP6, F-75005 Paris, France"
Pseudocode | Yes | "Algorithm 1: The playout algorithm; Algorithm 2: The adapt algorithm; Algorithm 3: The NRPA algorithm; Algorithm 4: Meta-NRPA with one item; Algorithm 5: Meta-NRPA with α% items"
Open Source Code | No | The paper does not provide a link to, or an explicit statement about releasing, the source code for the described methodology.
Open Datasets | Yes | "We test our algorithms on rc204.1, which is the most difficult instance in the Solomon-Potvin-Bengio TSPTW benchmark."
Dataset Splits | No | The paper does not provide train/validation/test dataset splits. It addresses combinatorial optimization problems solved on specific benchmark instances, so machine-learning-style splits do not apply.
Hardware Specification | No | The paper does not specify the hardware used for its experiments, such as exact GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9").
Experiment Setup | Yes | "We use NRPA with a level of 2 and 50 iterations. Each method executes 20 independent runs on each graph; the results are normalized according to the lower bound calculated by Fleischer's approximation scheme with ϵ = 0.1 (Fleischer 2000). Graphs having more than 400 nodes are executed for 2 hours, others for 30 minutes. ... We use NRPA of level 4 and 100 iterations. ... The learning rate α of NRPA is set to 0.01. ... We use NRPA with level 4, 100 iterations, Meta-NRPA with 10% items, and 5% for ϵ-greedy, 0.01 for learning rate α."
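The paper's Algorithms 1-3 (playout, adapt, NRPA) follow the standard nested rollout policy adaptation scheme: a softmax playout over learned move weights, an adapt step that shifts probability mass toward the best sequence found, and a recursive nesting of levels. As a rough illustration of how those three pieces fit together, here is a minimal self-contained sketch on a toy sequence-matching problem; the toy problem, the constants, and the learning rate default are illustrative assumptions, not the paper's instances or settings.

```python
import math
import random
from collections import defaultdict

random.seed(0)

# Toy problem (an illustrative assumption, not from the paper): build a
# sequence of LENGTH moves, each drawn from range(MOVES); the score is
# the number of positions that match a hidden TARGET sequence.
LENGTH, MOVES = 5, 4
TARGET = [2, 0, 3, 1, 2]

def key(step, move):
    # One policy weight per (step, move) pair.
    return step * MOVES + move

def playout(policy):
    # Algorithm 1 (playout): sample each move with softmax(policy weights).
    seq = []
    for step in range(LENGTH):
        weights = [math.exp(policy[key(step, m)]) for m in range(MOVES)]
        r = random.random() * sum(weights)
        for m, w in enumerate(weights):
            r -= w
            if r <= 0:
                break
        seq.append(m)
    score = sum(1 for a, b in zip(seq, TARGET) if a == b)
    return score, seq

def adapt(policy, seq, alpha=1.0):
    # Algorithm 2 (adapt): move probability mass toward the best sequence.
    # Returns a fresh policy, so callers at different levels stay independent.
    new = defaultdict(float, policy)
    for step, best_move in enumerate(seq):
        weights = [math.exp(policy[key(step, m)]) for m in range(MOVES)]
        total = sum(weights)
        for m in range(MOVES):
            new[key(step, m)] -= alpha * weights[m] / total
        new[key(step, best_move)] += alpha
    return new

def nrpa(level, policy, iterations=20):
    # Algorithm 3 (NRPA): each level runs `iterations` calls of the level
    # below, keeps the best sequence seen, and adapts its policy toward it.
    if level == 0:
        return playout(policy)
    best_score, best_seq = -1, []
    for _ in range(iterations):
        score, seq = nrpa(level - 1, policy, iterations)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

score, seq = nrpa(2, defaultdict(float))
print(score, seq)
```

Because adapt is written functionally (it returns a new table rather than mutating in place), each recursion level effectively works on its own copy of the policy, which is the behavior the nesting relies on. The paper's experiments use deeper nesting (level 4, 100 iterations) and a much smaller learning rate (α = 0.01) on far larger search spaces.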