Large-State Reinforcement Learning for Hyper-Heuristics
Authors: Lucas Kletzander, Nysret Musliu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The benefit of the collaboration of our novel components is shown on the academic benchmark of the Cross Domain Heuristic Challenge 2011 consisting of six different problem domains. Our approach can provide state-of-the-art results on this benchmark where it outperforms recent hyper-heuristics based on reinforcement learning, and also demonstrates high performance on a benchmark of complex real-life personnel scheduling domains. |
| Researcher Affiliation | Academia | Christian Doppler Laboratory for Artificial Intelligence and Optimization for Planning and Scheduling DBAI, TU Wien, Karlsplatz 13, 1040 Vienna, Austria {lucas.kletzander,nysret.musliu}@tuwien.ac.at |
| Pseudocode | Yes | This paper introduces the novel hyper-heuristic LAST-RL (Large State Reinforcement Learning), with the core learning loop as shown in Algorithm 1: ... Algorithm 1: LAST-RL learning loop |
| Open Source Code | Yes | The hyper-heuristic¹ was implemented in the Java HyFlex framework and run using OpenJDK 14.0.1. ¹Available at: https://gitlab.tuwien.ac.at/lucas.kletzander/last-rl |
| Open Datasets | Yes | The first evaluation was performed on the six domains from CHeSC 2011, the Cross Domain Heuristic Search Challenge 2011 (Burke et al. 2011) implemented in the HyFlex framework (Ochoa et al. 2012a), which are MaxSAT (SAT), Bin Packing (BP), Personnel Scheduling (PS), Flowshop (FS), Travelling Salesperson Problem (TSP), and Vehicle Routing Problem (VRP). ... Bus Driver Scheduling according to laws of an Austrian collective agreement (Kletzander and Musliu 2020). |
| Dataset Splits | Yes | Next we also used parameter tuning with SMAC3 version 1.1.1 (Lindauer et al. 2021) allowing 1 million seconds using the additional CHeSC instances not used for the competition for training. |
| Hardware Specification | Yes | The experiments were run on a Windows system with an Intel 12th Gen i9-12900K at 3.19 GHz and 32 GB of RAM, each individual run was performed single-threaded. ... All evaluations were performed with one hour of runtime (same as the previous results) on a computing cluster running Ubuntu 16.04.1 LTS with Intel Xeon CPUs E5-2650 v4 (max. 2.90 GHz, 12 physical cores, no hyperthreading), but each individual run was performed single-threaded. |
| Software Dependencies | Yes | The implementation of reinforcement learning is done using the BURLAP library version 3.0.1 (MacGlashan 2016). The hyper-heuristic¹ was implemented in the Java HyFlex framework and run using OpenJDK 14.0.1. ... Next we also used parameter tuning with SMAC3 version 1.1.1 (Lindauer et al. 2021)... The problem domains are implemented in Python and run using PyPy 7.3.5, the hyper-heuristics are run using OpenJDK 8u292. |
| Experiment Setup | Yes | Based on thorough initial experimentation, we validated our modifications for the solution chains and came up with a manually tuned set of parameters. Next we also used parameter tuning with SMAC3 version 1.1.1 (Lindauer et al. 2021)... Both sets of parameters are shown in Table 1. |
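To make the "Pseudocode" row more concrete: the paper's Algorithm 1 describes a reinforcement-learning loop that selects low-level heuristics. The sketch below is NOT the paper's LAST-RL algorithm (which uses a large state space and solution chains); it is a generic, hypothetical epsilon-greedy heuristic-selection loop in Python, included only to illustrate the general shape of such a learning loop. All names (`hyper_heuristic`, `evaluate`, the heuristics list) are illustrative assumptions.

```python
import random

def hyper_heuristic(solution, heuristics, evaluate, iterations=1000,
                    epsilon=0.2, alpha=0.1, seed=0):
    """Generic epsilon-greedy selection among low-level heuristics.

    Illustrative sketch only: LAST-RL's actual state representation,
    reward scheme, and solution chains (Algorithm 1 in the paper)
    are considerably more elaborate.
    """
    rng = random.Random(seed)
    q = [0.0] * len(heuristics)          # estimated value per heuristic
    best, best_cost = solution, evaluate(solution)
    for _ in range(iterations):
        # explore with probability epsilon, otherwise exploit best estimate
        if rng.random() < epsilon:
            i = rng.randrange(len(heuristics))
        else:
            i = max(range(len(heuristics)), key=lambda j: q[j])
        candidate = heuristics[i](best, rng)
        cost = evaluate(candidate)
        reward = best_cost - cost        # positive when the move improved
        q[i] += alpha * (reward - q[i])  # incremental value update
        if cost < best_cost:             # accept only improvements here
            best, best_cost = candidate, cost
    return best, best_cost
```

The design choice to accept only improving moves keeps the sketch short; real hyper-heuristics typically use a more permissive acceptance criterion to escape local optima.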