Reinforcement Learning for Cross-Domain Hyper-Heuristics

Authors: Florian Mischek, Nysret Musliu

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a detailed analysis and evaluation of the algorithm components, including different ways to represent the hyper-heuristic state space and reset strategies to avoid unpromising areas of the solution space. Our methods have been evaluated using HyFlex, a well-known benchmarking framework for cross-domain hyper-heuristics, and compared with state-of-the-art approaches. The experimental evaluation shows that our reinforcement-learning-based approach produces results that are competitive with the state-of-the-art, including the top participants of the Cross-Domain Heuristic Search Challenge (CHeSC) 2011. (An illustrative sketch of this selection mechanism follows the table.)
Researcher Affiliation | Academia | Florian Mischek, Nysret Musliu; Christian Doppler Laboratory for Artificial Intelligence and Optimization for Planning and Scheduling, TU Wien; {florian.mischek, nysret.musliu}@tuwien.ac.at
Pseudocode | No | The paper provides mathematical formulas and descriptions of algorithms, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | Yes | The source code for our implementation is available for download at https://gitlab.tuwien.ac.at/florian.mischek/hyper-heuristics-public.
Open Datasets | Yes | We implemented our approaches as hyper-heuristics in the HyFlex framework and evaluated them on the benchmark instances provided for the six original HyFlex problem domains. The problem domains used for the CHeSC are the maximum satisfiability problem (MaxSAT), bin packing (BP), flow shop (FS), personnel scheduling (PS), the traveling salesman problem (TSP), and the vehicle routing problem (VRP).
Dataset Splits | No | The paper mentions evaluating on 'benchmark instances' and performing '10 runs per instance', but it does not specify explicit train/validation/test dataset splits in the conventional sense for model training. The learning process is online, taking place during interaction with each problem instance.
Hardware Specification | Yes | The experiments were performed each using a single thread of a computing cluster with 10 nodes of 24 cores, an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz and 252 GB RAM.
Software Dependencies | No | The paper mentions using the 'HyFlex framework' but does not provide specific version numbers for it or any other software libraries or dependencies used in the implementation.
Experiment Setup | Yes | This baseline configuration uses a state representation consisting of the last heuristic and the tail length up to a maximum length of 5 (LH+TL5), a 0.2-greedy action selection policy, constant rewards for improved solutions, a learning rate of 0.1 where required, and Luby's sequence to determine chain lengths. For this paper, we use γ = 0.8, based on the results of preliminary experiments. (Both the baseline configuration and Luby's sequence are sketched below the table.)
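
For illustration, the baseline selection mechanism reported above can be read as tabular Q-learning over states of the form (last heuristic, tail length capped at 5). The sketch below is a minimal Python rendering under that reading, not the authors' HyFlex (Java) implementation; all function and variable names are hypothetical, while the parameter values (epsilon = 0.2, alpha = 0.1, gamma = 0.8, tail cap 5) are the ones quoted from the paper.

    import random
    from collections import defaultdict

    # Parameter values taken from the reported baseline configuration (LH+TL5);
    # all names and interfaces here are hypothetical illustrations.
    EPSILON, ALPHA, GAMMA, MAX_TAIL = 0.2, 0.1, 0.8, 5

    Q = defaultdict(float)  # Q[(state, heuristic_id)] -> estimated value

    def make_state(last_heuristic, tail_length):
        # State representation LH+TL5: the last applied heuristic plus the
        # current tail length, capped at MAX_TAIL.
        return (last_heuristic, min(tail_length, MAX_TAIL))

    def select_heuristic(state, heuristic_ids):
        # 0.2-greedy policy: explore uniformly with probability EPSILON,
        # otherwise pick the heuristic with the highest Q-value.
        if random.random() < EPSILON:
            return random.choice(heuristic_ids)
        return max(heuristic_ids, key=lambda h: Q[(state, h)])

    def q_update(state, action, reward, next_state, heuristic_ids):
        # Standard tabular Q-learning backup with learning rate ALPHA.
        best_next = max(Q[(next_state, h)] for h in heuristic_ids)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])

Consistent with the 'constant rewards for improved solutions' setting, reward here would be a fixed value (for example 1) whenever the applied heuristic improves the current solution, and 0 otherwise.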
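
Luby's sequence, used above to determine chain lengths, is the standard restart schedule 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ... A minimal, self-contained generator for its i-th term, in the same illustrative Python:

    def luby(i: int) -> int:
        # Return the i-th term (1-indexed) of Luby's sequence.
        # Find k such that 2^(k-1) <= i <= 2^k - 1.
        k = 1
        while (1 << k) - 1 < i:
            k += 1
        if (1 << k) - 1 == i:
            return 1 << (k - 1)              # i ends a block: the term is 2^(k-1)
        return luby(i - (1 << (k - 1)) + 1)  # otherwise the block repeats the prefix

For example, [luby(i) for i in range(1, 8)] yields [1, 1, 2, 1, 1, 2, 4]; according to the quoted setup, the paper uses this sequence to decide how long each chain of heuristic applications runs before a reset.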