Reinforcement Learning for Cross-Domain Hyper-Heuristics

Authors: Florian Mischek, Nysret Musliu

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a detailed analysis and evaluation of the algorithm components, including different ways to represent the hyper-heuristic state space and reset strategies to avoid unpromising areas of the solution space. Our methods have been evaluated using HyFlex, a well-known benchmarking framework for cross-domain hyper-heuristics, and compared with state-of-the-art approaches. The experimental evaluation shows that our reinforcement-learning-based approach produces results that are competitive with the state-of-the-art, including the top participants of the Cross-Domain Heuristic Search Challenge (CHeSC) 2011. (An illustrative sketch of this selection mechanism follows the table.)
Researcher Affiliation | Academia | Florian Mischek, Nysret Musliu; Christian Doppler Laboratory for Artificial Intelligence and Optimization for Planning and Scheduling, TU Wien; {florian.mischek, nysret.musliu}@tuwien.ac.at
Pseudocode | No | The paper provides mathematical formulas and descriptions of algorithms, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | Yes | The source code for our implementation is available for download at https://gitlab.tuwien.ac.at/florian.mischek/hyper-heuristics-public.
Open Datasets | Yes | We implemented our approaches as hyper-heuristics in the HyFlex framework and evaluated them on the benchmark instances provided for the six original HyFlex problem domains. The problem domains used for the CHeSC are the maximum satisfiability problem (MaxSAT), bin packing (BP), flow shop (FS), personnel scheduling (PS), the traveling salesman problem (TSP), and the vehicle routing problem (VRP).
Dataset Splits | No | The paper mentions evaluating on 'benchmark instances' and performing '10 runs per instance', but it does not specify explicit train/validation/test dataset splits in the conventional sense for model training. The learning process is online, taking place during interaction with each problem instance.
Hardware Specification | Yes | The experiments were performed each using a single thread of a computing cluster with 10 nodes of 24 cores, an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz and 252 GB RAM.
Software Dependencies | No | The paper mentions using the 'HyFlex framework' but does not provide specific version numbers for it or any other software libraries or dependencies used in the implementation.
Experiment Setup | Yes | This baseline configuration uses a state representation consisting of the last heuristic and the tail length up to a maximum length of 5 (LH+TL5), a 0.2-greedy action selection policy, constant rewards for improved solutions, a learning rate of 0.1 where required, and Luby's sequence to determine chain lengths. For this paper, we use γ = 0.8, based on the results of preliminary experiments. (Both the baseline configuration and Luby's sequence are sketched below the table.)
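
For illustration, the baseline selection mechanism reported above can be read as tabular Q-learning over states of the form (last heuristic, tail length capped at 5). The sketch below is a minimal Python rendering under that reading, not the authors' HyFlex (Java) implementation; all function and variable names are hypothetical, while the parameter values (epsilon = 0.2, alpha = 0.1, gamma = 0.8, tail cap 5) are the ones quoted from the paper.

    import random
    from collections import defaultdict

    # Parameter values taken from the reported baseline configuration (LH+TL5);
    # all names and interfaces here are hypothetical illustrations.
    EPSILON, ALPHA, GAMMA, MAX_TAIL = 0.2, 0.1, 0.8, 5

    Q = defaultdict(float)  # Q[(state, heuristic_id)] -> estimated value

    def make_state(last_heuristic, tail_length):
        # State representation LH+TL5: the last applied heuristic plus the
        # current tail length, capped at MAX_TAIL.
        return (last_heuristic, min(tail_length, MAX_TAIL))

    def select_heuristic(state, heuristic_ids):
        # 0.2-greedy policy: explore uniformly with probability EPSILON,
        # otherwise pick the heuristic with the highest Q-value.
        if random.random() < EPSILON:
            return random.choice(heuristic_ids)
        return max(heuristic_ids, key=lambda h: Q[(state, h)])

    def q_update(state, action, reward, next_state, heuristic_ids):
        # Standard tabular Q-learning backup with learning rate ALPHA.
        best_next = max(Q[(next_state, h)] for h in heuristic_ids)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])

Consistent with the 'constant rewards for improved solutions' setting, reward here would be a fixed value (for example 1) whenever the applied heuristic improves the current solution, and 0 otherwise.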
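
Luby's sequence, used above to determine chain lengths, is the standard restart schedule 1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ... A minimal, self-contained generator for its i-th term, in the same illustrative Python:

    def luby(i: int) -> int:
        # Return the i-th term (1-indexed) of Luby's sequence.
        # Find k such that 2^(k-1) <= i <= 2^k - 1.
        k = 1
        while (1 << k) - 1 < i:
            k += 1
        if (1 << k) - 1 == i:
            return 1 << (k - 1)              # i ends a block: the term is 2^(k-1)
        return luby(i - (1 << (k - 1)) + 1)  # otherwise the block repeats the prefix

For example, [luby(i) for i in range(1, 8)] yields [1, 1, 2, 1, 1, 2, 4]; according to the quoted setup, the paper uses this sequence to decide how long each chain of heuristic applications runs before a reset.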