Semi-Markov Reinforcement Learning for Stochastic Resource Collection
Authors: Sebastian Schmoll, Matthias Schubert
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on an environment based on the real-world parking data of the city of Melbourne. In small, hence simple, settings with short distances between resources and few simultaneous violations, our approach is comparable to previous work. When the size of the network grows (and hence the number of resources), our solution significantly outperforms preceding methods. Moreover, applying a trained agent to a non-overlapping new area outperforms existing approaches. |
| Researcher Affiliation | Academia | Sebastian Schmoll and Matthias Schubert LMU Munich {schmoll, schubert}@dbs.ifi.lmu.de |
| Pseudocode | No | The paper provides algorithmic descriptions and equations (e.g., update rules) but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use the real-world and freely available dataset of on-street parking spots [1], containing the arrival and departure times as well as the respective restrictions of parking events in the city of Melbourne in 2017. [1] https://data.melbourne.vic.gov.au/browse?tags=parking |
| Dataset Splits | Yes | To prevent overfitting, we split the parking event dataset into three sets: training, validation, and test. The segmentation is based on the following principle: if the remainder of the day's number of the year divided by 13 is zero, the day is added to the test set (28 days); if the remainder is one, the day is added to the validation set (27 days). All other days are in the training set. |
| Hardware Specification | Yes | We trained our approach on various GTX/RTX GPU computing machines. The presented results are the best with respect to the validation results after tuning the hyper-parameters (e.g. learning/exploration rate, batch size, hidden neurons). As the ACO algorithm plans at execution time, we assigned a maximum available computation time (1 and 0.1 seconds) for each decision (single-core Intel i7-3770 3.40 GHz). |
| Software Dependencies | No | The paper mentions software components like 'Deep-Q-Network (DQN)', 'Double DQN', and 'prioritized experience buffers' but does not specify their version numbers or other ancillary software details. |
| Experiment Setup | Yes | The presented results are the best with respect to the validation results after tuning the hyper-parameters (e.g. learning/exploration rate, batch size, hidden neurons). |
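The day-of-year modulo-13 rule quoted under Dataset Splits can be sketched as follows. This is a minimal illustration, not the authors' code; it assumes "the day's number of the year" means the 1-based day-of-year index and that all 365 days of 2017 are covered. Under that assumption the rule yields 28 test days (matching the paper) and 29 validation days, so the paper's reported 27 validation days suggest some days are absent from the actual event data.

```python
def split_days(days_in_year: int = 365) -> dict:
    """Assign each day of the year to train/validation/test via the
    paper's mod-13 rule (hypothetical reconstruction):
      day % 13 == 0 -> test, day % 13 == 1 -> validation, else train."""
    splits = {"train": [], "validation": [], "test": []}
    for day in range(1, days_in_year + 1):
        remainder = day % 13
        if remainder == 0:
            splits["test"].append(day)
        elif remainder == 1:
            splits["validation"].append(day)
        else:
            splits["train"].append(day)
    return splits

splits = split_days()
print({name: len(days) for name, days in splits.items()})
# For a full 365-day year: 28 test days, 29 validation days, 308 training days.
```

Splitting by a fixed arithmetic rule over the calendar (rather than randomly) keeps whole days together, so parking events from a single day never leak across splits.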