Semi-Markov Reinforcement Learning for Stochastic Resource Collection
Authors: Sebastian Schmoll, Matthias Schubert
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on an environment based on the real-world parking data of the city of Melbourne. In small, hence simple, settings with short distances between resources and few simultaneous violations, our approach is comparable to previous work. When the size of the network grows (and hence the number of resources), our solution significantly outperforms preceding methods. Moreover, applying a trained agent to a non-overlapping new area outperforms existing approaches. |
| Researcher Affiliation | Academia | Sebastian Schmoll and Matthias Schubert LMU Munich {schmoll, schubert}@dbs.ifi.lmu.de |
| Pseudocode | No | The paper provides algorithmic descriptions and equations (e.g., update rules) but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use the real-world and freely available dataset of on-street parking spots [1], containing the arrival and departure times as well as the respective restrictions of parking events in the city of Melbourne in 2017. [1] https://data.melbourne.vic.gov.au/browse?tags=parking |
| Dataset Splits | Yes | To prevent overfitting, we split the parking event dataset into three sets: training, validation, and test. The segmentation is based on the following principle: if the remainder of the day's number of the year divided by 13 is zero, the day is added to the test set (28 days); if the remainder is one, the day is added to the validation set (27 days). All other days are in the training set. |
| Hardware Specification | Yes | We trained our approach on various GTX/RTX GPU computing machines. The presented results are the best with respect to the validation results after tuning the hyper-parameters (e.g. learning/exploration rate, batch size, hidden neurons). As the ACO algorithm plans at execution time, we assigned a maximum available computation time (1 and 0.1 seconds) for each decision (single-core Intel i7-3770 3.40 GHz). |
| Software Dependencies | No | The paper mentions software components like 'Deep-Q-Network (DQN)', 'Double DQN', and 'prioritized experience buffers' but does not specify their version numbers or other ancillary software details. |
| Experiment Setup | Yes | The presented results are the best with respect to the validation results after tuning the hyper-parameters (e.g. learning/exploration rate, batch size, hidden neurons). |
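The day-of-year modulo-13 rule quoted under Dataset Splits can be sketched as follows. This is a minimal illustration, not the authors' code; it assumes "the day's number of the year" means the 1-based day-of-year index and that all 365 days of 2017 are covered. Under that assumption the rule yields 28 test days (matching the paper) and 29 validation days, so the paper's reported 27 validation days suggest some days are absent from the actual event data.

```python
def split_days(days_in_year: int = 365) -> dict:
    """Assign each day of the year to train/validation/test via the
    paper's mod-13 rule (hypothetical reconstruction):
      day % 13 == 0 -> test, day % 13 == 1 -> validation, else train."""
    splits = {"train": [], "validation": [], "test": []}
    for day in range(1, days_in_year + 1):
        remainder = day % 13
        if remainder == 0:
            splits["test"].append(day)
        elif remainder == 1:
            splits["validation"].append(day)
        else:
            splits["train"].append(day)
    return splits

splits = split_days()
print({name: len(days) for name, days in splits.items()})
# For a full 365-day year: 28 test days, 29 validation days, 308 training days.
```

Splitting by a fixed arithmetic rule over the calendar (rather than randomly) keeps whole days together, so parking events from a single day never leak across splits.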