Hierarchical Subtask Discovery with Non-Negative Matrix Factorization
Authors: Adam C. Earle, Andrew M. Saxe, Benjamin Rosman
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate that the proposed scheme recovers an intuitive decomposition, we consider the resulting low-rank approximation to the desirability basis in two domains, for a few hand-picked decomposition factors. All results presented in this section correspond to solutions to Eqn. (2) for β = 1, so that the cost function is taken to be the KL-divergence (although the method does not appear to be overly sensitive to β [1, 2]). Note that in the same way that the columns of Z represent the exponentiated cost-to-go for the single-state tasks in the basis, the columns of D represent the exponentiated cost-to-go for the discovered subtasks. In Fig. 2, we compute the data matrix D ∈ ℝ^{m×k} for k = {4, 9, 16} for both the nested rooms domain and the hairpin domain. The desirability functions for each of the subtasks are then plotted over the base domain. (A minimal sketch of this KL-NMF factorization appears after the table.) |
| Researcher Affiliation | Academia | Adam C. Earle, Department of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa (adam.earle@students.wits.ac.za); Andrew M. Saxe, Center for Brain Science, Harvard University, MA, USA (asaxe@fas.harvard.edu); Benjamin Rosman, Council for Scientific and Industrial Research, Pretoria, South Africa, and Department of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa (brosman@csir.co.za) |
| Pseudocode | No | The paper does not contain any formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is available, nor does it provide a link. |
| Open Datasets | No | The paper refers to domains such as the "nested rooms domain", "hairpin domain", and "standard TAXI domain", which are common in reinforcement learning literature. However, it does not provide specific access information (e.g., URLs, DOIs, formal citations for specific dataset versions, or repository names) for any publicly available or open dataset used for training. It describes the characteristics of these domains, but not how to access the specific data or environment implementations. |
| Dataset Splits | No | The paper describes experiments in reinforcement learning environments but does not mention standard training, validation, or test dataset splits (e.g., 70/15/15 splits or sample counts for each split) as typically found in supervised learning contexts. |
| Hardware Specification | No | The paper does not describe any specific hardware (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks, or solvers). |
| Experiment Setup | No | The paper mentions parameters such as β, the decomposition factor k, and a scaling parameter α_l. It states that for the demonstrations β = 1 and k = {4, 9, 16}, and for the hierarchical decomposition it specifies layer sizes k_l = {60, 15, 3}. However, beyond these values for the NMF process itself, it does not provide full experimental setup details (e.g., optimizer types, learning rates, or number of training iterations/epochs) needed to fully reproduce the experimental runs. (A layer-wise decomposition sketch appears after the table.) |
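
For context on the β = 1 setting quoted in the Research Type row: with the KL β-loss, a non-negative desirability basis Z can be factorized into D and W by any multiplicative-update NMF solver. The sketch below is illustrative only; it assumes Z is available as a non-negative NumPy array, uses scikit-learn's `NMF` as a stand-in solver (the paper does not name its implementation), and the shapes and random data are hypothetical placeholders.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical stand-in for the desirability basis Z (m states x n tasks):
# each column holds the exponentiated cost-to-go of one single-state task,
# so every entry is non-negative, as NMF requires.
rng = np.random.default_rng(0)
m, n = 100, 100
Z = rng.random((m, n))

k = 4  # number of subtasks to discover; the paper shows k in {4, 9, 16}

# beta = 1 corresponds to the (generalized) KL divergence; scikit-learn
# supports this beta-loss with the multiplicative-update ('mu') solver.
model = NMF(n_components=k, beta_loss='kullback-leibler', solver='mu',
            init='random', max_iter=500, random_state=0)
D = model.fit_transform(Z)  # D in R^{m x k}: columns are subtask desirabilities
W = model.components_       # W in R^{k x n}: weights mixing subtasks into tasks

print(D.shape, W.shape)  # (100, 4) (4, 100)
```

Each column of D can then be reshaped over the grid of the base domain and plotted, which is how Fig. 2's subtask desirability functions are presented.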
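
The k_l = {60, 15, 3} layer sizes noted in the Experiment Setup row suggest a simple stacking scheme: re-factorize each layer's subtask basis to obtain coarser subtasks at the next layer. The sketch below is an assumption about how such stacking could be wired up, not the authors' code; `hierarchical_nmf` and the input shapes are hypothetical, and the paper's scaling parameter α_l is omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import NMF

def hierarchical_nmf(Z, layer_sizes):
    """Layer-wise sketch: the subtask basis D_l found at layer l becomes
    the data matrix for layer l+1, yielding progressively coarser subtasks.
    `layer_sizes` follows the paper's example k_l = {60, 15, 3}."""
    bases = []
    data = Z
    for k_l in layer_sizes:
        model = NMF(n_components=k_l, beta_loss='kullback-leibler',
                    solver='mu', init='random', max_iter=500, random_state=0)
        D_l = model.fit_transform(data)  # m x k_l subtask desirabilities
        bases.append(D_l)
        data = D_l  # next layer decomposes the current subtask basis
    return bases

# Usage with a placeholder desirability basis (shapes are illustrative):
Z = np.random.default_rng(0).random((200, 120))
layers = hierarchical_nmf(Z, layer_sizes=[60, 15, 3])
for l, D_l in enumerate(layers):
    print(f"layer {l}: {D_l.shape}")  # (200, 60), (200, 15), (200, 3)
```

Because every layer's basis stays defined over the original m states, each level of the hierarchy can be visualized over the base domain in the same way as the flat decomposition.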