Tree-Based On-Line Reinforcement Learning
Authors: André Barreto
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A series of computational experiments is presented to illustrate the properties of TBSF and to show its usefulness in practice, including a medical problem involving the treatment of patients infected with HIV. |
| Researcher Affiliation | Academia | André da Motta Salles Barreto, Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil |
| Pseudocode | Yes | Algorithm 1 TBSF. 1: Input: fit (algorithm to construct the trees), µ (distribution over S × A); 2: Output: approximate value function Q; 3: for it ← 1, 2, ..., itmax do; 4: collect n transitions based on µ and store in S^a; 5: T^a ← fit(S^a) for all a ∈ A (construct or grow); 6: update M using (13) and (14); 7: apply t iterations of value iteration to Q; 8: modify µ based on Q(s, a) (optional); 9: S^a ← ∅ for all a ∈ A (discard transitions) |
| Open Source Code | No | The paper does not provide a direct link to open-source code for the described methodology or explicitly state its release. |
| Open Datasets | Yes | We revisit one of the most successful applications of FQIT: an important medical problem which we will refer to as the HIV domain (Adams et al. 2004; Ernst et al. 2006). |
| Dataset Splits | No | The paper describes data collection processes and evaluation sets but does not specify explicit train/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not specify any particular hardware details such as CPU/GPU models, memory, or specific computing environments used for the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | When combined with the extra-trees algorithm, FQIT has four parameters; in our experiments we used 200 iterations, dim(S) candidate cut points to build the trees, and varied \|T\| and ηmin (during the construction of the trees, a node is split only if it contains at least ηmin points; hence, this parameter can be seen as an indirect way of setting the number of partitions). |
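
The remark in the Experiment Setup row, that ηmin indirectly sets the number of partitions, can be illustrated with a minimal sketch. This is not the paper's code. The names `count_leaves`, `eta_min`, and `rng` are hypothetical, and the sketch makes one simplifying assumption: each node draws a single random cut along a random dimension, whereas extra-trees scores several candidate cuts and keeps the best one. Only the stopping rule (split a node only if it holds at least ηmin points) is taken from the text.

```python
import random


def count_leaves(points, eta_min, rng):
    """Recursively partition `points` (a list of equal-length tuples)
    and return the number of leaves, i.e. partitions, produced."""
    # Stopping rule from the text: split only if the node holds
    # at least eta_min points.
    if len(points) < eta_min:
        return 1
    # Simplifying assumption: one random dimension and one uniform cut,
    # instead of scoring several candidate cut points.
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return 1  # no spread along this dimension, so no cut is possible
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    if not left or not right:
        return 1  # degenerate cut; keep the node as a leaf
    return count_leaves(left, eta_min, rng) + count_leaves(right, eta_min, rng)


rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(200)]

fine = count_leaves(pts, 2, rng)      # small eta_min: many partitions
coarse = count_leaves(pts, 100, rng)  # large eta_min: few partitions
print(fine, coarse)
```

Under these assumptions, raising `eta_min` stops the recursion earlier and yields a coarser partition of the state space, which is the sense in which the parameter controls the number of partitions.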