Tree-Based On-Line Reinforcement Learning

Authors: Andre Barreto

AAAI 2014

Reproducibility Variable Result LLM Response
Research Type Experimental A series of computational experiments is presented to illustrate the properties of TBSF and to show its usefulness in practice, including a medical problem involving the treatment of patients infected with HIV.
Researcher Affiliation Academia André da Motta Salles Barreto, Laboratório Nacional de Computação Científica, Petrópolis, RJ, Brazil
Pseudocode Yes Algorithm 1 TBSF
    Input:  fit, algorithm to construct the trees
            µ, distribution over S × A
    Output: approximate value function Q
    for it ← 1, 2, ..., itmax do
        Collect n transitions based on µ and store in S^a
        T^a ← fit(S^a) for all a ∈ A        ▷ construct or grow
        Update M using (13) and (14)
        Apply t iterations of value iteration to Q
        Modify µ based on Q(s, a)           ▷ optional
        S^a ← ∅ for all a ∈ A               ▷ discard transitions
Open Source Code No The paper does not provide a direct link to open-source code for the described methodology or explicitly state its release.
Open Datasets Yes We revisit one of the most successful applications of FQIT: an important medical problem which we will refer to as the HIV domain (Adams et al. 2004; Ernst et al. 2006).
Dataset Splits No The paper describes data collection processes and evaluation sets but does not specify explicit train/validation/test dataset splits with percentages or sample counts.
Hardware Specification No The paper does not specify any particular hardware details such as CPU/GPU models, memory, or specific computing environments used for the experiments.
Software Dependencies No The paper does not specify software dependencies with version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup Yes When combined with the extra-trees algorithm, FQIT has four parameters; in our experiments we used 200 iterations, dim(S) candidate cut points to build the trees, and varied |T| and ηmin (during the construction of the trees, a node is split only if it contains at least ηmin points; hence, this parameter can be seen as an indirect way of setting the number of partitions).
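
The role of ηmin described in the Experiment Setup row can be illustrated with a minimal sketch. This is not the paper's implementation: it is a simplified randomized tree that draws one random cut per node, whereas extra-trees scores several candidate cuts. The names build_tree, leaves, and eta_min are hypothetical and introduced only for illustration. The sketch shows how the rule "split a node only if it contains at least ηmin points" indirectly controls the number of partitions.

```python
import random


def build_tree(points, eta_min, rng):
    """Recursively partition `points` (a list of equal-length float tuples).

    A node is split only if it holds at least `eta_min` points.
    Assumption: a single uniformly random cut along a random dimension,
    a simplification of the extra-trees split selection.
    """
    if len(points) < eta_min:
        return {"leaf": True, "points": points}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # no cut is possible along a constant coordinate
        return {"leaf": True, "points": points}
    cut = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]
    if not left or not right:  # degenerate cut, keep the node as a leaf
        return {"leaf": True, "points": points}
    return {
        "leaf": False,
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, eta_min, rng),
        "right": build_tree(right, eta_min, rng),
    }


def leaves(tree):
    """Return the point lists stored in the leaves (the partitions)."""
    if tree["leaf"]:
        return [tree["points"]]
    return leaves(tree["left"]) + leaves(tree["right"])


if __name__ == "__main__":
    data_rng = random.Random(0)
    data = [(data_rng.random(), data_rng.random()) for _ in range(200)]
    for eta_min in (2, 10, 50):
        tree = build_tree(data, eta_min, random.Random(1))
        print(eta_min, len(leaves(tree)))
```

Raising eta_min stops the recursion earlier, so the same data end up in fewer and larger partitions, which is the indirect effect the quoted passage attributes to ηmin.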