Identifying Latent State-Transition Processes for Individualized Reinforcement Learning

Authors: Yuewen Sun, Biwei Huang, Yu Yao, Donghuo Zeng, Xinshuai Dong, Songyao Jin, Boyang Sun, Roberto Legaspi, Kazushi Ikeda, Peter Spirtes, Kun Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various datasets show that the proposed method can effectively identify latent state-transition processes and facilitate the learning of individualized RL policies.
Researcher Affiliation | Collaboration | 1 Mohamed bin Zayed University of Artificial Intelligence, 2 Carnegie Mellon University, 3 University of California San Diego, 4 The University of Sydney, 5 KDDI Research
Pseudocode | Yes | The pseudocode for the proposed algorithm is presented in Algorithm 1 and Algorithm 2.
Open Source Code | No | The paper does not provide an explicit statement or link within its main text or appendices for open-source code specific to the methodology described.
Open Datasets | Yes | We further evaluate our framework on the real-world dataset, Persuasion For Good corpus [77], which is widely used for analyzing persuasion strategies [64, 7, 87].
Dataset Splits | Yes | The estimation framework is trained using AdamW optimizer for a maximum of 200 epochs and early stops if the validation ELBO loss does not decrease for ten epochs.
Hardware Specification | Yes | We used a machine with the following CPU specifications: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz with 16 logical processors. The machine has one GeForce RTX 3080 GPU with 32GB GPU memory.
Software Dependencies | No | The paper mentions using 'AdamW optimizer' but does not specify version numbers for this optimizer or any other software dependencies such as programming languages or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | A learning rate of 0.001 and a mini-batch size of 32 are used. We used three random seeds in each experiment and reported the mean performance with standard deviation averaged across random seeds.
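
The training settings quoted in the table above (at most 200 epochs, early stopping after ten epochs without a decrease in validation ELBO loss, learning rate 0.001, mini-batch size 32, three random seeds) can be sketched roughly as follows. This is a minimal illustration, not the authors' code, which is not released: the function names, the `validation_loss` callback, and the use of the sample standard deviation are all assumptions made for the example, and the actual model and optimizer are omitted.

```python
from statistics import mean, stdev
from typing import Callable, List, Tuple

# Values quoted in the table; the constant names are illustrative.
MAX_EPOCHS = 200
PATIENCE = 10
LEARNING_RATE = 1e-3  # would be passed to the optimizer (AdamW in the paper)
BATCH_SIZE = 32       # would be passed to the data loader
NUM_SEEDS = 3


def train_with_early_stopping(
    validation_loss: Callable[[int], float],
    max_epochs: int = MAX_EPOCHS,
    patience: int = PATIENCE,
) -> Tuple[int, float]:
    """Run up to `max_epochs` and stop once the validation loss has not
    decreased for `patience` consecutive epochs.

    `validation_loss(epoch)` is a hypothetical stand-in for one epoch of
    training followed by a validation pass. Returns (epochs_run, best_loss).
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    epochs_run = 0
    for epoch in range(max_epochs):
        loss = validation_loss(epoch)
        epochs_run = epoch + 1
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return epochs_run, best


def summarize(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-seed scores.

    Assumption: the source does not say whether the sample or population
    standard deviation was reported.
    """
    return mean(scores), stdev(scores)


# Toy loss curve: improves for 20 epochs, then plateaus at zero.
def toy_curve(epoch: int) -> float:
    return float(max(20 - epoch, 0))


epochs_run, best_loss = train_with_early_stopping(toy_curve)
```

With the toy curve, training halts ten epochs after the last improvement instead of running all 200 epochs. In a real run, `summarize` would be applied to one score per random seed.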