Identifying Latent State-Transition Processes for Individualized Reinforcement Learning
Authors: Yuewen Sun, Biwei Huang, Yu Yao, Donghuo Zeng, Xinshuai Dong, Songyao Jin, Boyang Sun, Roberto Legaspi, Kazushi Ikeda, Peter Spirtes, Kun Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various datasets show that the proposed method can effectively identify latent state-transition processes and facilitate the learning of individualized RL policies. |
| Researcher Affiliation | Collaboration | Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University; University of California San Diego; The University of Sydney; KDDI Research |
| Pseudocode | Yes | The pseudocode for the proposed algorithm is presented in Algorithm 1 and Algorithm 2. |
| Open Source Code | No | The paper does not provide an explicit statement or link within its main text or appendices for open-source code specific to the methodology described. |
| Open Datasets | Yes | We further evaluate our framework on the real-world dataset, Persuasion For Good corpus [77], which is widely used for analyzing persuasion strategies [64, 7, 87]. |
| Dataset Splits | Yes | The estimation framework is trained using the AdamW optimizer for a maximum of 200 epochs and early stops if the validation ELBO loss does not decrease for ten epochs. |
| Hardware Specification | Yes | We used a machine with the following CPU specifications: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz with 16 logical processors. The machine has one GeForce RTX 3080 GPU with 32GB GPU memory. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify version numbers for any software dependencies, such as the programming language or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | A learning rate of 0.001 and a mini-batch size of 32 are used. We used three random seeds in each experiment and reported the mean performance with standard deviation averaged across random seeds. |
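The Dataset Splits and Experiment Setup rows together describe the reported training schedule: AdamW with a learning rate of 0.001, mini-batches of 32, up to 200 epochs, and early stopping after ten epochs without improvement in validation ELBO loss. Since no code is released, the following is only a minimal PyTorch sketch of that schedule; `model`, `train_loader`, `val_loader`, and `elbo_loss` are hypothetical placeholders, not the authors' implementation.

```python
import copy
import torch

def train_with_early_stopping(model, train_loader, val_loader, elbo_loss,
                              max_epochs=200, patience=10, lr=1e-3):
    """Train with AdamW and stop early when validation ELBO stops improving.

    Hyperparameters mirror the paper's reported setup (lr=0.001,
    mini-batch size 32 via the loaders, <=200 epochs, patience of 10).
    `model`, the loaders, and `elbo_loss` are hypothetical stand-ins.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    best_val = float("inf")
    best_state = None
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = elbo_loss(model, batch)   # negative ELBO, minimized
            loss.backward()
            optimizer.step()

        # Evaluate the validation ELBO loss once per epoch.
        model.eval()
        with torch.no_grad():
            val = sum(elbo_loss(model, b).item() for b in val_loader) / len(val_loader)

        if val < best_val:                   # validation loss decreased
            best_val = val
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # early stop after 10 stale epochs

    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_val
```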
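The Experiment Setup row also states that results are averaged over three random seeds and reported as mean with standard deviation. A small sketch of that reporting convention follows; `run_experiment` is a hypothetical callable returning a scalar metric (e.g., episodic return) for a given seed.

```python
import numpy as np

def report_over_seeds(run_experiment, seeds=(0, 1, 2)):
    """Run one experiment per seed and report mean and std across seeds.

    Mirrors the paper's convention of three random seeds per experiment;
    `run_experiment(seed)` is a hypothetical placeholder.
    """
    scores = np.array([run_experiment(seed) for seed in seeds])
    return scores.mean(), scores.std()

# Usage: mean, std = report_over_seeds(my_run)
#        print(f"performance: {mean:.2f} +/- {std:.2f}")
```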