Identifying Latent State-Transition Processes for Individualized Reinforcement Learning
Authors: Yuewen Sun, Biwei Huang, Yu Yao, Donghuo Zeng, Xinshuai Dong, Songyao Jin, Boyang Sun, Roberto Legaspi, Kazushi Ikeda, Peter Spirtes, Kun Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various datasets show that the proposed method can effectively identify latent state-transition processes and facilitate the learning of individualized RL policies. |
| Researcher Affiliation | Collaboration | Mohamed bin Zayed University of Artificial Intelligence; Carnegie Mellon University; University of California San Diego; The University of Sydney; KDDI Research |
| Pseudocode | Yes | The pseudocode for the proposed algorithm is presented in Algorithm 1 and Algorithm 2. |
| Open Source Code | No | The paper does not provide an explicit statement or link within its main text or appendices for open-source code specific to the methodology described. |
| Open Datasets | Yes | We further evaluate our framework on the real-world dataset, Persuasion For Good corpus [77], which is widely used for analyzing persuasion strategies [64, 7, 87]. |
| Dataset Splits | Yes | The estimation framework is trained using the AdamW optimizer for a maximum of 200 epochs and early stops if the validation ELBO loss does not decrease for ten epochs. |
| Hardware Specification | Yes | We used a machine with the following CPU specifications: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz with 16 logical processors. The machine has one GeForce RTX 3080 GPU with 32GB GPU memory. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify version numbers for any software dependencies, such as the programming language or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | A learning rate of 0.001 and a mini-batch size of 32 are used. We used three random seeds in each experiment and reported the mean performance with standard deviation averaged across random seeds. |
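The Dataset Splits and Experiment Setup rows together describe the reported training schedule: AdamW with a learning rate of 0.001, mini-batches of 32, up to 200 epochs, and early stopping after ten epochs without improvement in validation ELBO loss. Since no code is released, the following is only a minimal PyTorch sketch of that schedule; `model`, `train_loader`, `val_loader`, and `elbo_loss` are hypothetical placeholders, not the authors' implementation.

```python
import copy
import torch

def train_with_early_stopping(model, train_loader, val_loader, elbo_loss,
                              max_epochs=200, patience=10, lr=1e-3):
    """Train with AdamW and stop early when validation ELBO stops improving.

    Hyperparameters mirror the paper's reported setup (lr=0.001,
    mini-batch size 32 via the loaders, <=200 epochs, patience of 10).
    `model`, the loaders, and `elbo_loss` are hypothetical stand-ins.
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    best_val = float("inf")
    best_state = None
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            loss = elbo_loss(model, batch)   # negative ELBO, minimized
            loss.backward()
            optimizer.step()

        # Evaluate the validation ELBO loss once per epoch.
        model.eval()
        with torch.no_grad():
            val = sum(elbo_loss(model, b).item() for b in val_loader) / len(val_loader)

        if val < best_val:                   # validation loss decreased
            best_val = val
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # early stop after 10 stale epochs

    if best_state is not None:
        model.load_state_dict(best_state)
    return model, best_val
```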
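The Experiment Setup row also states that results are averaged over three random seeds and reported as mean with standard deviation. A small sketch of that reporting convention follows; `run_experiment` is a hypothetical callable returning a scalar metric (e.g., episodic return) for a given seed.

```python
import numpy as np

def report_over_seeds(run_experiment, seeds=(0, 1, 2)):
    """Run one experiment per seed and report mean and std across seeds.

    Mirrors the paper's convention of three random seeds per experiment;
    `run_experiment(seed)` is a hypothetical placeholder.
    """
    scores = np.array([run_experiment(seed) for seed in seeds])
    return scores.mean(), scores.std()

# Usage: mean, std = report_over_seeds(my_run)
#        print(f"performance: {mean:.2f} +/- {std:.2f}")
```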