Offline Transition Modeling via Contrastive Energy Learning
Authors: Ruifeng Chen, Chengxing Jia, Zefang Huang, Tian-Shuo Liu, Xu-Hui Liu, Yang Yu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct a series of experiments to answer the following questions: (1). Does ETM better recover the discontinuous transition behaviors than standard FTMs? (2). Does ETM have a smaller transition error on out-of-distribution transitions? (3). Can ETM facilitate sequential decision-making tasks like off-policy evaluation and offline RL? |
| Researcher Affiliation | Collaboration | ¹National Key Laboratory for Novel Software Technology, Nanjing University, China & School of Artificial Intelligence, Nanjing University, China; ²Polixir Technologies. |
| Pseudocode | Yes | Algorithm 1 Energy-based Transition Model Learning |
| Open Source Code | Yes | Code: https://github.com/Ruifeng-Chen/Energy-Transition-Models.git |
| Open Datasets | Yes | We also conduct experiments on D4RL benchmarks (Fu et al., 2020), where the improvement of model accuracy boosts the performance of policy optimization. |
| Dataset Splits | No | No explicit statement providing specific percentages, sample counts, or clear predefined split references for training, validation, and test datasets was found for their experiments. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions software like 'Offline RLKit' and 'Soft Actor Critic (SAC)' and that the implementation is based on 'Pytorch', but no specific version numbers for any software dependencies are provided. |
| Experiment Setup | Yes | The detailed hyperparameter settings are listed in Appendix C: Table 2 lists the hyperparameters, Table 3 gives the base settings for all the Gym-Mujoco tasks, and two hyperparameters, the penalty coefficient β and the rollout length h, are tuned per task and listed in Table 4. |
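To make the method behind the "Energy-based Transition Model Learning" row concrete, below is a minimal PyTorch sketch of a transition model that scores triples (s, a, s') with a scalar energy and is trained with an InfoNCE-style contrastive loss, where negatives are formed by shuffling next-states within the batch. The network sizes, the negative sampler, and all names here are illustrative assumptions, not the authors' exact Algorithm 1; consult the released code for the real implementation.

```python
# Hypothetical sketch: energy-based transition model with a contrastive loss.
# Lower energy E(s, a, s') should mean a more plausible transition.
import torch
import torch.nn as nn

class EnergyTransitionModel(nn.Module):
    """Scalar energy E_theta(s, a, s') over transition triples."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a, s_next):
        # Concatenate (s, a, s') and return one energy per row: shape (batch,)
        return self.net(torch.cat([s, a, s_next], dim=-1)).squeeze(-1)

def contrastive_loss(model, s, a, s_next, num_neg: int = 8):
    """InfoNCE-style objective: the observed s' must out-score negatives
    obtained by permuting next-states within the batch (an assumed sampler)."""
    b = s.shape[0]
    pos = -model(s, a, s_next)                   # logit for the true transition
    negs = []
    for _ in range(num_neg):
        perm = torch.randperm(b)
        negs.append(-model(s, a, s_next[perm]))  # logits for mismatched transitions
    logits = torch.stack([pos] + negs, dim=-1)   # (batch, 1 + num_neg)
    labels = torch.zeros(b, dtype=torch.long)    # true transition sits at index 0
    return nn.functional.cross_entropy(logits, labels)

if __name__ == "__main__":
    torch.manual_seed(0)
    s, a, s_next = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 4)
    model = EnergyTransitionModel(state_dim=4, action_dim=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss = contrastive_loss(model, s, a, s_next)
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"contrastive loss: {loss.item():.4f}")
```

Because the model outputs an unnormalized energy rather than a Gaussian next-state prediction, it can in principle represent the discontinuous transition behaviors the paper's first experimental question targets; a standard forward transition model would have to smooth over such discontinuities.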