Offline Transition Modeling via Contrastive Energy Learning

Authors: Ruifeng Chen, Chengxing Jia, Zefang Huang, Tian-Shuo Liu, Xu-Hui Liu, Yang Yu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct a series of experiments to answer the following questions: (1) Does ETM better recover the discontinuous transition behaviors than standard FTMs? (2) Does ETM have a smaller transition error on out-of-distribution transitions? (3) Can ETM facilitate sequential decision-making tasks like off-policy evaluation and offline RL?
Researcher Affiliation | Collaboration | National Key Laboratory for Novel Software Technology, Nanjing University, China & School of Artificial Intelligence, Nanjing University, China; Polixir Technologies.
Pseudocode | Yes | Algorithm 1: Energy-based Transition Model Learning (an illustrative sketch follows this table).
Open Source Code | Yes | Code: https://github.com/Ruifeng-Chen/Energy-Transition-Models.git
Open Datasets | Yes | We also conduct experiments on D4RL benchmarks (Fu et al., 2020), where the improvement of model accuracy boosts the performance of policy optimization. (A data-loading sketch follows this table.)
Dataset Splits | No | No explicit statement giving specific percentages, sample counts, or references to predefined training/validation/test splits was found for their experiments.
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instance types) used to run the experiments are provided in the paper.
Software Dependencies | No | The paper mentions software such as 'Offline RLKit' and 'Soft Actor Critic (SAC)' and states that the implementation is based on PyTorch, but no version numbers for any software dependencies are provided.
Experiment Setup | Yes | The detailed hyperparameter settings are listed in Appendix C: the hyperparameters in Table 2 and the base settings for all Gym-MuJoCo tasks in Table 3. Two hyperparameters, the penalty coefficient β and the rollout length h, are tuned for each task and listed in Table 4.
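The Pseudocode row refers to the paper's Algorithm 1 (Energy-based Transition Model Learning), which is not reproduced here. The snippet below is only a minimal sketch of what contrastive (InfoNCE-style) training of an energy network E_θ(s, a, s') can look like: the architecture, the Gaussian negative-sampling proposal, and all names (`EnergyNet`, `contrastive_step`, `num_negatives`) are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch, NOT the paper's Algorithm 1: contrastive (InfoNCE-style)
# training of an energy-based transition model E_theta(s, a, s').
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnergyNet(nn.Module):
    """Scalar energy E_theta(s, a, s') for a (state, action, next-state) triple."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a, s_next):
        return self.net(torch.cat([s, a, s_next], dim=-1)).squeeze(-1)


def contrastive_step(energy_net, optimizer, s, a, s_next,
                     num_negatives: int = 16, noise_std: float = 0.5):
    """One gradient step: the observed next state should get lower energy
    than negatives drawn from a simple Gaussian proposal (an assumption here)."""
    batch = s.shape[0]
    negatives = s_next.unsqueeze(1) + noise_std * torch.randn(
        batch, num_negatives, s_next.shape[-1])
    candidates = torch.cat([s_next.unsqueeze(1), negatives], dim=1)   # (B, 1+K, S)
    s_rep = s.unsqueeze(1).expand(-1, 1 + num_negatives, -1)
    a_rep = a.unsqueeze(1).expand(-1, 1 + num_negatives, -1)
    logits = -energy_net(s_rep, a_rep, candidates)                    # low energy -> high logit
    labels = torch.zeros(batch, dtype=torch.long)                     # index 0 is the positive
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Unlike a forward transition model that outputs s' directly, an energy model of this kind typically needs a sampling or optimization step over candidate next states at prediction time (e.g., choosing the lowest-energy candidate).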
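The Open Datasets row cites the D4RL benchmarks; the (s, a, s') batches consumed by a sketch like `contrastive_step` above can be obtained from D4RL as shown below. The specific task name is an arbitrary example, not one singled out by the paper.

```python
# Illustrative only: load offline transition tuples from a D4RL benchmark task.
import gym
import d4rl  # noqa: F401  (importing registers the D4RL environments with gym)
import torch

env = gym.make("halfcheetah-medium-v2")       # example task name
data = d4rl.qlearning_dataset(env)            # dict with observations, actions, next_observations, ...

s = torch.as_tensor(data["observations"], dtype=torch.float32)
a = torch.as_tensor(data["actions"], dtype=torch.float32)
s_next = torch.as_tensor(data["next_observations"], dtype=torch.float32)
# These (s, a, s_next) batches are what an offline transition model is trained on.
```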