PoRank: A Practical Framework for Learning to Rank Policies

Authors: Pengjie Gu, Mengchen Zhao, Xu He, Yi Cai, Bo An

IJCAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results show that PoRank not only outperforms baselines when the ground-truth labels are provided, but also achieves competitive performance when the ground-truth labels are unavailable. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Nanyang Technological University, Singapore; (2) School of Software Engineering, South China University of Technology, China; (3) Huawei Noah's Ark Lab; (4) Skywork AI, Singapore |
| Pseudocode | No | The paper describes procedures using text and mathematical equations, but does not contain explicitly labeled pseudocode or algorithm blocks in the main text. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the methodology described in this paper; it only links to the implementation of baseline OPE algorithms. |
| Open Datasets | Yes | We evaluate PoRank and all baseline OPE methods on the D4RL dataset, which consists of various trajectory sets [Fu et al., 2020]. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and general software implementations for baselines (e.g., the Soft Actor-Critic (SAC) algorithm and the `policy_eval` implementation), but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Selection of the Batch Size of State-Action Pairs: In the training phase, the batch size of state-action pairs fed into the Transformer is an important hyper-parameter in our model, balancing computational cost against performance. We chose 256 as the batch size. This choice is supported by the experimental results reported in Table 3, which show the averaged rank correlations of our model as the batch size grows. |
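The experiment-setup quote above can be illustrated with a minimal sketch of sampling a fixed-size batch of state-action pairs before feeding them to a scoring model. This is not the paper's implementation: the dataset size and the state/action dimensions below are arbitrary assumptions for illustration, and only the batch size of 256 comes from the paper's text.

```python
import numpy as np

# Assumed dimensions for illustration only (not taken from the paper).
NUM_PAIRS, STATE_DIM, ACTION_DIM = 10_000, 17, 6
BATCH_SIZE = 256  # the batch size reported in the paper's experiment setup

rng = np.random.default_rng(0)
states = rng.normal(size=(NUM_PAIRS, STATE_DIM))    # placeholder states
actions = rng.normal(size=(NUM_PAIRS, ACTION_DIM))  # placeholder actions

# Sample 256 distinct state-action pairs and stack each pair into one vector,
# the typical shape of a per-pair token sequence fed to a Transformer encoder.
idx = rng.choice(NUM_PAIRS, size=BATCH_SIZE, replace=False)
batch = np.concatenate([states[idx], actions[idx]], axis=1)
print(batch.shape)  # (256, 23)
```

A larger batch lets the model see more pairs per update at the cost of memory and compute, which is the trade-off the paper's Table 3 explores.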