PoRank: A Practical Framework for Learning to Rank Policies

Authors: Pengjie Gu, Mengchen Zhao, Xu He, Yi Cai, Bo An

IJCAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results show that PoRank not only outperforms baselines when the ground-truth labels are provided, but also achieves competitive performance when the ground-truth labels are unavailable. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Nanyang Technological University, Singapore; (2) School of Software Engineering, South China University of Technology, China; (3) Huawei Noah's Ark Lab; (4) Skywork AI, Singapore |
| Pseudocode | No | The paper describes procedures using text and mathematical equations, but does not contain explicitly labeled pseudocode or algorithm blocks in the main text. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the methodology described in this paper; it only links to the implementation of baseline OPE algorithms. |
| Open Datasets | Yes | We evaluate PoRank and all baseline OPE methods on the D4RL dataset, which consists of various trajectory sets [Fu et al., 2020]. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and general software implementations for baselines (e.g., the Soft Actor-Critic (SAC) algorithm and the `policy_eval` implementation), but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Selection of the Batch Size of State-Action Pairs: In the training phase, the batch size of state-action pairs fed into the Transformer is an important hyper-parameter in our model, balancing computational cost against performance. We chose 256 as the batch size. This choice is supported by the experimental results reported in Table 3, which show the averaged rank correlations of our model as the batch size grows. |
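The experiment-setup quote above can be illustrated with a minimal sketch of sampling a fixed-size batch of state-action pairs before feeding them to a scoring model. This is not the paper's implementation: the dataset size and the state/action dimensions below are arbitrary assumptions for illustration, and only the batch size of 256 comes from the paper's text.

```python
import numpy as np

# Assumed dimensions for illustration only (not taken from the paper).
NUM_PAIRS, STATE_DIM, ACTION_DIM = 10_000, 17, 6
BATCH_SIZE = 256  # the batch size reported in the paper's experiment setup

rng = np.random.default_rng(0)
states = rng.normal(size=(NUM_PAIRS, STATE_DIM))    # placeholder states
actions = rng.normal(size=(NUM_PAIRS, ACTION_DIM))  # placeholder actions

# Sample 256 distinct state-action pairs and stack each pair into one vector,
# the typical shape of a per-pair token sequence fed to a Transformer encoder.
idx = rng.choice(NUM_PAIRS, size=BATCH_SIZE, replace=False)
batch = np.concatenate([states[idx], actions[idx]], axis=1)
print(batch.shape)  # (256, 23)
```

A larger batch lets the model see more pairs per update at the cost of memory and compute, which is the trade-off the paper's Table 3 explores.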