Reward-Based Negotiating Agent Strategies
Authors: Ryota Higa, Katsuhide Fujita, Toki Takahashi, Takumu Shimizu, Shinji Nakadai
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This study proposed a novel reward-based negotiating agent strategy using a deep policy network with an issue-based representation. The RL-trained negotiation strategies were compared against heuristics-based champion agents in multi-issue negotiation tournaments. (A minimal policy-network sketch follows the table.) |
| Researcher Affiliation | Collaboration | NEC Corporation, Japan; National Institute of Advanced Industrial Science and Technology (AIST), Japan; Tokyo University of Agriculture and Technology, Japan |
| Pseudocode | No | The paper includes mathematical formulations and architectural diagrams but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "Some packages to realize the proposed idea are available; the following evaluations were made by improving the package for the NegMAS platform (Mohammad et al. 2019) and RL baselines (Raffin et al. 2021)." This indicates the authors used and improved existing open-source packages but did not provide a direct link to, or an explicit statement about releasing, their own implementation of this paper's methodology. (A hedged environment-wrapper sketch follows the table.) |
| Open Datasets | Yes | The sizes and oppositions of all domains are given in Table 2. These negotiation domains are included in the negotiation platform Genius (Lin et al. 2014). |
| Dataset Splits | No | The paper mentions "training and test phases" and that agents were "trained with 10 initial values," but it does not specify any dataset splits for validation (e.g., percentages, counts, or explicit validation sets). |
| Hardware Specification | Yes | The code was implemented in Python 3.8 and run on 28 core CPUs with 128 GB of memory with Ubuntu Desktop 22.04 as the operating system. |
| Software Dependencies | Yes | The code was implemented in Python 3.8 with Ubuntu Desktop 22.04 as the operating system, building on the NegMAS platform (Mohammad et al. 2019) and RL baselines (Raffin et al. 2021). |
| Experiment Setup | Yes | The deadline (T) is set to 40 rounds... The training period was 500,000 steps. The policy network was an NN with two hidden layers of 64 units and tanh activation functions... The detailed hyperparameters are provided in Table 1. (A training-configuration sketch follows the table.) |
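
The "issue-based representation" in the Research Type summary can be pictured as a state vector that concatenates a per-issue one-hot encoding of the opponent's last offer with the normalized round number, fed into the 64-unit/tanh MLP described in the experiment setup. The sketch below is a minimal illustration under that reading; the class and parameter names (`IssuePolicyNetwork`, `issue_sizes`) are hypothetical, not the authors' code.

```python
# Minimal sketch (hypothetical names, not the authors' implementation):
# each issue of a bid is one-hot encoded, the encodings are concatenated
# with the relative round number, and an MLP with two 64-unit tanh
# hidden layers (matching the reported setup) emits per-issue logits.
import torch
import torch.nn as nn

class IssuePolicyNetwork(nn.Module):
    def __init__(self, issue_sizes, hidden=64):
        super().__init__()
        state_dim = sum(issue_sizes) + 1  # one-hot per issue + relative time
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # One categorical head per issue: the agent selects a value for each.
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in issue_sizes)

    def forward(self, offer_onehot, rel_time):
        h = self.body(torch.cat([offer_onehot, rel_time], dim=-1))
        return [head(h) for head in self.heads]  # per-issue action logits

# Usage: a 3-issue domain with 3, 4, and 5 values, at round 10 of 40.
net = IssuePolicyNetwork([3, 4, 5])
logits = net(torch.zeros(1, 12), torch.full((1, 1), 10 / 40))
```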
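
Because the paper builds on the NegMAS platform and RL baselines without releasing its own code, the glue between a negotiation session and an RL trainer has to be guessed at. The toy Gymnasium environment below illustrates the general shape, assuming a 40-round deadline and a sparse end-of-session reward; the random "opponent" is a stand-in so the example runs and is not the NegMAS tournament machinery the paper uses.

```python
# Toy environment sketch for RL negotiation training under the reported
# setup (deadline of 40 rounds, reward only when the session ends). The
# randomly accepting "opponent" is a placeholder, not the NegMAS API.
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class ToyNegotiationEnv(gym.Env):
    """Agent picks one of n_outcomes each round; the episode ends on a
    (randomly simulated) acceptance or at the 40-round deadline."""

    def __init__(self, n_outcomes=16, deadline=40):
        super().__init__()
        self.n_outcomes, self.deadline = n_outcomes, deadline
        # Observation: one-hot of opponent's last offer + relative time.
        self.observation_space = spaces.Box(0.0, 1.0, (n_outcomes + 1,), np.float32)
        self.action_space = spaces.Discrete(n_outcomes)

    def _obs(self):
        v = np.zeros(self.n_outcomes + 1, dtype=np.float32)
        v[self.last_offer] = 1.0
        v[-1] = self.round / self.deadline
        return v

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.round = 0
        self.last_offer = self.np_random.integers(self.n_outcomes)
        return self._obs(), {}

    def step(self, action):
        self.round += 1
        accepted = self.np_random.random() < 0.05          # toy opponent
        done = accepted or self.round >= self.deadline
        reward = float(action) / self.n_outcomes if accepted else 0.0
        self.last_offer = self.np_random.integers(self.n_outcomes)
        return self._obs(), reward, done, False, {}
```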
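
The quoted experiment setup (two 64-unit hidden layers, tanh activations, a 500,000-step training period) maps directly onto a Stable-Baselines3 run. PPO is an assumption here; the paper defers the remaining hyperparameters to its Table 1, so everything not quoted above is a placeholder.

```python
# Sketch of an RL training run with the reported settings. The choice of
# PPO and all unlisted hyperparameters are assumptions, not Table 1.
import torch
from stable_baselines3 import PPO

env = ToyNegotiationEnv()  # from the wrapper sketch above

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64], activation_fn=torch.nn.Tanh),
    verbose=1,
)
model.learn(total_timesteps=500_000)  # "The training period was 500,000 steps."
```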