Reward-Based Negotiating Agent Strategies
Authors: Ryota Higa, Katsuhide Fujita, Toki Takahashi, Takumu Shimizu, Shinji Nakadai
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This study proposed a novel reward-based negotiating agent strategy using a deep policy network with an issue-based representation. The RL-trained negotiation strategies were compared against heuristics-based champion agents in multi-issue negotiation tournaments. (A minimal policy-network sketch follows the table.) |
| Researcher Affiliation | Collaboration | NEC Corporation, Japan; National Institute of Advanced Industrial Science and Technology (AIST), Japan; Tokyo University of Agriculture and Technology, Japan |
| Pseudocode | No | The paper includes mathematical formulations and architectural diagrams but no explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, "Some packages to realize the proposed idea are available; the following evaluations were made by improving the package for the NegMAS platform (Mohammad et al. 2019) and RL baselines (Raffin et al. 2021)." This indicates the authors used and improved existing open-source packages but did not provide a direct link to, or an explicit statement about releasing, their own implementation of this paper's methodology. (A hedged environment-wrapper sketch follows the table.) |
| Open Datasets | Yes | The sizes and oppositions of all domains are given in Table 2. These negotiation domains are included in the negotiation platform Genius (Lin et al. 2014). |
| Dataset Splits | No | The paper mentions "training and test phases" and that agents were "trained with 10 initial values," but it does not specify any dataset splits for validation (e.g., percentages, counts, or explicit validation sets). |
| Hardware Specification | Yes | The code was implemented in Python 3.8 and run on 28 core CPUs with 128 GB of memory with Ubuntu Desktop 22.04 as the operating system. |
| Software Dependencies | Yes | The code was implemented in Python 3.8 with Ubuntu Desktop 22.04 as the operating system, building on the NegMAS platform (Mohammad et al. 2019) and RL baselines (Raffin et al. 2021). |
| Experiment Setup | Yes | The deadline (T) is set to 40 rounds... The training period was 500,000 steps. The policy network was an NN with two hidden layers of 64 units and tanh activation functions... The detailed hyperparameters are provided in Table 1. (A training-configuration sketch follows the table.) |
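
The "issue-based representation" in the Research Type summary can be pictured as a state vector that concatenates a per-issue one-hot encoding of the opponent's last offer with the normalized round number, fed into the 64-unit/tanh MLP described in the experiment setup. The sketch below is a minimal illustration under that reading; the class and parameter names (`IssuePolicyNetwork`, `issue_sizes`) are hypothetical, not the authors' code.

```python
# Minimal sketch (hypothetical names, not the authors' implementation):
# each issue of a bid is one-hot encoded, the encodings are concatenated
# with the relative round number, and an MLP with two 64-unit tanh
# hidden layers (matching the reported setup) emits per-issue logits.
import torch
import torch.nn as nn

class IssuePolicyNetwork(nn.Module):
    def __init__(self, issue_sizes, hidden=64):
        super().__init__()
        state_dim = sum(issue_sizes) + 1  # one-hot per issue + relative time
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # One categorical head per issue: the agent selects a value for each.
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in issue_sizes)

    def forward(self, offer_onehot, rel_time):
        h = self.body(torch.cat([offer_onehot, rel_time], dim=-1))
        return [head(h) for head in self.heads]  # per-issue action logits

# Usage: a 3-issue domain with 3, 4, and 5 values, at round 10 of 40.
net = IssuePolicyNetwork([3, 4, 5])
logits = net(torch.zeros(1, 12), torch.full((1, 1), 10 / 40))
```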
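
Because the paper builds on the NegMAS platform and RL baselines without releasing its own code, the glue between a negotiation session and an RL trainer has to be guessed at. The toy Gymnasium environment below illustrates the general shape, assuming a 40-round deadline and a sparse end-of-session reward; the random "opponent" is a stand-in so the example runs and is not the NegMAS tournament machinery the paper uses.

```python
# Toy environment sketch for RL negotiation training under the reported
# setup (deadline of 40 rounds, reward only when the session ends). The
# randomly accepting "opponent" is a placeholder, not the NegMAS API.
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class ToyNegotiationEnv(gym.Env):
    """Agent picks one of n_outcomes each round; the episode ends on a
    (randomly simulated) acceptance or at the 40-round deadline."""

    def __init__(self, n_outcomes=16, deadline=40):
        super().__init__()
        self.n_outcomes, self.deadline = n_outcomes, deadline
        # Observation: one-hot of opponent's last offer + relative time.
        self.observation_space = spaces.Box(0.0, 1.0, (n_outcomes + 1,), np.float32)
        self.action_space = spaces.Discrete(n_outcomes)

    def _obs(self):
        v = np.zeros(self.n_outcomes + 1, dtype=np.float32)
        v[self.last_offer] = 1.0
        v[-1] = self.round / self.deadline
        return v

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.round = 0
        self.last_offer = self.np_random.integers(self.n_outcomes)
        return self._obs(), {}

    def step(self, action):
        self.round += 1
        accepted = self.np_random.random() < 0.05          # toy opponent
        done = accepted or self.round >= self.deadline
        reward = float(action) / self.n_outcomes if accepted else 0.0
        self.last_offer = self.np_random.integers(self.n_outcomes)
        return self._obs(), reward, done, False, {}
```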
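
The quoted experiment setup (two 64-unit hidden layers, tanh activations, a 500,000-step training period) maps directly onto a Stable-Baselines3 run. PPO is an assumption here; the paper defers the remaining hyperparameters to its Table 1, so everything not quoted above is a placeholder.

```python
# Sketch of an RL training run with the reported settings. The choice of
# PPO and all unlisted hyperparameters are assumptions, not Table 1.
import torch
from stable_baselines3 import PPO

env = ToyNegotiationEnv()  # from the wrapper sketch above

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64], activation_fn=torch.nn.Tanh),
    verbose=1,
)
model.learn(total_timesteps=500_000)  # "The training period was 500,000 steps."
```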