CompetEvo: Towards Morphological Evolution from Competition

Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The results reveal that our method enables agents to evolve a more suitable design and strategy for fighting compared to fixed-morph agents, allowing them to obtain advantages in combat scenarios. Moreover, we demonstrate the amazing and impressive behaviors that emerge when confrontations are conducted under asymmetrical morphs. A series of cross-antagonism experiments is conducted to validate the significant role played by morphological evolution during confrontations in enhancing an agent's ability to deal with adversaries.
Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University; (2) School of Artificial Intelligence, Beijing University of Posts and Telecommunications; (3) School of Vehicle and Mobility, Tsinghua University; (4) Department of Automation, Tsinghua University
Pseudocode | Yes | Algorithm 1: Confrontation algorithm for co-evolving agents.
Open Source Code | Yes | Additionally, our environment files and related codes can be accessed from https://github.com/KJaebye/competevo.
Open Datasets | No | The paper describes training agents in simulated environments ('run-to-goal' and 'sumo') but does not mention the use or public availability of a pre-collected dataset for training.
Dataset Splits | No | The paper describes training in simulation environments and collecting samples during training, but it does not specify any explicit dataset splits (e.g., percentages or counts for training, validation, and testing).
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using PPO and the Adam optimizer but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The Adam optimizer is used with a learning rate of 0.0005. The PPO clipping parameter is 0.2, the discount factor is 0.995, and the generalized advantage estimation parameter is 0.95. 50,000 samples from 50 parallel rollouts are collected per batch, with mini-batches of 2,000 samples for PPO training. The termination generation Tt is set to 1000 for both tasks. We generally train about 100 epochs to guide agents in basic skills. During training, the maximum number of epochs is set to 1000 for run-to-goal and 2000 for sumo.
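For context, the reported hyperparameters correspond to a standard PPO configuration. The sketch below is a minimal illustration of how they would plug into a clipped-surrogate update with GAE; it assumes a PyTorch-style policy exposing a hypothetical `log_prob` method, and the function and variable names are illustrative placeholders rather than the authors' released code.

```python
# Minimal PPO sketch using the hyperparameters reported above (learning rate
# 5e-4, clip 0.2, discount 0.995, GAE lambda 0.95, 50,000-sample batches from
# 50 parallel rollouts, 2,000-sample mini-batches). `policy.log_prob` and the
# tensors passed in are hypothetical placeholders, not the authors' code.
import torch

LR, CLIP_EPS = 5e-4, 0.2
GAMMA, GAE_LAMBDA = 0.995, 0.95
BATCH_SIZE, MINI_BATCH = 50_000, 2_000

def compute_gae(rewards, values, dones):
    """Generalized advantage estimation over a single rollout (1-D tensors)."""
    advantages = torch.zeros_like(rewards)
    gae, next_value = 0.0, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + GAMMA * next_value * (1 - dones[t]) - values[t]
        gae = delta + GAMMA * GAE_LAMBDA * (1 - dones[t]) * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

def ppo_update(policy, optimizer, obs, actions, old_log_probs, advantages):
    """One pass of clipped-surrogate updates over shuffled mini-batches."""
    idx = torch.randperm(len(obs))
    for start in range(0, len(obs), MINI_BATCH):
        mb = idx[start:start + MINI_BATCH]
        new_log_probs = policy.log_prob(obs[mb], actions[mb])  # placeholder API
        ratio = torch.exp(new_log_probs - old_log_probs[mb])
        clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS)
        loss = -torch.min(ratio * advantages[mb], clipped * advantages[mb]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Usage sketch: optimizer = torch.optim.Adam(policy.parameters(), lr=LR)
```

This reflects only the optimization settings quoted from the paper; the co-evolutionary confrontation loop of Algorithm 1 (alternating updates of competing, morphologically evolving agents) is not reproduced here.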