CompetEvo: Towards Morphological Evolution from Competition
Authors: Kangyao Huang, Di Guo, Xinyu Zhang, Xiangyang Ji, Huaping Liu
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results reveal that our method enables agents to evolve a more suitable design and strategy for fighting compared to fixed-morph agents, allowing them to obtain advantages in combat scenarios. Moreover, we demonstrate the amazing and impressive behaviors that emerge when confrontations are conducted under asymmetrical morphs. A series of cross-antagonism experiments is conducted to validate the significant role played by morphological evolution during confrontations in enhancing an agent's ability to deal with adversaries. |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University; (2) School of Artificial Intelligence, Beijing University of Posts and Telecommunications; (3) School of Vehicle and Mobility, Tsinghua University; (4) Department of Automation, Tsinghua University |
| Pseudocode | Yes | Algorithm 1 Confrontation algorithm for co-evolving agents. |
| Open Source Code | Yes | Additionally, our environment files and related codes can be accessed from https://github.com/KJaebye/competevo. |
| Open Datasets | No | The paper describes training agents in simulated environments ('run-to-goal' and 'sumo') but does not mention the use or public availability of a pre-collected dataset for training. |
| Dataset Splits | No | The paper describes training in simulation environments and collecting samples during training, but it does not specify any explicit validation dataset splits (e.g., percentages or counts for training, validation, and testing). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using PPO and Adam optimizer but does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The Adam optimizer is used with a learning rate of 0.0005. The PPO clipping parameter is 0.2, the discount factor is 0.995, and the generalized advantage estimation parameter is 0.95. Each batch consists of 50,000 samples collected from 50 parallel rollouts, with mini-batches of 2,000 samples for PPO training. The termination generation T_t is set to 1000 for both tasks. About 100 epochs are generally used to guide agents in basic skills. During training, the maximum number of epochs is set to 1000 for run-to-goal and 2000 for sumo. (See the configuration sketch after this table.) |
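
The reported training setup can be collected into a single configuration sketch. The snippet below is a hypothetical Python dictionary assembled from the values quoted in the Experiment Setup row above; the key names (`ppo_config`, `clip_epsilon`, `gae_lambda`, etc.) are illustrative assumptions, not the authors' actual configuration interface.

```python
# Hypothetical configuration sketch of the reported PPO training setup.
# Key names are assumptions for illustration; values are taken from the paper.
ppo_config = {
    "optimizer": "Adam",
    "learning_rate": 5e-4,        # reported learning rate of 0.0005
    "clip_epsilon": 0.2,          # PPO clipping parameter
    "gamma": 0.995,               # discount factor
    "gae_lambda": 0.95,           # generalized advantage estimation parameter
    "batch_size": 50_000,         # samples collected per batch
    "num_rollout_workers": 50,    # parallel rollouts per batch
    "minibatch_size": 2_000,      # mini-batch size for PPO updates
    "termination_generation": 1_000,  # T_t for both tasks
    "warmup_epochs": 100,         # epochs used to guide agents in basic skills
    "max_epochs": {"run-to-goal": 1_000, "sumo": 2_000},
}
```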