Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
Authors: Patara Trirat, Wonyong Jeong, Sung Ju Hwang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems with good performance throughout the diverse domains. ... We demonstrate the superiority of the proposed AutoML-Agent framework through extensive experiments on seven downstream tasks using fourteen datasets. |
| Researcher Affiliation | Collaboration | Patara Trirat 1 Wonyong Jeong 1 Sung Ju Hwang 1 2 1Deep Auto.ai 2KAIST, Seoul, South Korea. Correspondence to: Sung Ju Hwang <EMAIL>. Deep Auto.ai appears to be an industry entity (indicated by email domain deepauto.ai), while KAIST is an academic institution (Korea Advanced Institute of Science and Technology). Since authors are affiliated with both types of institutions, it is classified as a collaboration. |
| Pseudocode | Yes | As depicted in Figure 2 and Algorithm 1. ... Algorithm 1 Overall Procedure of AutoML-Agent |
| Open Source Code | Yes | We have made the source code available at https://github.com/deepauto-ai/automl-agent. |
| Open Datasets | Yes | Extensive experiments on seven downstream tasks using fourteen datasets... These datasets are chosen from different sources. ... Butterfly Image (Butterfly). ... The dataset is accessible at https://www.kaggle.com/datasets/phucthaiv02/butterfly-image-classification. ... Shopee-IET (Shopee). ... The dataset is available at https://www.kaggle.com/competitions/demo-shopee-iet-competition/data. ... Textual Entailment (Entail). ... We use the dataset provided by Guo et al. (2024a). ... Higher Education Students Performance (Student). ... This dataset can be found at https://archive.ics.uci.edu/dataset/856/higher+education+students+performance+evaluation. ... Cora and Citeseer. ... We use the version provided by Fey & Lenssen (2019). |
| Dataset Splits | Yes | Dataset Splitting: Split the dataset into training, validation, and testing sets (e.g., 80% training and 20% validation). ... In the (3) execution stage, the Data (Ad) and Model (Am) Agents decompose these plans and execute them via plan decomposition (PD) and prompting-based plan execution (Figure 2(b) and Lines 13-16)... ... # TODO: Step 2. Create a train-valid-test split of the data by splitting the dataset into train_loader, valid_loader, and test_loader. # Here, the train_loader contains 70% of the dataset, the valid_loader contains 20% of the dataset, and the test_loader contains 10% of the dataset. |
| Hardware Specification | Yes | All experiments are conducted on an Ubuntu 22.04 LTS server equipped with eight NVIDIA A100 GPUs (CUDA 12.4) and Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz. |
| Software Dependencies | Yes | Except for the Ap that is implemented with Mixtral-8x7B (Mixtral-8x7B-Instruct-v0.1) (Jiang et al., 2024), we use GPT-4 (gpt-4o-2024-05-13) as the backbone model for all agents... All experiments are conducted on an Ubuntu 22.04 LTS server equipped with eight NVIDIA A100 GPUs (CUDA 12.4)... |
| Experiment Setup | Yes | For RAP (3.4), we set the number of plans P = 3 and the number of candidate models k = 3. ... For the constraint-free setting, a method can get a score of 0.5 (pass modeling) or 1.0 (pass deployment). For the constraint-aware setting, a method can get a score of 0.25 (pass modeling), 0.5 (pass deployment), 0.75 (partially pass the constraints), or 1.0 (pass all cases). ... We report the average scores from five independent runs for all evaluation metrics in Figure 4. ... optimizer = optim.Adam(model.parameters(), lr=0.00001)... num_epochs = 100 ... early_stop_patience = 10 |
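The 70/20/10 train/valid/test split quoted in the Dataset Splits row can be sketched as a plain index split. This is an illustrative reconstruction, not the paper's generated code: the function name, the seeded shuffle, and the index-based return values are assumptions; only the 70/20/10 ratios come from the quoted comment.

```python
import random

def train_valid_test_split(n_samples, ratios=(0.7, 0.2, 0.1), seed=0):
    """Partition dataset indices into disjoint train/valid/test subsets.

    The default ratios mirror the 70%/20%/10% split described in the
    paper's generated code comment; everything else here is a sketch.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for reproducibility
    n_train = int(n_samples * ratios[0])
    n_valid = int(n_samples * ratios[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test
```

In a PyTorch pipeline like the one quoted, each index list would then back a `Subset` wrapped in a `DataLoader` (`train_loader`, `valid_loader`, `test_loader`).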
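The Experiment Setup row quotes `early_stop_patience = 10` alongside `num_epochs = 100` and Adam with `lr=0.00001`. A minimal sketch of the standard patience-based early-stopping rule those settings imply (the helper function and its signature are assumptions; the source shows only the hyperparameter values):

```python
def should_stop(val_losses, patience=10):
    """Return True when the best validation loss has not improved
    during the last `patience` epochs.

    patience=10 matches the quoted early_stop_patience; the stopping
    rule itself is the conventional one, assumed for illustration.
    """
    if len(val_losses) <= patience:
        return False  # not enough history yet to trigger a stop
    best_before = min(val_losses[:-patience])
    # stop if none of the last `patience` epochs beat the earlier best
    return min(val_losses[-patience:]) >= best_before
```

Inside a training loop over the quoted `num_epochs = 100`, this check would run once per epoch after computing the validation loss, breaking out early when it returns True.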