Language-Interfaced Tabular Oversampling via Progressive Imputation and Self-Authentication
Authors: June Yong Yang, Geondo Park, Joowon Kim, Hyeongwon Jang, Eunho Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on a variety of datasets and imbalance ratios reveal that the proposed method successfully generates reliable minority samples to boost the performance of machine learning classifiers, even under heavy imbalance ratios. |
| Researcher Affiliation | Collaboration | June Yong Yang (KAIST), Geondo Park (KAIST), Joowon Kim (KAIST), Hyeongwon Jang (Seoul National University), Eunho Yang (KAIST, AITRICS) |
| Pseudocode | Yes | Algorithm 1 LITO-B |
| Open Source Code | No | The paper does not provide a public code repository or an explicit statement that the implementation is released. |
| Open Datasets | Yes | Datasets. We validate our method on six tabular benchmark datasets: Default, Shoppers, Sick, and Diabetes for binary classification, Obesity and Satimage for multi-class classification. [...] We partition the datasets into 80% for training and 20% for the test set following the previous works (Kim et al., 2022). For datasets with relatively small sizes (Diabetes, Sick), we split the dataset into 70% training set and 30% test set. |
| Dataset Splits | No | The paper specifies training and test splits (e.g., "80% for training and 20% for the test set") but does not explicitly mention a separate validation set split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It mentions using the OpenAI chatbot API for some experiments, but without hardware specifications. |
| Software Dependencies | Yes | We implement our method with the PyTorch deep learning framework and Huggingface Transformers package (Wolf et al., 2019). [...] For in-context learning oversampling experiments, we use the OpenAI chatbot API to access GPT-3.5-turbo-0613. (An illustrative sketch of this API call pattern appears below the table.) |
| Experiment Setup | Yes | A.2 HYPERPARAMETERS FOR EVALUATING MACHINE LEARNING EFFICIENCY provides details like "Max depth is 32, criterion is gini" for the Decision Tree and "number of ensemble estimators is 100, and the learning rate is 1.0" for AdaBoost. A.3 HYPERPARAMETERS USED FOR FINE-TUNING GENERATIVE LANGUAGE MODELS mentions "fine-tune the Distill-GPT-2 model for each data set for 200 or 300 epochs" and "We use the constant 5e-5 learning rate". (Illustrative configuration sketches based on these quoted settings follow the table.) |
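As a minimal illustration of the quoted evaluation setup, the sketch below wires together the 80%/20% train/test split and the two quoted classifier configurations with scikit-learn. This is not the authors' code: the toy stand-in dataset, the stratified split, and the random seed are assumptions, and only the split ratio, the Decision Tree settings (max depth 32, Gini criterion), and the AdaBoost settings (100 estimators, learning rate 1.0) come from the paper's quoted hyperparameters.

```python
# Sketch of the quoted evaluation setup; the dataset and seed are placeholders.
from sklearn.datasets import load_breast_cancer  # stand-in binary dataset, not one from the paper
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)

# "80% for training and 20% for the test set" (70%/30% for the smaller Diabetes and Sick datasets).
# Stratification and the fixed seed are assumptions, not stated in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Decision Tree: "Max depth is 32, criterion is gini".
dt = DecisionTreeClassifier(max_depth=32, criterion="gini")

# AdaBoost: "number of ensemble estimators is 100, and the learning rate is 1.0".
ada = AdaBoostClassifier(n_estimators=100, learning_rate=1.0)

for clf in (dt, ada):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, "test accuracy:", clf.score(X_test, y_test))
```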
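The fine-tuning row quotes only the model family (Distill-GPT-2), the epoch counts (200 or 300), and the constant 5e-5 learning rate. A minimal Hugging Face `Trainer` configuration reflecting those three settings is sketched below; the `distilgpt2` checkpoint name, the toy row serialization, the batch size, and the padding scheme are assumptions rather than details from the paper.

```python
# Sketch of the quoted fine-tuning configuration; everything beyond the epochs,
# learning rate, and model family is a placeholder assumption.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilgpt2"  # assumed checkpoint for "Distill-GPT-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy textual encodings of tabular rows; the paper's actual serialization format is not reproduced here.
rows = ["age is 45, income is 3200, label is 1", "age is 23, income is 1500, label is 0"]

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)
    out["labels"] = [ids.copy() for ids in out["input_ids"]]  # causal LM targets = inputs
    return out

train_dataset = Dataset.from_dict({"text": rows}).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="lito-finetune",        # placeholder output directory
    num_train_epochs=200,              # "200 or 300 epochs" per dataset
    learning_rate=5e-5,                # "constant 5e-5 learning rate"
    lr_scheduler_type="constant",      # keep the learning rate fixed throughout training
    per_device_train_batch_size=2,     # batch size is not quoted; placeholder value
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```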
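For the in-context-learning experiments, only the API and model name (OpenAI chatbot API, GPT-3.5-turbo-0613) are quoted. The sketch below shows the corresponding call pattern with the legacy pre-1.0 `openai` Python SDK that was current at the time; the prompt wording and the example records are purely illustrative placeholders, not the paper's actual prompting scheme.

```python
# Call-pattern sketch only; the prompt and records are placeholders, not the paper's prompts.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# A few serialized minority-class rows would be supplied as in-context examples.
prompt = (
    "Below are minority-class records from a tabular dataset:\n"
    "age is 45, income is 3200, label is 1\n"
    "age is 51, income is 2900, label is 1\n"
    "Generate one new plausible minority-class record in the same format."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",  # model named in the paper
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,  # sampling temperature is an assumption
)
print(response["choices"][0]["message"]["content"])
```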