MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space

Authors: Tsz-Him Cheung, Dit-Yan Yeung

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive experiments, we demonstrate the effectiveness of MODALS on multiple datasets for text, tabular, time-series and image modalities.
Researcher Affiliation | Academia | Tsz-Him Cheung & Dit-Yan Yeung, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, {thcheungae,dyyeung}@cse.ust.hk
Pseudocode | No | The paper does not contain structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/jamestszhim/modals.
Open Datasets | Yes | We test MODALS on the SST2 (Socher et al., 2013) and TREC6 (Li & Roth, 2002) datasets... We also perform an experiment with multiple tabular datasets from the UCI repository (Dua & Graff, 2017), including the Iris, Breast Cancer, Arcene (Guyon et al., 2005), Abalone, and HTRU2 (Lyon et al., 2016) datasets. For time-series data, we use the HAR (Anguita et al., 2013) and Malware (Catak, 2019) datasets.
Dataset Splits | Yes | In all the experiments, the augmentation policy is searched using 50% of the data as the validation set. For all tabular datasets, we split 20% of the dataset as the test set unless the test set is explicitly provided in the repository. (A split sketch follows after this table.)
Hardware Specification | No | The paper does not provide specific hardware details, such as exact GPU/CPU models, processor types, or memory amounts, used for running its experiments.
Software Dependencies | No | The paper mentions using the Ray Tune framework but does not specify its version or the versions of any other key software dependencies required for replication.
Experiment Setup | Yes | In all experiments, we set α = 1, β = 0.03 and search for the metric margin value from {0.5, 1, 2, 4, 8}. The discriminator is trained using the Adam optimizer with learning rate 0.01. The model is trained for 100 epochs using the Adam optimizer with learning rate 0.01 and batch size 100. (A configuration sketch follows after this table.)
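
To make the Dataset Splits row concrete, here is a minimal sketch of the described partitioning, assuming scikit-learn and a tabular dataset with no predefined test set. The helper name `make_splits`, the use of stratification, and the order of the two splits (test first, then validation from the remainder) are illustrative assumptions, not details confirmed by the paper.

```python
from sklearn.model_selection import train_test_split

def make_splits(X, y, seed=0):
    """Hypothetical helper: 20% test split, then 50% of the remainder for validation."""
    # Hold out 20% as the test set (only when the repository provides none).
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    # Use 50% of the remaining data as the validation set for the policy search.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed, stratify=y_rest)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Example usage with Iris, one of the UCI datasets listed above.
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
train, val, test = make_splits(X, y)
```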
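
The Experiment Setup row can likewise be collected into a training configuration. The sketch below assumes PyTorch; the hyperparameter values are quoted from the paper, while the model, data, and loss are placeholders. The auxiliary loss terms weighted by α and β and the discriminator update are not reconstructed here, since the paper's exact formulation is not reproduced in this report.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Values quoted from the paper; alpha and beta are kept as plain loss weights
# because their exact mapping to loss terms is not restated in this report.
config = dict(
    alpha=1.0,                      # loss-weighting coefficient (as reported)
    beta=0.03,                      # loss-weighting coefficient (as reported)
    margin_grid=[0.5, 1, 2, 4, 8],  # metric margin values searched over
    lr=0.01,                        # Adam learning rate (model and discriminator)
    epochs=100,
    batch_size=100,
)

# Placeholder classifier and synthetic data, illustrative only.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
data = TensorDataset(torch.randn(1000, 32), torch.randint(0, 2, (1000,)))
loader = DataLoader(data, batch_size=config["batch_size"], shuffle=True)

optimizer = optim.Adam(model.parameters(), lr=config["lr"])
criterion = nn.CrossEntropyLoss()

for epoch in range(config["epochs"]):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)  # classification term only in this sketch
        loss.backward()
        optimizer.step()
```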