X-model: Improving Data Efficiency in Deep Learning with A Minimax Model

Authors: Ximei Wang, Xinyang Chen, Jianmin Wang, Mingsheng Long

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the superiority of the X-model among various tasks, from a single-value prediction task of age estimation to a dense-value prediction task of keypoint localization, a 2D synthetic and a 3D realistic dataset, as well as a multi-category object recognition task.
Researcher Affiliation | Academia | Ximei Wang, Xinyang Chen, Jianmin Wang, Mingsheng Long. School of Software, BNRist, Tsinghua University, China. wxm17@mails.tsinghua.edu.cn, chenxinyang95@gmail.com, jimwang@tsinghua.edu.cn, mingsheng@tsinghua.edu.cn
Pseudocode | No | No explicitly labeled 'Pseudocode' or 'Algorithm' blocks were found.
Open Source Code | No | Code will be made available at https://github.com.
Open Datasets | Yes | dSprites (Higgins et al., 2017) is a standard 2D synthetic dataset...; MPI3D (Gondal et al., 2019) is a simulation-to-real dataset for 3D objects; IMDB-WIKI (Rasmus Rothe, 2016) is a face dataset with age and gender labels...; Hand-3D-Studio (H3D) (Zhao et al., 2020) is a real-world dataset...; We adopt the most difficult CIFAR-100 dataset (Krizhevsky, 2009)...
Dataset Splits | Yes | On IMDB-WIKI, following the data pre-processing method of a recent work (Yang et al., 2021), we also filter out unqualified images and manually construct balanced validation and test sets over the supported ages. After splitting, there are 191.5K images for training and 11.0K images for validation and testing.
Hardware Specification | Yes | We use PyTorch with Titan V to implement our methods.
Software Dependencies | No | We use PyTorch with Titan V to implement our methods. No specific version number for PyTorch or other software dependencies is provided.
Experiment Setup | Yes | The tradeoff hyperparameter η is set as 0.1 for all tasks unless specified. The learning rates of the heads are set to 10 times those of the backbone layers, following the common fine-tuning principle (Yosinski et al., 2014). We adopt mini-batch SGD with a momentum of 0.95.
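
The quoted setup corresponds to a standard PyTorch fine-tuning configuration with per-group learning rates. Below is a minimal sketch of that optimizer setup; the ResNet-18 backbone, the backbone/head split, and the base learning rate of 0.001 are illustrative assumptions, not values from the paper (only the 10x head multiplier and the 0.95 momentum are quoted above).

```python
import torch
import torchvision

# Illustrative model: a pretrained backbone with a freshly initialized head.
# The paper does not specify this architecture; resnet18 is an assumption.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
head = model.fc  # the newly trained head; everything else is the backbone

# Split parameters into backbone and head groups by name.
backbone_params = [p for name, p in model.named_parameters()
                   if not name.startswith("fc.")]

base_lr = 0.001  # assumed base learning rate; not reported in the excerpt

optimizer = torch.optim.SGD(
    [
        # Backbone layers train at the base learning rate.
        {"params": backbone_params, "lr": base_lr},
        # Heads train at 10x the backbone rate, per the quoted
        # fine-tuning principle (Yosinski et al., 2014).
        {"params": head.parameters(), "lr": 10 * base_lr},
    ],
    momentum=0.95,  # momentum value quoted in the setup above
)
```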