X-model: Improving Data Efficiency in Deep Learning with A Minimax Model
Authors: Ximei Wang, Xinyang Chen, Jianmin Wang, Mingsheng Long
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments verify the superiority of the χ-Model across various tasks, from a single-value prediction task of age estimation to a dense-value prediction task of keypoint localization, a 2D synthetic and a 3D realistic dataset, as well as a multi-category object recognition task. |
| Researcher Affiliation | Academia | Ximei Wang, Xinyang Chen, Jianmin Wang, Mingsheng Long (corresponding author), School of Software, BNRist, Tsinghua University, China. Emails: wxm17@mails.tsinghua.edu.cn, chenxinyang95@gmail.com, jimwang@tsinghua.edu.cn, mingsheng@tsinghua.edu.cn |
| Pseudocode | No | No explicitly labeled 'Pseudocode' or 'Algorithm' blocks were found. |
| Open Source Code | No | Code will be made available at https://github.com. |
| Open Datasets | Yes | dSprites (Higgins et al., 2017) is a standard 2D synthetic dataset...; MPI3D (Gondal et al., 2019) is a simulation-to-real dataset for 3D objects...; IMDB-WIKI (Rothe et al., 2016) is a face dataset with age and gender labels...; Hand-3D-Studio (H3D) (Zhao et al., 2020) is a real-world dataset...; We adopt the most difficult CIFAR-100 dataset (Krizhevsky, 2009)... |
| Dataset Splits | Yes | On IMDB-WIKI, following the data pre-processing method of a recent work (Yang et al., 2021), we also filter out unqualified images, and manually construct balanced validation and test sets over the supported ages. After splitting, there are 191.5K images for training and 11.0K images for validation and testing. (A balanced-split sketch follows this table.) |
| Hardware Specification | Yes | We use PyTorch with a Titan V GPU to implement our methods. |
| Software Dependencies | No | The paper states only "We use PyTorch with Titan V to implement our methods"; no specific version numbers for PyTorch or other software dependencies are provided. |
| Experiment Setup | Yes | The tradeoff hyperparameter η is set to 0.1 for all tasks unless specified. The learning rates of the heads are set to 10 times those of the backbone layers, following the common fine-tuning principle (Yosinski et al., 2014). We adopt mini-batch SGD with momentum of 0.95. (An optimizer-setup sketch follows this table.) |
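As a reading aid, here is a minimal pandas sketch of the balanced validation/test construction described for IMDB-WIKI in the Dataset Splits row. The `age` column name, the per-age holdout count, and the random seed are illustrative assumptions; the paper does not publish its exact filtering or sampling recipe.

```python
# Hedged sketch of a balanced validation/test split over ages, in the spirit
# of the IMDB-WIKI preparation quoted above. Column names, per_age, and seed
# are assumptions for illustration, not the paper's exact recipe.
import pandas as pd

def balanced_split(df: pd.DataFrame, per_age: int = 100, seed: int = 0):
    """Hold out `per_age` images per age for validation and for test each;
    the remainder becomes the (imbalanced) training set."""
    val_parts, test_parts = [], []
    for _, group in df.groupby("age"):
        held = group.sample(n=min(2 * per_age, len(group)), random_state=seed)
        val_parts.append(held.iloc[:per_age])
        test_parts.append(held.iloc[per_age:])
    val = pd.concat(val_parts)
    test = pd.concat(test_parts)
    train = df.drop(index=val.index.union(test.index))
    return train, val, test
```

Sampling a fixed number of held-out images per supported age is one straightforward way to keep the validation and test sets balanced while leaving the remaining, naturally imbalanced images for training.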
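The Experiment Setup row also translates directly into a PyTorch optimizer configuration. Below is a minimal sketch assuming a torchvision ResNet-18 backbone with a linear head; the backbone choice, `base_lr`, and the head dimensions are illustrative assumptions. Only the 10x head learning rate, the SGD momentum of 0.95, and the tradeoff hyperparameter η = 0.1 come from the quoted setup.

```python
# Minimal sketch of the reported optimizer setup: head LR = 10x backbone LR,
# mini-batch SGD with momentum 0.95. Backbone, head, and base_lr are assumed.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)
backbone.fc = nn.Identity()          # expose 512-d features from the backbone
head = nn.Linear(512, 100)           # e.g., a CIFAR-100 classification head

eta = 0.1        # tradeoff hyperparameter η from the paper; it would weight
                 # the model's auxiliary loss term (not shown in this sketch)
base_lr = 0.01   # assumed base learning rate; not reported in the quoted text

optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": base_lr},
        {"params": head.parameters(), "lr": 10 * base_lr},  # fine-tuning principle
    ],
    momentum=0.95,
)
```

Putting the head parameters in their own parameter group is the standard way to realize the "heads at 10 times the backbone learning rate" fine-tuning convention the paper cites.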