PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions
Authors: Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Xiaojie Shi, Shuyang Cheng, Dragomir Anguelov
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that the optimal choice within the PolyLoss is indeed dependent on the task and dataset. Simply by introducing one extra hyperparameter and adding one line of code, our Poly-1 formulation outperforms the cross-entropy loss and focal loss on 2D image classification, instance segmentation, object detection, and 3D object detection tasks, sometimes by a large margin. |
| Researcher Affiliation | Industry | Zhaoqi Leng1, Mingxing Tan1, Chenxi Liu1, Ekin Dogus Cubuk2, Xiaojie Shi2, Shuyang Cheng1, Dragomir Anguelov1 1Waymo LLC 2Google LLC |
| Pseudocode | Yes | Example code for L^CE_Poly-1 with softmax activation is shown below (a runnable sketch of this snippet appears after the table). def poly1_cross_entropy(logits, labels, epsilon): # epsilon >= -1. # pt, CE, and Poly1 have shape [batch]. pt = tf.reduce_sum(labels * tf.nn.softmax(logits), axis=-1) CE = tf.nn.softmax_cross_entropy_with_logits(labels, logits) Poly1 = CE + epsilon * (1 - pt) return Poly1 |
| Open Source Code | Yes | Our experiments are based on public datasets and open source code repositories, shown in footnote 3-6. [...] Code at https://github.com/tensorflow/tpu/tree/master/models/official/ [...] Code at https://github.com/google/automl/tree/master/efficientnetv2 [...] Code at https://github.com/tensorflow/lingvo/tree/master/lingvo/tasks/car |
| Open Datasets | Yes | On ImageNet (Deng et al., 2009), our PolyLoss improves both pretraining and finetuning for the recent EfficientNetV2 (Tan & Le, 2021); on COCO (Lin et al., 2014), PolyLoss improves both 2D detection and segmentation AR for Mask-RCNN (He et al., 2017); on Waymo Open Dataset (WOD) (Sun et al., 2020), PolyLoss improves 3D detection AP for the widely used PointPillars (Lang et al., 2019) and the very recent Range Sparse Net (RSN) (Sun et al., 2021). |
| Dataset Splits | Yes | We reserve 25,000 images from the training set as minival to search the optimal ϵ1. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware specifications such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions using TensorFlow, but does not specify exact version numbers for TensorFlow or other software libraries/dependencies. |
| Experiment Setup | Yes | We use ResNet-50 (He et al., 2016) and its training hyperparameters without modification. [...] For the following experiments, we adopt the default training hyperparameters in the public repositories without any tuning. [...] We set ϵ1 = 2 for both. [...] In training Mask R-CNN, we use the training schedule optimized for cross-entropy loss, and replace the cross-entropy loss with LPoly-1 = −log(Pt) + ϵ1(1 − Pt) for the classification loss Lcls, where ϵ1 ∈ {−1.0, −0.8, −0.6, −0.4, −0.2, 0, 0.5, 1.0}. |
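
For convenience, below is a self-contained, runnable version of the snippet quoted in the Pseudocode row. It assumes TensorFlow 2.x eager execution; the import, docstring, and toy usage at the end are additions for illustration and are not part of the paper's snippet.

```python
import tensorflow as tf


def poly1_cross_entropy(logits, labels, epsilon):
    """Poly-1 cross-entropy loss with softmax activation, per the quoted snippet.

    Args:
      logits: float tensor of shape [batch, num_classes].
      labels: one-hot float tensor of shape [batch, num_classes].
      epsilon: scalar >= -1 weighting the leading (1 - pt) polynomial term.

    Returns:
      Per-example loss tensor of shape [batch].
    """
    # pt: predicted probability of the true class, shape [batch].
    pt = tf.reduce_sum(labels * tf.nn.softmax(logits), axis=-1)
    # Standard softmax cross-entropy, shape [batch].
    ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    # Poly-1 adds a single correction term to cross-entropy.
    return ce + epsilon * (1.0 - pt)


# Toy usage with illustrative values: a batch of one 3-class example.
logits = tf.constant([[2.0, 0.5, -1.0]])
labels = tf.one_hot([0], depth=3)
print(poly1_cross_entropy(logits, labels, epsilon=2.0))
```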
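
The Experiment Setup row replaces Mask R-CNN's classification loss with LPoly-1 = −log(Pt) + ϵ1(1 − Pt) and sweeps ϵ1 over a small grid. The minimal sketch below (toy logits and labels, illustrative only) evaluates that formula over the quoted grid and shows that ϵ1 = 0 recovers the plain cross-entropy loss.

```python
import tensorflow as tf

# Epsilon grid quoted in the Experiment Setup row for the Mask R-CNN sweep.
EPSILON_GRID = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.5, 1.0]

# Toy batch of one 3-class example (illustrative values only).
logits = tf.constant([[2.0, 0.5, -1.0]])
labels = tf.one_hot([0], depth=3)

pt = tf.reduce_sum(labels * tf.nn.softmax(logits), axis=-1)
ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

for eps in EPSILON_GRID:
    # L_Poly-1 = -log(pt) + eps * (1 - pt); for one-hot labels, -log(pt) equals CE,
    # so eps = 0 reduces to the plain cross-entropy loss.
    poly1 = -tf.math.log(pt) + eps * (1.0 - pt)
    print(f"eps={eps:+.1f}  poly1={float(poly1[0]):.4f}  ce={float(ce[0]):.4f}")
```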