UDON: Universal Dynamic Online distillatioN for generic image representations
Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, Andre Araujo, Ondrej Chum
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With comprehensive experiments, we validate each component of UDON, and showcase significant improvements over the state of the art in the recent UnED benchmark. |
| Researcher Affiliation | Collaboration | (1) VRG, FEE, Czech Technical University in Prague; (2) Google DeepMind |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes methods in text and uses block diagrams (e.g., Figure 2). |
| Open Source Code | Yes | Code: https://github.com/nikosips/UDON. |
| Open Datasets | Yes | The proposed method is evaluated on the recent Universal Embeddings Dataset (UnED) [45], the largest dataset for multi-domain fine-grained retrieval. |
| Dataset Splits | Yes | We follow the train-validation-test splits and the evaluation protocol defined in [45]; a brief review follows. |
| Hardware Specification | Yes | Experiments are executed on Google Cloud TPU v4s [16]. |
| Software Dependencies | No | Our implementation is based on the Scenic framework [7], a library based on JAX [5]/Flax [13]. The paper names these frameworks but does not state explicit version numbers for them. |
| Experiment Setup | Yes | The newly introduced hyperparameters are tuned based on performance on the validation set of UnED. For the KL divergence loss (3), the temperature is set to T = 0.1 (a discussion of this choice can be found in the Appendix); the teacher embeddings have dimensionality Dt = 256; the four loss components contribute equally to the total loss Ltotal (no weights need to be tuned). The universal student embedding dimensionality is set to d = 64 for direct comparability with previous work. The batch size is B = 128. The hyperparameter S, the number of steps after which the dynamic sampler is updated, is set to 1000. |
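
The experiment-setup row above quotes the main distillation hyperparameters (T = 0.1, Dt = 256, d = 64, B = 128). The sketch below illustrates how a temperature-scaled KL-divergence distillation loss with those values could look in JAX, the framework the paper reports using. It is a minimal sketch under assumptions: it assumes loss (3) compares softmax-normalized batch similarity distributions of the student against a stop-gradient teacher, and the names `kl_distillation_loss` and `l2_normalize` are illustrative, not taken from the authors' code.

```python
import jax
import jax.numpy as jnp

# Hyperparameters quoted from the experiment-setup row above.
T = 0.1     # KL temperature
D_T = 256   # teacher embedding dimensionality
D_S = 64    # universal student embedding dimensionality
B = 128     # batch size


def l2_normalize(x, eps=1e-12):
    """Row-wise L2 normalization of a batch of embeddings."""
    return x / (jnp.linalg.norm(x, axis=-1, keepdims=True) + eps)


def kl_distillation_loss(student_emb, teacher_emb, temperature=T):
    """KL(teacher || student) over temperature-scaled batch similarity distributions.

    student_emb: (B, D_S) student embeddings; teacher_emb: (B, D_T) teacher
    embeddings. Each item's cosine similarities to the rest of the batch are
    softmax-normalized, and the student distribution is pulled toward the
    (stop-gradient) teacher distribution.
    """
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)

    # Pairwise cosine similarities within the batch, scaled by the temperature.
    sim_s = (s @ s.T) / temperature
    sim_t = jax.lax.stop_gradient(t @ t.T) / temperature

    # Mask self-similarities with a large negative value so they get ~zero probability.
    mask = jnp.eye(s.shape[0], dtype=bool)
    sim_s = jnp.where(mask, -1e9, sim_s)
    sim_t = jnp.where(mask, -1e9, sim_t)

    log_p_s = jax.nn.log_softmax(sim_s, axis=-1)
    log_p_t = jax.nn.log_softmax(sim_t, axis=-1)
    p_t = jnp.exp(log_p_t)

    # KL divergence per anchor, averaged over the batch.
    return jnp.mean(jnp.sum(p_t * (log_p_t - log_p_s), axis=-1))


# Example with random embeddings standing in for network outputs.
key_s, key_t = jax.random.split(jax.random.PRNGKey(0))
student = jax.random.normal(key_s, (B, D_S))
teacher = jax.random.normal(key_t, (B, D_T))
print(kl_distillation_loss(student, teacher))
```

Per the setup row, a distillation term of this kind would be summed with equal weight alongside the other loss components to form Ltotal, with no loss weights to tune.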