Learning topology-preserving data representations

Authors: Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, Serguei Barannikov

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By doing computational experiments, we show that the proposed RTD-AE outperforms state-of-the-art methods of dimensionality reduction and the vanilla autoencoder in terms of preserving the global structure and topology of a data manifold; we measure this by the linear correlation, the triplet distance ranking accuracy, the Wasserstein distance between persistence barcodes, and RTD. (A sketch of two of these metrics follows the table.)
Researcher Affiliation | Collaboration | (1) Skolkovo Institute of Science and Technology; (2) CNRS, Université Paris Cité; (3) Huawei Noah's Ark Lab; (4) Artificial Intelligence Research Institute (AIRI)
Pseudocode | No | The paper describes an "Algorithm" in Section 4.2, but it is presented as a textual description rather than a structured pseudocode block or a clearly labeled algorithm figure.
Open Source Code | Yes | We release the RTD-AE source code. [Footnote 1 refers to github.com/danchern97/RTD_AE]
Open Datasets | Yes | The complete description of all the used datasets can be found in Appendix L. [Appendix L lists and cites: MNIST (LeCun et al., 1998), F-MNIST (Xiao et al., 2017), COIL-20 (Nene et al., 1996), scRNA mice (Yuan et al., 2017), scRNA melanoma (Tirosh et al., 2016)]
Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | Yes | For experiments we used NVIDIA TITAN RTX.
Software Dependencies | No | The paper mentions using a "modified version of Ripser++ software (Zhang et al., 2020)", but it does not specify a version number for Ripser++ or any other software dependencies.
Experiment Setup | Yes | In the experiments with projecting to 3D space, the model was trained for 100 epochs using the Adam optimizer. The autoencoder was initially trained for 10 epochs with only the reconstruction loss and learning rate 1e-4, then training continued with RTD. Epochs 11-30 used learning rate 1e-2, epochs 31-50 used 1e-3, and all epochs after 50 used 1e-4. Batch size was 80. For 2D and high-dimensional projections, fully-connected autoencoders were used with hyperparameters specified in Table 7. [Table 7 provides specific values for batch size, LR, hidden dim, # layers, epochs, RTD epoch for different datasets.] (A sketch of this training schedule follows below.)
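
For reference, here is a minimal sketch of two of the quality metrics quoted in the Research Type row: the linear correlation of pairwise distances and the triplet distance ranking accuracy. This is not the authors' evaluation code; the function names and the random-triplet sampling scheme are illustrative assumptions.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

def distance_correlation(X, Z):
    """Pearson correlation between pairwise distances in the original data X
    and in the low-dimensional representation Z (both are (n, d) arrays)."""
    return pearsonr(pdist(X), pdist(Z))[0]

def triplet_accuracy(X, Z, n_triplets=10000, seed=0):
    """Fraction of random triplets (i, j, k) whose distance ordering
    d(x_i, x_j) < d(x_i, x_k) is preserved in the representation Z.
    Degenerate triplets (repeated indices) are rare and not filtered here."""
    rng = np.random.default_rng(seed)
    n = len(X)
    i, j, k = rng.integers(0, n, size=(3, n_triplets))
    order_x = np.linalg.norm(X[i] - X[j], axis=1) < np.linalg.norm(X[i] - X[k], axis=1)
    order_z = np.linalg.norm(Z[i] - Z[j], axis=1) < np.linalg.norm(Z[i] - Z[k], axis=1)
    return np.mean(order_x == order_z)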
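
The Experiment Setup row describes a two-stage schedule: a reconstruction-only warm-up followed by training with the RTD term, with stepped learning rates. Below is a minimal PyTorch sketch of that schedule under stated assumptions; `autoencoder` and `rtd_loss` are placeholders standing in for the released RTD-AE code, not the authors' implementation.

import torch

def lr_for_epoch(epoch):
    # 3D-projection schedule quoted above: 1e-4 for the reconstruction-only
    # warm-up (epochs 1-10), 1e-2 for 11-30, 1e-3 for 31-50, 1e-4 afterwards.
    if epoch <= 10:
        return 1e-4
    if epoch <= 30:
        return 1e-2
    if epoch <= 50:
        return 1e-3
    return 1e-4

def train(autoencoder, rtd_loss, loader, epochs=100, device="cuda"):
    mse = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-4)
    for epoch in range(1, epochs + 1):
        for group in optimizer.param_groups:
            group["lr"] = lr_for_epoch(epoch)
        for x in loader:                      # loader yields batches of size 80
            x = x.to(device)
            z = autoencoder.encode(x)
            x_hat = autoencoder.decode(z)
            loss = mse(x_hat, x)
            if epoch > 10:                    # add the RTD term after the warm-up
                loss = loss + rtd_loss(x, z)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()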