Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning topology-preserving data representations
Authors: Ilya Trofimov, Daniil Cherniavskii, Eduard Tulchinskii, Nikita Balabin, Evgeny Burnaev, Serguei Barannikov
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By doing computational experiments, we show that the proposed RTD-AE outperforms state-of-the-art methods of dimensionality reduction and the vanilla autoencoder in terms of preserving the global structure and topology of a data manifold; we measure it by the linear correlation, the triplet distance ranking accuracy, Wasserstein distance between persistence barcodes, and RTD. |
| Researcher Affiliation | Collaboration | 1Skolkovo Institute of Science and Technology; 2CNRS, Université Paris Cité; 3Huawei Noah's Ark lab; 4Artificial Intelligence Research Institute (AIRI) |
| Pseudocode | No | The paper describes an "Algorithm" in section 4.2 but it is presented as a textual description rather than a structured pseudocode block or a clearly labeled algorithm figure. |
| Open Source Code | Yes | We release the RTD-AE source code. 1 [footnote refers to github.com/danchern97/RTD_AE] |
| Open Datasets | Yes | The complete description of all the used datasets can be found in Appendix L. [Appendix L lists and cites: MNIST (LeCun et al., 1998), F-MNIST (Xiao et al., 2017), COIL-20 (Nene et al., 1996), scRNA mice (Yuan et al., 2017), scRNA melanoma (Tirosh et al., 2016)] |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | Yes | For experiments we used NVIDIA TITAN RTX. |
| Software Dependencies | No | The paper mentions using a "modified version of Ripser++ software (Zhang et al., 2020)", but it does not specify a version number for Ripser++ or any other software dependencies. |
| Experiment Setup | Yes | In the experiments with projecting to 3D-space we trained model for 100 epochs using Adam optimizer. We initially trained autoencoder for 10 epochs with only the reconstruction loss and learning rate 1e-4, then continued with RTD. Epochs 11-30 were trained with learning rate 1e-2, epochs 31-50 with learning rate 1e-3, and for all epochs after that learning rate 1e-4 was used. Batch size was 80. For 2D and high-dimensional projections, we used fully-connected autoencoders with hyperparameters specified in Table 7. [Table 7 provides specific values for Batch size, LR, Hidden dim, # layers, Epochs, RTD epoch for different datasets.] |
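
The piecewise learning-rate schedule quoted in the Experiment Setup row can be written out as a small lookup, which makes the epoch boundaries easier to check. This is only a sketch of the schedule as described in that row, not the authors' training code: the function names `learning_rate` and `uses_rtd_loss` are hypothetical, epochs are assumed to be 1-indexed, and epochs 51-100 are assumed to use 1e-4.

```python
TOTAL_EPOCHS = 100  # 3D-projection experiments, per the Experiment Setup row
BATCH_SIZE = 80     # per the Experiment Setup row


def learning_rate(epoch: int) -> float:
    """Learning rate for a 1-indexed epoch (hypothetical helper)."""
    if not 1 <= epoch <= TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in 1..{TOTAL_EPOCHS}, got {epoch}")
    if epoch <= 10:   # warm-up: reconstruction loss only
        return 1e-4
    if epoch <= 30:   # epochs 11-30
        return 1e-2
    if epoch <= 50:   # epochs 31-50
        return 1e-3
    return 1e-4       # assumed: all remaining epochs (51-100)


def uses_rtd_loss(epoch: int) -> bool:
    """True once the RTD term is added, i.e. after the 10 warm-up epochs."""
    return epoch > 10


# One (epoch, learning rate, RTD enabled) triple per training epoch.
schedule = [(e, learning_rate(e), uses_rtd_loss(e))
            for e in range(1, TOTAL_EPOCHS + 1)]
```

In a real training loop the returned value would be written into the optimizer at the start of each epoch; the schedule for the 2D and high-dimensional experiments differs and is given in Table 7 of the paper.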