Coarse-To-Fine Tensor Trains for Compact Visual Representations
Authors: Sebastian Bugge Loeschcke, Dan Wang, Christian Munklinde Leth-Espensen, Serge Belongie, Michael Kastoryano, Sagie Benaim
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our representation along three axes: (1) compression, (2) denoising capability, and (3) image completion capability. To assess these axes, we consider the tasks of image fitting, 3D fitting, and novel view synthesis, where our method shows an improved performance compared to state-of-the-art tensor-based methods. |
| Researcher Affiliation | Academia | 1University of Copenhagen 2IT University of Copenhagen 3Aarhus University 4Hebrew University of Jerusalem. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/sebulo/PuTT |
| Open Datasets | Yes | Datasets: For 2D, we utilize two high-resolution images: Girl With a Pearl Earring (Vermeer, 1665) photograph, and Tokyo gigapixel (Dobson, 2018), which are center-cropped to a 16k resolution. We also include three 4k images: Marseille (Studio, 2023), Pluto (NASA/Johns Hopkins University, 2023), and Westerlund (NASA and ESA, 2023) for noise and missing data experiments. For 3D, we utilize the Flower data (of Zurich, 2023), and the Johns Hopkins Turbulence dataset (Li et al., 2008), which consists of a set of 3D voxel grids at 1024³ resolution, providing a diverse range of high-resolution structures by downsampling to different resolutions. For novel view synthesis, we employ the Blender (Mildenhall et al., 2020) and NSVF (Liu et al., 2020a) datasets, comprising eight synthetic 3D scenes at 800×800 resolution, alongside the Tanks & Temples (Knapitsch et al., 2017) dataset (1920×1080). |
| Dataset Splits | No | The paper discusses training on subsets of data for certain experiments and testing on 'near' and 'far' views, but does not specify standard training, validation, and test dataset splits with percentages or counts for hyperparameter tuning or model selection. |
| Hardware Specification | Yes | Both models, PuTT and TensoRF, demonstrated similar running times, ranging between 4 and 5 hours for both 7MB and 12MB on an NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The PuTT implementation starts by initializing the QTT in PyTorch (Paszke et al., 2019), as outlined in Sec. D.3. Also: In our work, TT-SVD is implemented via the TNTorch framework (Usvyatsov et al., 2022). While PyTorch and TNTorch are mentioned, explicit version numbers (e.g., PyTorch 1.9) are not provided. |
| Experiment Setup | Yes | PuTT's upsampling steps and iteration counts are tailored to the desired resolution for each task. For instance, to attain a 2D resolution of 1024², the model undergoes three upsampling steps from an initial 128² resolution at the 64th, 128th, and 256th iterations, culminating in 1024 iterations in total. A resolution of 2048² involves four upsampling steps and 2048 iterations, starting from 128². Also: Striking a balance between efficiency and accuracy, we opted for a batch size of 512² with a base learning rate of 5×10⁻³. For the high-resolution case of 1024³, we had to adjust the batch size to 128² for the Tucker and CP methods due to computational limits. Instead, we doubled the number of iterations. And: In our training setup, we implement an exponential learning rate decay strategy with a decay factor of α = 0.1. |
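The coarse-to-fine schedule quoted in the Experiment Setup row can be sketched in plain Python. This is not the authors' code; the function name, argument names, and the exact decay formula `base_lr * alpha ** (it / total_iters)` are assumptions for illustration, chosen so that the rate decays smoothly from `base_lr` to `base_lr * alpha` over training, consistent with the stated α = 0.1.

```python
def coarse_to_fine_schedule(base_lr=5e-3, alpha=0.1, total_iters=1024,
                            upsample_iters=(64, 128, 256), init_res=128):
    """Yield (iteration, grid resolution, learning rate) for each step.

    Starts at init_res x init_res and doubles the side length at each
    iteration listed in upsample_iters, so 128 -> 256 -> 512 -> 1024
    after the three upsampling steps described for the 1024^2 target.
    """
    res = init_res
    for it in range(1, total_iters + 1):
        if it in upsample_iters:
            res *= 2  # one coarse-to-fine upsampling step
        # Exponential decay: lr == base_lr at it=0, base_lr * alpha at the end.
        lr = base_lr * alpha ** (it / total_iters)
        yield it, res, lr
```

For the 2048² case described in the quote, one would pass `total_iters=2048` and four upsampling iterations instead of three; the schedule above reaches 1024² by iteration 256 and finishes with a learning rate of 5×10⁻⁴.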