SPDER: Semiperiodic Damping-Enabled Object Representation
Authors: Kathan Shah, Chawin Sitawarin
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results indicate that SPDERs speed up training by 10× and converge to losses 10,000–100,000× lower than that of the state-of-the-art for image representation. SPDER is also state-of-the-art in audio representation. The superior representation capability allows SPDER to also excel on multiple downstream tasks such as image super-resolution and video frame interpolation. |
| Researcher Affiliation | Academia | Kathan Shah UC Berkeley kathan@berkeley.edu Chawin Sitawarin UC Berkeley chawins@berkeley.edu |
| Pseudocode | No | The paper describes the architecture and mathematical functions used, but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | Yes | See code at https://github.com/katop1234/SPDER. |
| Open Datasets | Yes | For these experiments, we sampled images from DIV2K, a high-quality super-resolution dataset. Each image was resized to a resolution of 256×256 and fed into an untrained network, which would then train on it for a fixed number of steps. ... We utilized three datasets: DIV2K, Flickr2K, and Flickr-Faces-HQ. ... Each training sample was cropped to the first 7 seconds of clips from ESC-50, a labeled collection of 2000 environmental audio recordings, and then trained on for 1000 steps. ... The first video we trained on was from skvideo.datasets.bigbuckbunny(), a standard video used in computer vision research. ... The second video was from skvideo.datasets.bikes(). |
| Dataset Splits | No | The paper details training and testing procedures on several datasets and implicitly evaluates performance through metrics such as loss and PSNR over training steps, but it does not explicitly describe a separate validation split used for hyperparameter tuning or early stopping in any experiment. |
| Hardware Specification | Yes | We used two GeForce GTX 1080 Tis with 12 GB of memory each. We note that all SPDER experiments can be run on personal laptops and require no tailored hardware. |
| Software Dependencies | No | The paper mentions using PyTorch's built-in torchvision.transforms.Resize module and discusses computational aspects related to GPU, but it does not specify exact version numbers for PyTorch or any other core software libraries used for reproducibility. |
| Experiment Setup | Yes | By default, we use a 5-layer network with 256 neurons in each layer and a learning rate of 1×10⁻⁴. Input and output values are scaled to be in [-1, 1]. As we wish to completely overfit to the training data, we use full batch gradient descent (i.e., we do not want the noise from SGD) on each object. We discovered that clamping the input of sin(x)√\|x\| to a tiny value (less than 1×10⁻³⁰) avoided division-by-zero errors on the GPU while maintaining overall functionality. For any resizing operations, we use PyTorch's built-in torchvision.transforms.Resize module, which uses bilinear interpolation under the hood. |
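The clamping trick quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal PyTorch sketch based on the description above, not the authors' released code; the function name `spder_activation` and the exact clamp value are illustrative. The clamp matters because the backward pass of √\|x\| involves 1/√\|x\|, which is infinite at x = 0, and sin(0) · ∞ produces NaN gradients on the GPU.

```python
import torch

def spder_activation(x: torch.Tensor, eps: float = 1e-30) -> torch.Tensor:
    """Semiperiodic activation sin(x) * sqrt(|x|), as described in the paper.

    |x| is clamped to a tiny floor (eps) before the square root so that the
    backward pass, which involves 1/sqrt(|x|), never divides by zero at x = 0.
    """
    return torch.sin(x) * torch.sqrt(torch.clamp(torch.abs(x), min=eps))
```

Without the clamp, `torch.sqrt(torch.abs(x))` at x = 0 yields a 0 · inf term in the gradient and the loss goes NaN; with it, gradients stay finite while the forward values are unchanged to within floating-point precision.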