Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras
Authors: Bin Fan, Jiaoyang Yin, Yuchao Dai, Chao Xu, Tiejun Huang, Boxin Shi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity. |
| Researcher Affiliation | Academia | Bin Fan (1), Jiaoyang Yin (2,3), Yuchao Dai (4), Chao Xu (1), Tiejun Huang (2,3), Boxin Shi (2,3). (1) Nat'l Key Lab of General AI, School of Intelligence Science and Technology, Peking University; (2) State Key Lab of Multimedia Info. Processing, School of Computer Science, Peking University; (3) Nat'l Eng. Research Ctr. of Visual Technology, School of Computer Science, Peking University; (4) School of Electronics and Information, Northwestern Polytechnical University |
| Pseudocode | No | The paper describes the network architecture and components in detail using prose and diagrams (e.g., Figure 3), but it does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | The code is available at https://github.com/GitCVfb/STIR. |
| Open Datasets | Yes | We adopt the recently released SREDS dataset [57], which is synthesized based on the REDS dataset [38], for network training. |
| Dataset Splits | No | The paper mentions training and testing scenes from the SREDS dataset ('240 training scenes and 30 testing scenes') but does not explicitly specify a separate validation split or how validation was performed. |
| Hardware Specification | Yes | All models are trained and tested on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer [28]' but does not provide version numbers for software components such as Python, PyTorch, or CUDA, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes | Our model is trained using the Adam optimizer [28] for 150 epochs with a batch size of 8. The initial learning rate is 0.0001 and decays by a factor of 0.7 every 50 epochs. The temporal length of the input spike stream is 60, i.e., N = 20. The number of pyramid levels is set to 5, i.e., L = 5. In our HSER module, we construct a 5-channel TFP-based explicit representation... as well as an 11-channel ResNet-based implicit representation... Thus, the number of feature channels is 16, 24, 32, 64, and 96, respectively. Besides, 3 groups of motion fields are estimated at the bottom-level pyramid, i.e., G = 3. Spikes and ground truth images are randomly flipped vertically as well as rotated 90°, 180°, or 270° during training. (Hedged sketches of this training configuration and of the TFP representation follow the table.) |
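To make the reported schedule concrete, below is a minimal PyTorch sketch of the training configuration quoted above: Adam, 150 epochs, batch size 8, initial learning rate 1e-4 decayed by 0.7 every 50 epochs, with random vertical flips and 90/180/270-degree rotations. The tiny stand-in model, the synthetic data, and the L1 loss are illustrative assumptions; they are not the paper's STIR network, dataset pipeline, or training loss.

```python
import random
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the STIR network: any module mapping a 60-frame spike
# stream to a grayscale image (the real architecture is in Figure 3).
model = nn.Conv2d(60, 1, kernel_size=3, padding=1)

# Synthetic stand-ins for SREDS samples: binary spike streams of
# temporal length T = 60 and ground-truth frames (shapes illustrative).
spikes = (torch.rand(16, 60, 64, 64) > 0.95).float()
gt = torch.rand(16, 1, 64, 64)
loader = DataLoader(TensorDataset(spikes, gt), batch_size=8, shuffle=True)

def augment(s, y):
    # Random vertical flip and a rotation by a multiple of 90 degrees,
    # applied identically to the spike stream and the ground truth.
    if random.random() < 0.5:
        s, y = torch.flip(s, dims=[2]), torch.flip(y, dims=[2])
    k = random.randint(0, 3)
    return torch.rot90(s, k, dims=[2, 3]), torch.rot90(y, k, dims=[2, 3])

optimizer = optim.Adam(model.parameters(), lr=1e-4)   # initial LR 0.0001
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.7)

for epoch in range(150):                              # 150 epochs total
    for s, y in loader:
        s, y = augment(s, y)
        loss = nn.functional.l1_loss(model(s), y)     # loss choice is an assumption
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                                  # decay 0.7 every 50 epochs
```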
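The "5-channel TFP-based explicit representation" can plausibly be read as the standard texture-from-playback estimate used in spiking-camera work (the average firing rate over a temporal window) computed at several window sizes and stacked channel-wise. The sketch below illustrates that reading only; the specific window sizes and the centering at the key moment are assumptions, not values reported in the paper.

```python
import torch

def tfp_multi_window(spikes: torch.Tensor, windows=(1, 5, 9, 13, 17)) -> torch.Tensor:
    """Multi-window TFP: average firing rate around the key moment.

    spikes: (T, H, W) binary spike stream. The five window sizes are
    illustrative assumptions chosen to yield a 5-channel output.
    """
    t0 = spikes.shape[0] // 2              # key moment at the stream centre (assumption)
    channels = []
    for w in windows:
        lo = max(t0 - w // 2, 0)
        hi = min(t0 + w // 2 + 1, spikes.shape[0])
        channels.append(spikes[lo:hi].mean(dim=0))
    return torch.stack(channels)           # (5, H, W) explicit representation

rep = tfp_multi_window((torch.rand(60, 64, 64) > 0.95).float())
print(rep.shape)  # torch.Size([5, 64, 64])
```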