Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras

Authors: Bin Fan, Jiaoyang Yin, Yuchao Dai, Chao Xu, Tiejun Huang, Boxin Shi

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity."
Researcher Affiliation | Academia | "Bin Fan¹, Jiaoyang Yin²,³, Yuchao Dai⁴, Chao Xu¹, Tiejun Huang²,³, Boxin Shi²,³. ¹Nat'l Key Lab of General AI, School of Intelligence Science and Technology, Peking University; ²State Key Lab of Multimedia Info. Processing, School of Computer Science, Peking University; ³Nat'l Eng. Research Ctr. of Visual Technology, School of Computer Science, Peking University; ⁴School of Electronics and Information, Northwestern Polytechnical University"
Pseudocode | No | The paper describes the network architecture and components in detail using prose and diagrams (e.g., Figure 3), but it does not include formal pseudocode blocks or algorithms.
Open Source Code | Yes | "The code is available at https://github.com/GitCVfb/STIR."
Open Datasets | Yes | "We adopt the recently released SREDS dataset [57], which is synthesized based on the REDS dataset [38], for network training."
Dataset Splits | No | The paper mentions training and testing scenes from the SREDS dataset ("240 training scenes and 30 testing scenes") but does not explicitly specify a separate validation split or how validation was performed; one possible split is sketched after the table.
Hardware Specification | Yes | "All models are trained and tested on a single NVIDIA RTX 3090 GPU."
Software Dependencies | No | The paper mentions the "Adam optimizer [28]" but does not report version numbers for software components such as Python, PyTorch, or CUDA, which would be needed to fully reproduce the software environment; a way to record them is sketched after the table.
Experiment Setup | Yes | "Our model is trained using the Adam optimizer [28] for 150 epochs with a batch size of 8. The initial learning rate is 0.0001 and decays by a factor of 0.7 every 50 epochs. The temporal length of the input spike stream is 60, i.e., N = 20. The number of pyramid levels is set to 5, i.e., L = 5. In our HSER module, we construct a 5-channel TFP-based explicit representation... as well as an 11-channel ResNet-based implicit representation... Thus, the number of feature channels is 16, 24, 32, 64, and 96, respectively. Besides, 3 groups of motion fields are estimated at the bottom-level pyramid, i.e., G = 3. Spikes and ground truth images are randomly flipped vertically as well as rotated 90°, 180°, or 270° during training."
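
Since the paper defines only training and testing scenes, anyone re-running the code must improvise a validation protocol. The snippet below is a hypothetical sketch: the 240/30 scene counts come from the quote in the Dataset Splits row, while the 10% hold-out ratio and the fixed seed are assumptions for illustration, not the authors' procedure.

```python
import random

# Scene counts quoted from the SREDS setup: 240 training, 30 testing.
train_scenes = list(range(240))
test_scenes = list(range(240, 270))

# Hypothetical validation carve-out; the paper specifies none, so the
# 10% ratio and the seed below are assumptions, not the authors' choice.
rng = random.Random(42)
rng.shuffle(train_scenes)
num_val = len(train_scenes) // 10
val_scenes = sorted(train_scenes[:num_val])
train_scenes = sorted(train_scenes[num_val:])

print(len(train_scenes), len(val_scenes), len(test_scenes))  # 216 24 30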
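
Because the paper omits software versions, anyone reproducing the results has to pin them down from the released repository. A minimal sketch for recording the environment at run time, using only standard-library and PyTorch attributes, is:

```python
import platform

import torch

# Log the software environment alongside experiment outputs so the
# exact dependency versions are preserved for later reproduction.
env = {
    "python": platform.python_version(),
    "pytorch": torch.__version__,
    "cuda": torch.version.cuda,  # None on CPU-only builds
    "cudnn": torch.backends.cudnn.version(),
    "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu",
}
print(env)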
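
The hyperparameters quoted in the Experiment Setup row map directly onto a standard PyTorch training loop. The sketch below wires up the stated optimizer, step decay, and geometric augmentation; `STIRModel`, `SREDSTrainSet`, and `reconstruction_loss` are hypothetical placeholders for the released implementation's actual classes, and the use of `torch.optim.lr_scheduler.StepLR` is an assumption consistent with "decays by a factor of 0.7 every 50 epochs".

```python
import random

import torch
from torch.utils.data import DataLoader

def augment(spikes, gt, rng):
    # Random vertical flip plus a 0/90/180/270-degree rotation, applied
    # identically to the spike stream and the ground-truth image.
    if rng.random() < 0.5:
        spikes = torch.flip(spikes, dims=[-2])
        gt = torch.flip(gt, dims=[-2])
    k = rng.randint(0, 3)  # number of 90-degree rotations
    return torch.rot90(spikes, k, dims=[-2, -1]), torch.rot90(gt, k, dims=[-2, -1])

# Hypothetical placeholders for the repository's real model and dataset.
model = STIRModel().cuda()
loader = DataLoader(SREDSTrainSet(), batch_size=8, shuffle=True)

# Hyperparameters quoted above: Adam, lr 1e-4, x0.7 decay every 50 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.7)

rng = random.Random(0)
for epoch in range(150):
    for spikes, gt in loader:  # spikes: (B, 60, H, W) binary spike stream
        spikes, gt = augment(spikes.cuda(), gt.cuda(), rng)
        loss = reconstruction_loss(model(spikes), gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Stepping the scheduler once per epoch yields learning rates of 1e-4, 7e-5, and 4.9e-5 across the three 50-epoch stages, matching the quoted schedule.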