Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues

Authors: Jiaxuan Chen, Yu Qi, Gang Pan

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments were carried out with a benchmark dataset in comparison with existing approaches. |
| Researcher Affiliation | Academia | (1) State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; (2) College of Computer Science and Technology, Zhejiang University, Hangzhou, China; (3) MOE Frontier Science Center for Brain Science and Brain-Machine Integration, Zhejiang University, Hangzhou, China. |
| Pseudocode | No | The paper describes the proposed framework and its components using text and equations (e.g., Eqs. 1–13) but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor a link to a code repository. |
| Open Datasets | Yes | 'We experimented with a popular publicly available fMRI dataset, which is called Generic Object Decoding (GOD) dataset (Horikawa & Kamitani, 2017).' |
| Dataset Splits | No | The paper states: 'For each subject, the training set consists of 1200 fMRI-image pairs, and the testing set is made up of 50 fMRI recordings with corresponding images.' It does not explicitly define a separate validation split (see the split sketch after this table). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions the 'Adam solver (Kingma & Ba, 2014)' but does not specify version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | The parameter settings of VQ-fMRI for all experiments are summarized as follows. Encoders of VQ-VAE: 2 convolutional layers (stride 2, kernel 4×4, padding 1), followed by two residual blocks. Decoders of VQ-VAE: two residual blocks, followed by 3 transposed convolutions (stride 2, kernel 4×4, padding 1). Codebooks: Z_L ∈ R^{8×32} (image y ∈ R^{64×64×3}) and Z ∈ R^{8×128} (image y ∈ R^{128×128×3}). The image classifier, inpainting, and SR modules are implemented as U-Nets with 2 downsampling and 2 upsampling layers (stride 2, kernel 4×4, padding 1). The Adam solver (Kingma & Ba, 2014) is employed to optimize the parameters with a learning rate of 2e-4. A hedged PyTorch sketch of this configuration follows the table. |
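
To make the quoted setup concrete, here is a minimal PyTorch sketch of the VQ-VAE encoder/decoder configuration. Only the layer counts, strides, kernel sizes, paddings, and the Adam learning rate come from the paper; the hidden width, the residual-block internals, and the 1×1 projections to and from the codebook dimension are assumptions, and the vector-quantization lookup between encoder and decoder is omitted.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """ReLU -> 3x3 conv -> ReLU -> 1x1 conv with a skip connection.
    The internal layout is an assumption (a common VQ-VAE choice);
    the paper only says 'two residual blocks'."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        return x + self.body(x)


class VQEncoder(nn.Module):
    """2 convolutional layers (stride 2, kernel 4x4, padding 1), followed
    by two residual blocks, per the quoted setup. The hidden width (128)
    and the final 1x1 projection to the codebook dimension (32 for the
    64x64 codebook) are assumptions."""
    def __init__(self, in_channels=3, hidden=128, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=4, stride=2, padding=1),
            ResidualBlock(hidden),
            ResidualBlock(hidden),
            nn.Conv2d(hidden, code_dim, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)


class VQDecoder(nn.Module):
    """Two residual blocks, followed by 3 transposed convolutions
    (stride 2, kernel 4x4, padding 1), per the quoted setup."""
    def __init__(self, out_channels=3, hidden=128, code_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(code_dim, hidden, kernel_size=1),
            ResidualBlock(hidden),
            ResidualBlock(hidden),
            nn.ConvTranspose2d(hidden, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(hidden, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(hidden, out_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, z):
        return self.net(z)


encoder, decoder = VQEncoder(), VQDecoder()
# Optimizer and learning rate are taken from the quoted setup.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=2e-4
)
```

Note that, taken literally, the quoted setup downsamples twice in the encoder but upsamples three times in the decoder, so the decoder emits 2× the encoder's input resolution; this is at least consistent with the two image sizes (64×64 and 128×128) for which the codebooks are defined.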
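
On the Dataset Splits row: since the paper reports only a 1200/50 per-subject train/test split, a reproducer who wants a validation set has to define one themselves. The sketch below uses hypothetical tensor shapes (the voxel count is a placeholder, not taken from the paper) and a seeded torch.utils.data.random_split to hold out 10% of the training pairs, leaving the 50-recording test set untouched.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-ins for the per-subject GOD data: 1200 training
# fMRI-image pairs (4096 voxels is a placeholder dimension) and
# 64x64 RGB stimulus images.
train_fmri = torch.randn(1200, 4096)
train_imgs = torch.randn(1200, 3, 64, 64)
train_set = TensorDataset(train_fmri, train_imgs)

# The paper defines no validation split; one option is to hold out 10%
# of the 1200 training pairs, with a fixed seed for reproducibility.
val_size = len(train_set) // 10
train_subset, val_subset = random_split(
    train_set,
    [len(train_set) - val_size, val_size],
    generator=torch.Generator().manual_seed(0),
)
```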