Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information
Authors: Kaifan Zhang, Lihuo He, Xin Jiang, Wen Lu, Di Wang, Xinbo Gao
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate that CognitionCapturer outperforms state-of-the-art methods both qualitatively and quantitatively. |
| Researcher Affiliation | Academia | Kaifan Zhang1, Lihuo He1*, Xin Jiang1, Wen Lu1, Di Wang2, Xinbo Gao1,3 1School of Electronic Engineering, Xidian University, Xi'an, China 2School of Computer Science and Technology, Xidian University, Xi'an, China 3Chongqing University of Posts and Telecommunications, Chongqing, China EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology in narrative text and using a block diagram (Figure 2), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/XiaoZhangYES/CognitionCapturer |
| Open Datasets | Yes | We utilized the Things-EEG dataset for our experiments. The Things-EEG dataset (Gifford et al. 2022) contains EEG data collected from 10 subjects under an RSVP paradigm. |
| Dataset Splits | Yes | The training set comprises 1654 concepts, each associated with 10 images presented four times, resulting in a total of 66,160 EEG recordings. The test set includes 200 unique concepts, each represented by a single image repeated 80 times, totaling 16,000 EEG recordings. |
| Hardware Specification | Yes | The proposed method is conducted on a single GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions several models and tools such as 'OpenCLIP ViT-H/14', 'BLIP2', 'Depth Anything', 'SDXL-Turbo', and 'IP-Adapters' but does not specify their version numbers or the version numbers of underlying programming languages or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | For the training of the modality expert encoder phase, we used the AdamW optimizer with a learning rate of 0.0003, a batch size of 1024, and trained for 20 epochs... During the training of the diffusion prior, we used a batch size of 512, trained for 100 epochs, and set the number of inference steps to 50. The guidance scale was set to 7.5... We set the inference steps for SDXL-Turbo to 5. When configuring the IP-Adapter, for the image modality, we used the full IP-Adapter with the scale set to 1. For the text and depth modalities, we set the scale of their respective IP-Adapters' Layout block and Style block to 0... |
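The dataset-split counts and training hyperparameters reported above can be sanity-checked and collected programmatically. The sketch below verifies the split arithmetic and gathers the reported settings into a single config dict; all key names are illustrative, not taken from the authors' released code:

```python
# Sanity-check the Things-EEG split sizes quoted in the table:
# train: 1654 concepts x 10 images x 4 repetitions = 66,160 recordings
# test:  200 concepts x 1 image x 80 repetitions  = 16,000 recordings
train_recordings = 1654 * 10 * 4
test_recordings = 200 * 1 * 80
assert train_recordings == 66_160
assert test_recordings == 16_000

# Reported hyperparameters, collected as an illustrative config
# (the paper describes these in prose; it does not publish a config file).
config = {
    "encoder": {
        "optimizer": "AdamW",
        "learning_rate": 3e-4,
        "batch_size": 1024,
        "epochs": 20,
    },
    "diffusion_prior": {
        "batch_size": 512,
        "epochs": 100,
        "inference_steps": 50,
        "guidance_scale": 7.5,
    },
    "sdxl_turbo": {"inference_steps": 5},
    "ip_adapter_scales": {"image": 1.0, "text": 0.0, "depth": 0.0},
}

print(train_recordings, test_recordings)  # 66160 16000
```

Checks like these are a cheap first step when reproducing a paper: if the stated totals do not follow from the stated split structure, the description is ambiguous or contains a typo.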