GitHub - justin-xzliu/GLIM: Official PyTorch implementation of the research paper "Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation".

GLIM: Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation
Official PyTorch Implementation

This repository contains:

⚡ A modular implementation of GLIM, organized with PyTorch Lightning.
✂️ Complete data preprocessing notebooks.
🚆 Simple training and test scripts.
🧭 Semantic classification notebooks that align with the results shown in the paper.
🗒️ All text samples generated with GLIM, its noise-input test and prompt-free test.

TL;DR

Are EEG-to-text models working? 🤔 ⟶ YES ! 👇👇👇

Abstract

Pretrained generative models have opened new frontiers in brain decoding by enabling the synthesis of realistic texts and images from non-invasive brain recordings. However, the reliability of such outputs remains questionable: it is unclear whether they truly reflect semantic activation in the brain or are merely hallucinated by the powerful generative models. In this paper, we focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse. Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings rather than the verbatim reconstruction of stimulus texts pursued in previous work. To this end, we propose the Generative Language Inspection Model (GLIM), which emphasizes learning informative and interpretable EEG representations to improve semantic grounding under heterogeneous and small-scale data conditions. Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences without teacher forcing. Moreover, it supports more robust evaluation beyond text similarity, through EEG-text retrieval and zero-shot semantic classification across sentiment categories, relation types, and corpus topics. Together, our architecture and evaluation protocols lay the foundation for reliable and scalable benchmarking in generative brain decoding.

Model architecture

Representative generation examples

You can find full generated samples in this interactive report or in the results/ directory.

Setup

Run conda env create -f environment.yml to create the environment.

Download the ZuCo dataset, including versions 1.0 and 2.0.

💡 You only need to download some of the files. Organize them in the following structure.

data/
├── raw_data/
│   ├── 🌐 ZuCo1/                   ## see https://osf.io/q3zws/files/osfstorage
│   │   ├── ☑️ task_materials/      ## download texts and labels
│   │   ├── task1- SR/
│   │   │   └── ✅ Matlab files/    ## download sentence-level EEG segments
│   │   ├── task2 - NR/
│   │   │   └── ✅ Matlab files/
│   │   └── task3 - TSR/
│   │       └── ✅ Matlab files/
│   └── 🌐 ZuCo2/                   ## see https://osf.io/2urht/files/osfstorage
│       ├── ☑️ task_materials/
│       ├── task1 - NR/
│       │   └── ✅ Matlab files/
│       └── task2 - TSR/
│           └── ✅ Matlab files/
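
Before running the preprocessing notebooks, it can help to confirm that the folders are where the tree above expects them. The snippet below is a minimal sketch, not part of the repository: the helper name `find_missing_dirs` and the list `EXPECTED_DIRS` are hypothetical, and the folder names are copied from the tree as displayed, so adjust them if your download uses different names.

```python
from pathlib import Path

# Sub-folders copied from the directory tree above (names as displayed there).
# This list is an assumption for illustration, not something the repo defines.
EXPECTED_DIRS = [
    "ZuCo1/task_materials",
    "ZuCo1/task1- SR/Matlab files",
    "ZuCo1/task2 - NR/Matlab files",
    "ZuCo1/task3 - TSR/Matlab files",
    "ZuCo2/task_materials",
    "ZuCo2/task1 - NR/Matlab files",
    "ZuCo2/task2 - TSR/Matlab files",
]


def find_missing_dirs(raw_root):
    """Return the expected sub-folders that do not exist under raw_root."""
    root = Path(raw_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # "data/raw_data" matches the top of the tree above.
    missing = find_missing_dirs("data/raw_data")
    if missing:
        print("Missing folders:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected folders are present.")
```

Run it from the repository root; any folder it lists still needs to be downloaded or renamed.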

Data preprocessing

You can either

run all four preprocessing notebooks step by step; or
start directly from STEP3 with this label table, which skips generating the text variants.

Reproduce our results

Download the model checkpoint and put it in checkpoints/.
Run test.py on a single GPU to generate sentences and compute the overall metrics.
Run each predict_xxx.ipynb to reproduce the classification results (with both CLIP-like and LLM-assisted approaches).
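
For readers unfamiliar with the "CLIP-like" approach mentioned above, the general idea is to embed each candidate label as text, embed the EEG sample in the same space, and pick the label with the highest cosine similarity. The snippet below is only a sketch of that idea under stated assumptions: the function names and the toy 3-dimensional vectors are made up for illustration and are not taken from the notebooks, which may differ in detail.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(eeg_embedding, label_embeddings):
    """Return (best_label, scores), where scores maps label -> similarity."""
    scores = {
        label: cosine_similarity(eeg_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy vectors standing in for real text and EEG embeddings (hypothetical values).
label_embeddings = {
    "positive": [0.9, 0.1, 0.0],
    "neutral": [0.1, 0.9, 0.1],
    "negative": [0.0, 0.1, 0.9],
}
eeg_embedding = [0.8, 0.3, 0.1]

predicted, scores = zero_shot_classify(eeg_embedding, label_embeddings)
print(predicted)  # prints "positive"
```

No classifier is trained on the labels here, which is what makes the procedure zero-shot: swapping in a different label set only requires new label embeddings.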

Train from scratch

Run train.py with the default parameters (except for those associated with your devices and directories).

BibTeX

@article{liu2025glim,
  title={Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation},
  author={Xiaozhao Liu and Dinggang Shen and Xihui Liu},
  year={2025},
  journal={arXiv preprint arXiv:2505.17099},
}

