This repository contains the official codebase for our paper:
"Layerwise Recall and the Geometry of Interwoven Knowledge in LLMs"
We investigate how large language models (LLMs) encode structured scientific knowledge, using chemical elements as a case study. Our key findings include:
- Discovery of a 3D spiral structure in LLM activations, aligned with the periodic table.
- Intermediate layers encode continuous, overlapping attributes suitable for indirect recall.
- Deeper layers sharpen categorical boundaries and integrate linguistic context.
- LLMs organize facts as geometry-aware manifolds, not just isolated tokens.
Each folder corresponds to a section or concept in the paper:
- `Pre/` — Preprocessing scripts: prompt creation, activation extraction (a minimal extraction sketch follows this list).
- `Geometry/` — Code for geometric analyses, such as spiral detection (see the spiral diagnostic sketch below).
- `Direct_recall/` — Linear probing for direct factual recall (see the probing sketch below).
- `Indirect_recall/` — Experiments on retrieving unmentioned or related facts.
- `Appendix/` — Extra analyses, visualizations, and ablation results.
- `Results/` — Saved figures, metrics, and outputs.
- `periodic_table_dataset.csv` — Structured dataset of 50 elements and their attributes.
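For orientation, here is a minimal sketch of the kind of per-layer activation extraction `Pre/` performs, assuming a HuggingFace causal LM. The model name, prompt, and last-token pooling are illustrative placeholders, not the repository's exact settings:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is a stand-in; the paper's experiments may use a different
# (possibly gated) model, which is why the README asks for an HF token.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# Example attribute prompt for one element.
prompt = "The atomic number of oxygen is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple of (num_layers + 1) tensors of shape
# (batch, seq_len, hidden_dim); entry 0 is the embedding layer.
# Keep the last-token activation at every layer as the element's vector.
per_layer = torch.stack([h[0, -1, :] for h in outputs.hidden_states])
print(per_layer.shape)  # (13, 768) for gpt2
```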
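In the spirit of `Geometry/`, a rough spiral diagnostic might look like the sketch below. It assumes per-element activations saved as a NumPy array; the file path and the PCA-plus-unwrapped-angle heuristic are assumptions, not the paper's exact method:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical input: one activation vector per element from one layer,
# stacked in atomic-number order; the path is a placeholder.
acts = np.load("Results/layer20_element_activations.npy")

# Project to 3D and look for spiral structure in the leading components.
coords = PCA(n_components=3).fit_transform(acts)

# Crude diagnostic: unwrap the angle in the first two components and check
# that it grows roughly monotonically with atomic number.
theta = np.unwrap(np.arctan2(coords[:, 1], coords[:, 0]))
corr = np.corrcoef(theta, np.arange(len(theta)))[0, 1]
print(f"angle vs. atomic-number correlation: {corr:.3f}")
```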
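Likewise, a minimal linear probe in the style of `Direct_recall/` could pair those activations with labels from `periodic_table_dataset.csv`. The `group` column name is a guess about the CSV schema, and the activations path is again a placeholder:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Attribute labels come from the dataset shipped with the repo; the
# "group" column name is an assumption about the CSV schema.
df = pd.read_csv("periodic_table_dataset.csv")
y = df["group"].values

# Hypothetical per-element activations at a fixed layer (placeholder path).
X = np.load("Results/layer20_element_activations.npy")

# A linear probe: logistic regression evaluated with 5-fold CV.
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, X, y, cv=5)
print(f"probe accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```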
To get started:

- Clone the repository and enter the project directory.
- Set your HuggingFace API token in `config.json` (a snippet for loading it follows these steps):

  ```json
  { "HF_TOKEN": "your_huggingface_token" }
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
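Once `config.json` is in place, one minimal way to make the token visible to HuggingFace libraries is shown below; the repository's own scripts may load the file differently:

```python
import json
import os

# Read the token from config.json (format shown above) and expose it via
# the HF_TOKEN environment variable, which huggingface_hub reads.
with open("config.json") as f:
    os.environ["HF_TOKEN"] = json.load(f)["HF_TOKEN"]
```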