Research repository for exploring mechanistic interpretability and safety evaluations in large language models (LLMs), focusing on understanding sparse autoencoder (SAE) features and on benchmarking prompt safety.
kvamsi7/vads-prevalent-safety-llm