vads-prevalentsafety-llm

A research repository for exploring mechanistic interpretability and safety evaluation in large language models (LLMs), with a focus on understanding sparse autoencoder (SAE) features and benchmarking prompt safety.

Folders and files (51 commits):

config
data
docs
notebooks
results
src
.gitignore
LICENSE
README.md
requirements.txt
setup.py
template.py

Contributors 2

kvamsi7/vads-prevalent-safety-llm
