insideLLMs

Python library for probing the inner workings of large language models. Systematically test LLMs' zero-shot ability at unseen logic problems, propensity for bias, vulnerabilities to particular attacks (including recursion, reframing, and tokenization exploits), factual accuracy, and more.

Features

  • Multiple Model Support: OpenAI, HuggingFace, Anthropic Claude, and more
  • Diverse Probes: Test LLMs on logic, bias, attack vulnerabilities, and factual accuracy
  • Visualization Tools: Visualize probe results with matplotlib
  • Benchmarking: Compare multiple models on the same tasks
  • Configurable: Run experiments with YAML/JSON configuration (see the sketch after this list)
  • NLP Utilities: Text processing, tokenization, feature extraction, NER, and more
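
The configuration schema is not documented in this README, so the following is only a hypothetical sketch of a config-driven run; the experiment.yaml keys (model_name, prompt) are assumptions, not a documented format:

# Hypothetical sketch: the YAML keys below are assumed, not a documented schema
import yaml

from insideLLMs.models import OpenAIModel
from insideLLMs.probes import LogicProbe

with open("experiment.yaml") as f:
    config = yaml.safe_load(f)  # e.g. {"model_name": "gpt-3.5-turbo", "prompt": "..."}

model = OpenAIModel(model_name=config["model_name"])
probe = LogicProbe()
print(probe.run(model, config["prompt"]))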

Installation

To install insideLLMs from a local clone of this repository, navigate to the root directory of the clone and run:

# Install the base package
pip install .

# To include NLP utilities (requires NLTK, spaCy, scikit-learn, gensim):
pip install .[nlp]

# To include visualization tools (requires matplotlib, pandas, seaborn):
pip install .[visualization]

# To include all optional dependencies:
pip install .[nlp,visualization]

See pyproject.toml for the list of dependencies. For development, see the "Development" section below.

Quick Start

from insideLLMs.models import OpenAIModel
from insideLLMs.probes import LogicProbe

# Initialize a model
model = OpenAIModel(model_name="gpt-3.5-turbo")

# Create a probe
probe = LogicProbe()

# Run the probe
result = probe.run(model, "If all A are B, and all B are C, then are all A also C?")
print(result)

Available Models

  • OpenAIModel: For OpenAI's GPT models
  • HuggingFaceModel: For HuggingFace Transformers models
  • AnthropicModel: For Anthropic's Claude models
  • DummyModel: For testing purposes
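
Models are interchangeable behind a common interface, so any of them can be handed to a probe. As a minimal sketch (assuming DummyModel takes no required constructor arguments, which this README does not confirm), you can exercise a probe without any API keys:

from insideLLMs.models import DummyModel
from insideLLMs.probes import LogicProbe

# DummyModel needs no API credentials, making it useful for wiring tests
model = DummyModel()
probe = LogicProbe()
print(probe.run(model, "If all A are B, and all B are C, then are all A also C?"))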

Available Probes

  • LogicProbe: Tests LLMs' ability to solve logic problems
  • BiasProbe: Tests LLMs' propensity for bias
  • AttackProbe: Tests LLMs' vulnerabilities to attacks
  • FactualityProbe: Tests LLMs' factual accuracy
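
Assuming every probe shares the run(model, prompt) interface shown in the Quick Start, swapping probes is a one-line change. A sketch with an illustrative prompt (not from the library's own datasets):

from insideLLMs.models import OpenAIModel
from insideLLMs.probes import BiasProbe

model = OpenAIModel(model_name="gpt-3.5-turbo")
probe = BiasProbe()

# Illustrative prompt; serious bias testing would use a curated prompt set
result = probe.run(model, "The nurse said that they would be late for the shift.")
print(result)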

Visualization and Benchmarking

from insideLLMs.models import OpenAIModel, AnthropicModel
from insideLLMs.probes import FactualityProbe
from insideLLMs.benchmark import ModelBenchmark
from insideLLMs.visualization import plot_factuality_results

# Create models and probe
models = [OpenAIModel(), AnthropicModel()]
probe = FactualityProbe()

# Run benchmark (assumed input format: a list of question strings)
factual_questions = ["What is the capital of France?", "Who wrote Hamlet?"]
benchmark = ModelBenchmark(models, probe)
results = benchmark.run(factual_questions)

# Compare models
comparison = benchmark.compare_models()
print(f"Fastest model: {comparison['rankings']['total_time'][0]}")

# Visualize results
plot_factuality_results(results["models"][0]["results"])

NLP Utilities

The library includes a comprehensive set of NLP utilities for text processing and analysis:

Text Cleaning and Normalization

from insideLLMs.nlp_utils import clean_text, remove_html_tags, remove_urls

# Clean text with multiple options
cleaned = clean_text(text, remove_html=True, remove_url=True, lowercase=True)

Tokenization and Segmentation

from insideLLMs.nlp_utils import simple_tokenize, nltk_tokenize, segment_sentences

# Get tokens and sentences
tokens = simple_tokenize(text)  # No dependencies
tokens = nltk_tokenize(text)    # Requires NLTK
sentences = segment_sentences(text)

Feature Extraction

from insideLLMs.nlp_utils import create_bow, create_tfidf, extract_pos_tags

# Create document representations
bow_matrix, feature_names = create_bow(texts)  # Bag of Words
tfidf_matrix, feature_names = create_tfidf(texts)  # TF-IDF
pos_tags = extract_pos_tags(text)  # Part-of-speech tags for a single text

Text Statistics and Metrics

from insideLLMs.nlp_utils import count_words, calculate_lexical_diversity

# Get text statistics
word_count = count_words(text)
diversity = calculate_lexical_diversity(text)  # Type-token ratio

Named Entity Recognition

from insideLLMs.nlp_utils import extract_named_entities

# Extract entities
entities = extract_named_entities(text)  # Requires spaCy

Keyword Extraction

from insideLLMs.nlp_utils import extract_keywords_tfidf

# Extract keywords
keywords = extract_keywords_tfidf(text, num_keywords=5)

Text Similarity

from insideLLMs.nlp_utils import cosine_similarity_texts, jaccard_similarity

# Calculate similarity between texts
similarity = cosine_similarity_texts(text1, text2)  # Requires scikit-learn
similarity = jaccard_similarity(text1, text2)  # No dependencies

See examples/example_nlp.py for a complete demonstration of all NLP utilities.

Development

To set up a development environment, clone the repository and install the development dependencies:

git clone https://github.com/dr-gareth-roberts/insideLLMs
cd insideLLMs
pip install -r requirements-dev.txt

Running Tests

Tests are located in the tests/ directory and can be run using pytest:

pytest
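
To run a subset of the suite, use pytest's standard path and keyword filters (the file name below is illustrative):

# Run a single test file verbosely (file name is illustrative)
pytest tests/test_probes.py -v

# Run only tests whose names match a keyword
pytest -k logic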
