ContextCore is a Python library designed to overcome context window limitations in smaller local LLMs like smollm2:1.7b from Ollama. It implements a unified memory system that combines high-level thinking memory with detailed raw memory to provide extended context capabilities.
# Clone the repository
git clone https://github.com/Priyanshu-i/ContextCore.git
cd ContextCore
# Install the package
pip install -e .
# Optional but recommended dependencies
pip install sentence-transformers # For better embeddings
pip install hnswlib # For vector storage
pip install redis # For faster key-value storage
pip install requests # For Ollama API communication
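ContextCore talks to a locally running Ollama server, so the model you plan to use must already be pulled (for example with `ollama pull smollm2:1.7b`). Before initializing ContextCore you can verify that the server is reachable; this is a plain `requests` check against Ollama's `/api/tags` endpoint, not part of the ContextCore API:
# Quick sanity check that Ollama is running and the model is available
import requests
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
if "smollm2:1.7b" not in models:
    print("Model missing - run: ollama pull smollm2:1.7b")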
from contextcore import ContextCore
# Initialize ContextCore with your local LLM
context = ContextCore(
model_name="smollm2:1.7b", # Your Ollama model
ollama_url="http://localhost:11434" # Ollama API URL
)
# Initialize a new session with an objective
context.initialize_session("Building a robust memory system for local LLMs")
# Process user inputs and get responses
response = context.process_user_input("How can I implement a vector store for text embeddings?")
print(response)
# Save the session for later use
context.save()
# Load a saved session
loaded_context = ContextCore.load("./contextcore_storage")
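Because saved sessions are meant to be resumed (see Persistence below), the loaded instance can be used just like the original one; for example:
# Continue the conversation in the restored session
follow_up = loaded_context.process_user_input("What did we decide about the vector store?")
print(follow_up)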
- Two-Tier Memory System:
  - Thinking Memory (TME): High-level reasoning, concepts, and session strategies
  - Raw Memory (RME): Detailed facts, user inputs, and specific technical information
- Semantic Search: Find relevant memories based on semantic similarity
- Session Management: Maintain coherent, ongoing conversations with automatic summarization
- Local LLM Integration: Seamless integration with Ollama-based local models
- Persistence: Save and load sessions to continue conversations later
# Use Redis for faster key-value storage
context = ContextCore(
model_name="smollm2:1.7b",
use_redis=True # Enable Redis storage
)
# Customize vector dimensions (if using a different embedding model)
context = ContextCore(
model_name="smollm2:1.7b",
vector_dim=768 # For larger embedding models
)
# Use a different Ollama model
context = ContextCore(
model_name="llama3:8b", # Any model you have in Ollama
)
# Connect to a remote Ollama instance
context = ContextCore(
model_name="mistral:7b",
ollama_url="http://your-ollama-server:11434"
)
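These constructor options are independent keyword arguments, so they can presumably be combined as needed; the following is illustrative only:
# Example: remote Ollama server plus Redis-backed key-value storage
context = ContextCore(
    model_name="llama3:8b",
    ollama_url="http://your-ollama-server:11434",
    use_redis=True
)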
# Manually add thinking memory
context.memory_store.add_thinking_memory(
content="The key insight is to use hierarchical summarization",
importance=0.9,
metadata={"topic": "architecture", "source": "design_doc"}
)
# Manually add raw memory
context.memory_store.add_raw_memory(
content="User prefers Python over JavaScript for this project",
category="user", # user, session, or agent
relevance_score=0.7,
metadata={"source": "conversation"}
)
# Search memories
memories = context.memory_store.search_memories(
query="vector databases",
k=5, # Return top 5 results
filter_type="raw", # Only raw memories
min_score=0.6 # Minimum similarity threshold
)
- ThinkingMemory: Used for high-level concepts and reasoning
  - Contains: content, timestamp, importance score, metadata
- RawMemory: Used for detailed facts and specific information
  - Contains: content, timestamp, category, relevance score, metadata
- VectorStore: Stores and retrieves memories using semantic search
  - Uses HNSWlib for efficient similarity search
- SimpleEmbedder: Converts text to vector embeddings
  - Uses sentence-transformers if available, with a simple fallback (see the sketch after this list)
- MemoryStore: Combines vector storage with metadata-based retrieval
  - Optional Redis integration for faster lookups
- OllamaClient: Interfaces with the Ollama API for text generation
- ContextCore: Main class that coordinates all components
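To make the SimpleEmbedder behaviour concrete, here is a minimal sketch of that fallback pattern: use sentence-transformers when it is installed, otherwise fall back to a crude hashing embedder. The function name, model choice, and fallback details are illustrative assumptions, not ContextCore's actual implementation:
# Illustrative embedder with a sentence-transformers path and a simple fallback
import numpy as np
def embed_texts(texts, dim=384):
    try:
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors
        return model.encode(texts)
    except ImportError:
        # Fallback: hash tokens into a fixed-size bag-of-words vector, then L2-normalize
        vectors = np.zeros((len(texts), dim), dtype=np.float32)
        for i, text in enumerate(texts):
            for token in text.lower().split():
                vectors[i, hash(token) % dim] += 1.0
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        return vectors / np.maximum(norms, 1e-8)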
- Initialization:
  - Always provide a clear session objective
  - Use the most powerful local LLM you have available
- Memory Management:
  - Let the system handle memory management automatically
  - For critical information, manually add high-importance memories
- Performance Optimization:
  - Install sentence-transformers for better embeddings
  - Use Redis for faster key-value lookups in production
- Troubleshooting:
  - Check that Ollama is running and the model is loaded
  - Ensure you have sufficient RAM for vector operations
  - Look at the logs for detailed information about operations (see the logging example below)
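For that last tip, assuming ContextCore logs through Python's standard logging module, turning on verbose output is enough to see what the memory system is doing:
# Enable verbose logging before creating the ContextCore instance
import logging
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s"
)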
ContextCore implements a unified memory system that combines:
- Hierarchical Summarization: Continuously distills the conversation into structured summaries
- Incremental Updates: Updates high-level summaries with new insights
- Semantic Retrieval: Fetches the most relevant detailed memories
- Dynamic Injection: Combines high-level thinking with detailed context
This approach enables small local LLMs to maintain coherent conversations even when the raw input exceeds their context window.
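Conceptually, the dynamic injection step amounts to assembling a prompt from the high-level summary, the retrieved details, and the new user message. The sketch below only illustrates that idea; it is not ContextCore's actual prompt template:
# Illustrative prompt assembly: high-level thinking + retrieved details + new input
def build_prompt(thinking_summary, raw_memories, user_message):
    details = "\n".join(f"- {m}" for m in raw_memories)
    return (
        "Session strategy and key insights so far:\n"
        f"{thinking_summary}\n\n"
        "Relevant details retrieved from memory:\n"
        f"{details}\n\n"
        f"User: {user_message}\nAssistant:"
    )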