```bash
# Basic dependencies
pip install numpy

# Optional: OpenAI support
pip install openai python-dotenv

# Optional: better vector embeddings
pip install sentence-transformers

# Optional: enhanced experiments (statistical analysis and visualization)
pip install scipy matplotlib
```
Create a `.env` file:
```bash
# OpenAI official API
OPENAI_API_KEY=sk-your-key-here
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_MODEL=gpt-3.5-turbo

# Or use Azure OpenAI
# OPENAI_API_KEY=your-azure-key
# OPENAI_API_BASE=https://your-resource.openai.azure.com
# OPENAI_MODEL=your-deployment-name

# Or use local models (e.g., Ollama)
# OPENAI_API_BASE=http://localhost:11434/v1
# OPENAI_MODEL=llama2
```
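These variables are read at startup; a minimal sketch of the lookup (plain `os.getenv` calls, with defaults mirroring the official-API settings above — the repo's actual loading code may differ):

```python
import os

# Fall back to the OpenAI official API when no override is set.
config = {
    "api_key": os.getenv("OPENAI_API_KEY"),
    "base_url": os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1"),
    "model": os.getenv("OPENAI_MODEL", "gpt-3.5-turbo"),
}
```

Because every backend (OpenAI, Azure, Ollama) is selected through the same three variables, switching providers only requires editing `.env`, not the code.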
```bash
# Basic experiment (4-round dialogue)
python cognitive_workspace_poc.py

# Enhanced experiments (10-round dialogue + multi-hop reasoning + conflict resolution)
python cognitive_workspace_enhanced.py
```
Requires an OpenAI API key and demonstrates real LLM behavioral differences:
- Higher quality task decomposition
- More accurate information prediction
- More coherent answer generation
No API key required; uses rule-based simulation:
- Still demonstrates architectural differences
- Suitable for proof-of-concept
- Fully reproducible
Uses local models such as Ollama:
- Data privacy
- No API costs
- Performance depends on local hardware
Compares Cognitive Workspace with traditional RAG on a single complex question:
- Operation count difference (12 vs 3)
- Operation type difference (active vs passive)
- Memory management difference (hierarchical vs flat)
- Single-turn memory reuse rate: 50% vs 0%
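The reuse metric can be made concrete with a tiny helper. The definition and the 6-of-12 split below are assumptions consistent with the single-turn numbers above (12 operations, 50% reuse), not code taken from the repo:

```python
def reuse_rate(reused_ops, total_ops):
    # Share of memory operations served from previously prepared state.
    return reused_ops / total_ops if total_ops else 0.0

# e.g. 6 of Cognitive Workspace's 12 operations reused -> 50%;
# stateless RAG reuses nothing -> 0%
```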
Demonstrates cumulative advantages from state persistence:
| Round | CW reuse rate | RAG reuse rate |
|-------|---------------|----------------|
| 1     | 50.0%         | 0%             |
| 2     | 55.0%         | 0%             |
| 3     | 56.7%         | 0%             |
| 4     | 56.4%         | 0%             |

Average reuse rate: 54.5% vs 0%
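As a quick sanity check (plain arithmetic, not repo code), the reported average follows directly from the per-round rates above:

```python
round_rates = [50.0, 55.0, 56.7, 56.4]
average = sum(round_rates) / len(round_rates)
# rounds to 54.5, matching the reported average reuse rate
```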
Memory advantages in long-term conversations:
Average reuse rate: 57.1% vs 0%
Net efficiency gain: 17.3%
Cohen's d: 23.2 (huge effect)
P-value: < 0.001 (extremely significant)
Advantages in complex reasoning chains:
Average reuse rate: 58.8% vs 0%
Net efficiency gain: 17.9%
Cohen's d: 190.0 (extremely large effect)
Operations saved: 194
Performance when handling contradictory information:
Average reuse rate: 59.8% vs 0%
Net efficiency gain: 17.8%
Cohen's d: 195.7 (extremely large effect)
Operations saved: 226
- `cognitive_workspace_results.json`: basic experiment results
- `enhanced_results.json`: detailed enhanced experiment results
- `cognitive_workspace_analysis.png`: experiment visualization charts
- `.env.example`: environment variable template (if `.env` doesn't exist)
- Basic experiment (4 rounds): Average 54.5%, reuse starts from round 1
- 10-round dialogue: Average 57.1%, long-term dialogue advantage clear
- Multi-hop reasoning: Average 58.8%, higher reuse rate for complex tasks
- Conflict resolution: Average 59.8%, best performance in information integration scenarios
- Traditional RAG: Always 0% (stateless)
Net efficiency = Reuse rate / (1 + Extra operation ratio)
- 10-round dialogue: 17.3% net improvement
- Multi-hop reasoning: 17.9% net improvement
- Conflict resolution: 17.8% net improvement
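The net-efficiency formula above can be checked in a few lines. The implied extra-operation ratio below is derived by inverting the formula against the reported 10-round numbers, not a figure taken from the repo:

```python
def net_efficiency(reuse_rate, extra_op_ratio):
    # Net efficiency = reuse rate / (1 + extra operation ratio)
    return reuse_rate / (1 + extra_op_ratio)

# Inverting against the 10-round results (57.1% reuse, 17.3% net gain)
# gives the extra-operation ratio those numbers imply:
implied_ratio = 0.571 / 0.173 - 1  # roughly 2.3 extra operations per query
```

In other words, even though Cognitive Workspace performs more operations per query than RAG, the reuse rate more than compensates, leaving a net gain.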
- P-values: all experiments < 0.001 (extremely significant)
- Cohen's d effect sizes:
  - 10-round dialogue: 23.2 (huge)
  - Multi-hop reasoning: 190.0 (extremely large)
  - Conflict resolution: 195.7 (extremely large)
- Cognitive Workspace: Sub-linear growth (reduces redundant computation through memory reuse)
- Traditional RAG: Linear growth (starts fresh for each query)
- Cognitive Workspace: Dynamically tracks task completion and information sufficiency
- Traditional RAG: No confidence concept
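The confidence-tracking contrast above can be illustrated with a small sketch. Class and method names here are hypothetical, chosen for illustration rather than taken from the repo's API:

```python
class Workspace:
    """Tracks per-subtask confidence and flags information gaps."""

    def __init__(self, subtasks):
        self.confidence = {t: 0.0 for t in subtasks}

    def update(self, subtask, evidence_strength):
        # Move confidence toward 1.0 as supporting evidence accumulates.
        c = self.confidence[subtask]
        self.confidence[subtask] = c + (1 - c) * evidence_strength

    def gaps(self, threshold=0.7):
        # Sub-tasks still lacking sufficient information.
        return [t for t, c in self.confidence.items() if c < threshold]
```

A stateless RAG pipeline has no analogue of `gaps()`: it cannot say which parts of a task remain under-informed, which is exactly the "no confidence concept" point above.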
This code supports the following paper arguments:
- **Active memory management outperforms passive retrieval**
  - Code proof: task decomposition, information prediction, active preparation
- **State persistence improves efficiency**
  - Code proof: memory reuse across multi-turn dialogues
- **Hierarchical buffers optimize resource utilization**
  - Code proof: immediate→working→episodic promotion mechanism
- **Metacognitive control enhances intelligence**
  - Code proof: confidence tracking, information gap identification
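The immediate→working→episodic promotion named above might look like this in miniature. This is an illustrative sketch with assumed names and a simple access-count policy, not the repo's implementation:

```python
from collections import defaultdict

class HierarchicalMemory:
    """Items accessed repeatedly are promoted to longer-lived buffers."""

    LEVELS = ["immediate", "working", "episodic"]

    def __init__(self, promote_after=2):
        self.store = {level: set() for level in self.LEVELS}
        self.hits = defaultdict(int)
        self.promote_after = promote_after

    def add(self, item):
        # New information always enters the immediate buffer.
        self.store["immediate"].add(item)

    def access(self, item):
        self.hits[item] += 1
        for i, level in enumerate(self.LEVELS[:-1]):
            if item in self.store[level] and self.hits[item] >= self.promote_after:
                # Promote one level up and reset the access counter.
                self.store[level].remove(item)
                self.store[self.LEVELS[i + 1]].add(item)
                self.hits[item] = 0
                break
```

The design choice is that promotion is earned by use: frequently reused facts migrate to durable storage, while one-off context stays cheap to evict.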
**Q: Why are the results meaningful without a real LLM?**

A: Because we prove architectural behavioral differences, not generation quality. Even with rule-based simulation, the differences between active vs. passive and stateful vs. stateless remain obvious.
**Q: Where is the code?**

A: Code is available at https://github.com/tao-hpu/cognitive-workspace
**Q: How much do the full experiments cost?**

A: Full experiments require approximately:
- Single-turn experiment: ~10 API calls
- Multi-turn experiment: ~20 API calls
- Total cost: < $0.05 (using GPT-3.5-turbo)
**Q: Can I use models other than OpenAI's?**

A: Yes! The code supports:
- OpenAI-compatible APIs (by modifying `OPENAI_API_BASE`)
- Local models (Ollama, llama.cpp)
- Any service providing chat/completion interfaces
- **Add longer-term tests (20+ rounds)**

  ```python
  # Modify the question list in enhanced_experiment.py
  extended_questions = [...20 questions...]
  ```

- **Integrate real vector databases**

  ```python
  # Use ChromaDB or Pinecone
  from chromadb import Client
  ```

- **Add more statistical tests**

  ```python
  # Mann-Whitney U test, Friedman test, etc.
  from scipy import stats
  stats.mannwhitneyu(cw_results, rag_results)
  ```

- **Performance benchmarking**

  ```python
  # Test performance at different scales
  for doc_count in [10, 100, 1000]:
      test_scalability(doc_count)
  ```
If you use this code, please cite:
```bibtex
@article{an2025cognitive,
  title={Cognitive Workspace: Towards Functional Infinite Context Through Active Memory Management},
  author={Tao An},
  year={2025},
  eprint={2508.13171},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```
MIT License - free to use, modify, and distribute