-
Notifications
You must be signed in to change notification settings - Fork 3k
Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation #2006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…orageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Remove backward compatibility support from the VectorStoreFactory and the StorageFactory classes. |
…rageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Done! I've removed all backward compatibility support from both VectorStoreFactory and StorageFactory classes. The changes include: Removed:
Enhanced:
The factories now have a cleaner, more consistent API without the backward compatibility cruft. All existing functionality continues to work - only the deprecated features have been removed. Commit: 5aae40b |
Run |
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Done! I've run The changes include:
Commit: 7586c4c |
The following error occurs in the pytests. Update the pytests if necessary to reflect the changes that have been made in the StorageFactory and VectorStoreFactory classes. /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py
/home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:124:13 - error: Method "get" overrides class "PipelineStorage" in an incompatible manner
Positional parameter count mismatch; base method has 4, but override has 2
Parameter 3 mismatch: base parameter "as_bytes" is keyword parameter, override parameter is position-only
Parameter 4 mismatch: base parameter "encoding" is keyword parameter, override parameter is position-only
Return type mismatch: base method returns type "CoroutineType[Any, Any, Any]", override returns type "bytes | None"
Type "bytes | None" is not assignable to type "CoroutineType[Any, Any, Any]"
"bytes" is not assignable to "CoroutineType[Any, Any, Any]" (reportIncompatibleMethodOverride)
/home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:127:13 - error: Method "set" overrides class "PipelineStorage" in an incompatible manner
Positional parameter count mismatch; base method has 4, but override has 3
Parameter 4 mismatch: base parameter "encoding" is keyword parameter, override parameter is position-only
Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
"None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
/home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:130:13 - error: Method "delete" overrides class "PipelineStorage" in an incompatible manner
Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
"None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
/home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:133:13 - error: Method "has" overrides class "PipelineStorage" in an incompatible manner
Return type mismatch: base method returns type "CoroutineType[Any, Any, bool]", override returns type "bool"
"bool" is not assignable to "CoroutineType[Any, Any, bool]" (reportIncompatibleMethodOverride)
/home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:139:13 - error: Method "clear" overrides class "PipelineStorage" in an incompatible manner
Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
"None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
5 errors, 0 warnings, 0 informations |
…e interface implementation Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
... Fixed the pytest errors by updating the CustomStorage test class to properly implement the PipelineStorage interface. The errors were caused by method signature mismatches - the test was using synchronous methods while PipelineStorage defines async methods with different parameter lists. Key changes:
Also ran Commit: 12e1744 |
@copilot refactor the |
Update to latest main |
@AlonsoGuevara Just updated from the main branch and ran some local tests. It should be good now. |
…consistent registration patterns and add custom vector store documentation (microsoft#2006) * Initial plan * Refactor VectorStoreFactory to use registration functionality like StorageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * Fix linting issues in VectorStoreFactory refactoring Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * Remove backward compatibility support from VectorStoreFactory and StorageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * Run ruff check --fix and ruff format, add semversioner file Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * ruff formatting fixes * Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * ruff formatting fixes * update storage factory design * Refactor CacheFactory to use registration functionality like StorageFactory Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * revert copilot changes * fix copilot changes * update comments * Fix failing pytest compatibility for factory tests Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * update class instantiation issue * ruff fixes * fix pytest * add default value * ruff formatting changes * ruff fixes * revert minor changes * cleanup cache factory * Update CacheFactory tests to match consistent factory pattern Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * update pytest thresholds * adjust threshold levels * Add custom vector store implementation notebook Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes: - Complete implementation of SimpleInMemoryVectorStore - Registration with VectorStoreFactory - Testing and validation examples - Configuration examples for GraphRAG settings - Advanced features and best practices - Production considerations checklist The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends. Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> * remove sample notebook for now * update tests * fix cache pytests * add pandas-stub to dev dependencies * disable warning check for well known key * skip tests when running on ubuntu * add documentation for custom vector store implementations * ignore ruff findings in notebooks * fix merge breakages * speedup CLI import statements * remove unnecessary import statements in init file * Add str type option on storage/cache type * Fix store name * Add LoggerFactory * Fix up logging setup across CLI/API * Add LoggerFactory test * Fix err message * Semver * Remove enums from factory methods --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com> Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
This PR refactors three factory classes (CacheFactory, StorageFactory, and VectorStoreFactory) to use a consistent registration-based pattern, updates the corresponding integration tests, and adds comprehensive documentation for implementing custom vector stores.
Problem
Several of the main factory classes have an inconsistent design that is causing more tech debt.
For example, the
CacheFactory
used hardcoded match/case statements while other factories used registration patterns, and pytests expected different behaviors.Changes Made
🏗️ Factory Refactoring
register()
pattern with factory functions🧪 Test Updates
create_*
methods now use consistent two-parameter pattern:create_cache(cache_type, kwargs)
TypeError
for class registration to reflect that factories actually allow thisroot_dir
in kwargs instead of separate parameter🔧 Bug Fixes
📘 Documentation
docs/examples_notebooks/custom_vector_store.ipynb
demonstrating how to implement and register custom vector storesSimpleInMemoryVectorStore
showing all requiredBaseVectorStore
methodsExample Usage
Factory Registration Pattern (now consistent across all factories):
Custom Vector Store Implementation:
Results