Skip to content

Conversation

Copilot
Copy link
Contributor

@Copilot Copilot AI commented Jul 22, 2025

This PR refactors three factory classes (CacheFactory, StorageFactory, and VectorStoreFactory) to use a consistent registration-based pattern, updates the corresponding integration tests, and adds comprehensive documentation for implementing custom vector stores.

Problem

Several of the main factory classes have an inconsistent design that is causing more tech debt.

For example, the CacheFactory used hardcoded match/case statements while other factories used registration patterns, and pytests expected different behaviors.

Changes Made

🏗️ Factory Refactoring

  • CacheFactory: Refactored from hardcoded match/case to registration-based pattern matching StorageFactory and VectorStoreFactory
  • StorageFactory & VectorStoreFactory: Removed backward compatibility support for cleaner, more consistent APIs
  • Consistent registration methods: All factories now use the same register() pattern with factory functions

🧪 Test Updates

  • Updated all factory integration tests to reflect actual factory behavior and consistent method signatures
  • Fixed method signatures: All create_* methods now use consistent two-parameter pattern: create_cache(cache_type, kwargs)
  • Fixed test expectations: Changed tests that expected TypeError for class registration to reflect that factories actually allow this
  • Fixed parameter handling: Updated file cache tests to pass root_dir in kwargs instead of separate parameter

🔧 Bug Fixes

  • Fixed PipelineStorage interface implementation in storage factory tests with proper async method signatures
  • Fixed constructor parameters for MemoryPipelineStorage and other test classes
  • Added missing abstract methods to test implementations

📘 Documentation

  • Added comprehensive notebook docs/examples_notebooks/custom_vector_store.ipynb demonstrating how to implement and register custom vector stores
  • Complete implementation example with SimpleInMemoryVectorStore showing all required BaseVectorStore methods
  • Testing and validation examples with comprehensive test suite and error handling
  • Configuration examples for GraphRAG settings integration
  • Advanced features like similarity thresholds, performance tracking, and production considerations
  • Best practices guide covering performance, persistence, scalability, reliability, and security

Example Usage

Factory Registration Pattern (now consistent across all factories):

# Register custom implementations
CacheFactory.register("my_cache", create_my_cache)
StorageFactory.register("my_storage", create_my_storage)  
VectorStoreFactory.register("my_vector_store", MyVectorStoreClass)

# Use consistent two-parameter creation pattern
cache = CacheFactory.create_cache("my_cache", {"param": "value"})
storage = StorageFactory.create_storage("my_storage", {"param": "value"})
vector_store = VectorStoreFactory.create_vector_store("my_vector_store", {"param": "value"})

Custom Vector Store Implementation:

class MyCustomVectorStore(BaseVectorStore):
    def connect(self, **kwargs): ...
    def load_documents(self, documents, overwrite=True): ...
    def similarity_search_by_vector(self, query_embedding, k=10, **kwargs): ...
    # ... implement all required methods

# Register and use in GraphRAG
VectorStoreFactory.register("my_custom", MyCustomVectorStore)

# Configuration in settings.yml
vector_store:
  default_vector_store:
    type: "my_custom"
    collection_name: "entities"
    custom_param: "value"

Results

  • All factory integration tests now pass with consistent behavior
  • Unified API design across all three factory classes
  • Cleaner, more maintainable code without backward compatibility cruft
  • Enhanced extensibility through registration-based architecture
  • Comprehensive documentation enabling developers to implement custom vector stores as plug-and-play components

Copilot AI and others added 2 commits July 22, 2025 11:08
…orageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] [Feature Request]: Refactor VectorStoreFactory class to use registration functionality Refactor VectorStoreFactory to use registration functionality like StorageFactory Jul 22, 2025
@Copilot Copilot AI requested a review from jgbradley1 July 22, 2025 11:14
Copilot finished work on behalf of jgbradley1 July 22, 2025 11:14
@jgbradley1
Copy link
Collaborator

Remove backward compatibility support from the VectorStoreFactory and the StorageFactory classes.

…rageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jul 22, 2025

Remove backward compatibility support from the VectorStoreFactory and the StorageFactory classes.

Done! I've removed all backward compatibility support from both VectorStoreFactory and StorageFactory classes. The changes include:

Removed:

  • vector_store_types and storage_types class variables
  • Complex registration logic that handled both callable factories and direct class types
  • All backward compatibility code paths

Enhanced:

  • Added runtime validation to explicitly reject class registration with clear error messages
  • Simplified register() methods to only accept factory functions
  • Updated tests to verify the new behavior

The factories now have a cleaner, more consistent API without the backward compatibility cruft. All existing functionality continues to work - only the deprecated features have been removed.

Commit: 5aae40b

Copilot finished work on behalf of jgbradley1 July 22, 2025 11:43
@jgbradley1 jgbradley1 marked this pull request as ready for review July 22, 2025 12:35
@jgbradley1 jgbradley1 requested a review from a team as a code owner July 22, 2025 12:35
@jgbradley1
Copy link
Collaborator

Run ruff check —-fix and ruff format to fix any listing and formatting errors. Also create a semversioner file with the appropriate commit message

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jul 22, 2025

Run ruff check —-fix and ruff format to fix any listing and formatting errors. Also create a semversioner file with the appropriate commit message

Done! I've run ruff check --fix and ruff format to fix linting and formatting issues, and created a semversioner file for the changes.

The changes include:

  • Formatted test files to comply with code style standards
  • Created semversioner file with patch-level change description
  • Fixed line length and whitespace formatting issues

Commit: 7586c4c

Copilot finished work on behalf of jgbradley1 July 22, 2025 12:47
@jgbradley1
Copy link
Collaborator

The following error occurs in the pytests. Update the pytests if necessary to reflect the changes that have been made in the StorageFactory and VectorStoreFactory classes.

/home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:124:13 - error: Method "get" overrides class "PipelineStorage" in an incompatible manner
    Positional parameter count mismatch; base method has 4, but override has 2
    Parameter 3 mismatch: base parameter "as_bytes" is keyword parameter, override parameter is position-only
    Parameter 4 mismatch: base parameter "encoding" is keyword parameter, override parameter is position-only
    Return type mismatch: base method returns type "CoroutineType[Any, Any, Any]", override returns type "bytes | None"
      Type "bytes | None" is not assignable to type "CoroutineType[Any, Any, Any]"
        "bytes" is not assignable to "CoroutineType[Any, Any, Any]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:127:13 - error: Method "set" overrides class "PipelineStorage" in an incompatible manner
    Positional parameter count mismatch; base method has 4, but override has 3
    Parameter 4 mismatch: base parameter "encoding" is keyword parameter, override parameter is position-only
    Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
      "None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:130:13 - error: Method "delete" overrides class "PipelineStorage" in an incompatible manner
    Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
      "None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:133:13 - error: Method "has" overrides class "PipelineStorage" in an incompatible manner
    Return type mismatch: base method returns type "CoroutineType[Any, Any, bool]", override returns type "bool"
      "bool" is not assignable to "CoroutineType[Any, Any, bool]" (reportIncompatibleMethodOverride)
  /home/runner/work/graphrag/graphrag/tests/integration/storage/test_factory.py:139:13 - error: Method "clear" overrides class "PipelineStorage" in an incompatible manner
    Return type mismatch: base method returns type "CoroutineType[Any, Any, None]", override returns type "None"
      "None" is not assignable to "CoroutineType[Any, Any, None]" (reportIncompatibleMethodOverride)
5 errors, 0 warnings, 0 informations

…e interface implementation

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jul 22, 2025

The following error occurs in the pytests. Update the pytests if necessary to reflect the changes that have been made in the StorageFactory and VectorStoreFactory classes.

...

Fixed the pytest errors by updating the CustomStorage test class to properly implement the PipelineStorage interface. The errors were caused by method signature mismatches - the test was using synchronous methods while PipelineStorage defines async methods with different parameter lists.

Key changes:

  • Updated all storage methods to be async with correct signatures
  • Added missing abstract methods (find, child, keys, get_creation_date)
  • Fixed parameter types and counts to match interface requirements

Also ran ruff check --fix and ruff format as requested, fixing a dataclass annotation issue.

Commit: 12e1744

Copilot finished work on behalf of jgbradley1 July 22, 2025 16:50
@jgbradley1
Copy link
Collaborator

@copilot refactor the CacheFactory in graphrag/cache/factory.py to have a similar design as the StorageFactory and VectorStoreFactory. Update the pytests if necessary.

@Copilot Copilot AI changed the title Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns Refactor CacheFactory, StorageFactory, and VectorStoreFactory to use consistent registration patterns and add custom vector store documentation Jul 31, 2025
Copilot finished work on behalf of jgbradley1 July 31, 2025 22:03
@AlonsoGuevara
Copy link
Collaborator

Update to latest main

@jgbradley1
Copy link
Collaborator

Update to latest main

@AlonsoGuevara Just updated from the main branch and ran some local tests. It should be good now.

@natoverse natoverse merged commit 2030f94 into main Aug 28, 2025
16 checks passed
@natoverse natoverse deleted the copilot/fix-2005 branch August 28, 2025 20:53
opensourcemukul pushed a commit to opensourcemukul/graphrag that referenced this pull request Sep 13, 2025
…consistent registration patterns and add custom vector store documentation (microsoft#2006)

* Initial plan

* Refactor VectorStoreFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Fix linting issues in VectorStoreFactory refactoring

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Remove backward compatibility support from VectorStoreFactory and StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* Run ruff check --fix and ruff format, add semversioner file

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* Fix pytest errors in storage factory tests by updating PipelineStorage interface implementation

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* ruff formatting fixes

* update storage factory design

* Refactor CacheFactory to use registration functionality like StorageFactory

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* revert copilot changes

* fix copilot changes

* update comments

* Fix failing pytest compatibility for factory tests

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update class instantiation issue

* ruff fixes

* fix pytest

* add default value

* ruff formatting changes

* ruff fixes

* revert minor changes

* cleanup cache factory

* Update CacheFactory tests to match consistent factory pattern

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* update pytest thresholds

* adjust threshold levels

* Add custom vector store implementation notebook

Create comprehensive notebook demonstrating how to implement and register custom vector stores with GraphRAG as a plug-and-play framework. Includes:

- Complete implementation of SimpleInMemoryVectorStore
- Registration with VectorStoreFactory
- Testing and validation examples
- Configuration examples for GraphRAG settings
- Advanced features and best practices
- Production considerations checklist

The notebook provides a complete walkthrough for developers to understand and implement their own vector store backends.

Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>

* remove sample notebook for now

* update tests

* fix cache pytests

* add pandas-stub to dev dependencies

* disable warning check for well known key

* skip tests when running on ubuntu

* add documentation for custom vector store implementations

* ignore ruff findings in notebooks

* fix merge breakages

* speedup CLI import statements

* remove unnecessary import statements in init file

* Add str type option on storage/cache type

* Fix store name

* Add LoggerFactory

* Fix up logging setup across CLI/API

* Add LoggerFactory test

* Fix err message

* Semver

* Remove enums from factory methods

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jgbradley1 <654554+jgbradley1@users.noreply.github.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Nathan Evans <github@talkswithnumbers.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants