
Python: vector store simplifications and some cleanup #12274


Conversation

eavanvalkenburg
Member

@eavanvalkenburg commented May 26, 2025

Motivation and Context

Simplified the orderby clause for key-less get.
Renamed some files and removed const.

Moved some things around again :)

Now we have data.text_search for everything related to text search,
and data.vectors for everything related to vector collections and vector search.
The two modules do not depend on each other. Both use _search assets, but each re-exports what it uses, so developers can always count on getting all the related names from data.vectors or data.text_search.
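
As a rough illustration of the re-export layout described above, here is a minimal sketch. It builds a throwaway package on disk; the package name `demo_data` and the classes `SearchOptions`, `VectorSearch`, and `TextSearch` are hypothetical stand-ins, not the actual Semantic Kernel sources. It only shows the general pattern: a private shared module, and two public modules that each re-export what they use without importing each other.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout (illustrative names only, not the real sources).
FILES = {
    "demo_data/__init__.py": "",
    # Private shared module, analogous in spirit to `_search`.
    "demo_data/_search.py": (
        "class SearchOptions:\n"
        "    def __init__(self, top: int = 3) -> None:\n"
        "        self.top = top\n"
    ),
    # Public module that re-exports the shared name next to its own class.
    "demo_data/vectors.py": (
        "from ._search import SearchOptions\n\n"
        "class VectorSearch:\n"
        "    pass\n\n"
        "__all__ = ['SearchOptions', 'VectorSearch']\n"
    ),
    # Second public module; it imports only from the private module.
    "demo_data/text_search.py": (
        "from ._search import SearchOptions\n\n"
        "class TextSearch:\n"
        "    pass\n\n"
        "__all__ = ['SearchOptions', 'TextSearch']\n"
    ),
}


def build_demo_package() -> Path:
    """Write the hypothetical package into a fresh temporary directory."""
    root = Path(tempfile.mkdtemp())
    for rel, body in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body)
    return root


root = build_demo_package()
sys.path.insert(0, str(root))
vectors = importlib.import_module("demo_data.vectors")
text_search = importlib.import_module("demo_data.text_search")

# Both public modules hand out the very same shared class object.
same_object = vectors.SearchOptions is text_search.SearchOptions
# Neither public module pulls in the other.
independent = "text_search" not in vars(vectors) and "vectors" not in vars(text_search)
print(same_object, independent)
```

With a layout like this, a caller can write `from demo_data.vectors import SearchOptions, VectorSearch` and never needs to know that `SearchOptions` lives in a private module.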

Description

Contribution Checklist

@eavanvalkenburg requested a review from a team as a code owner May 26, 2025 18:41
@markwallace-microsoft added the python (Pull requests for the Python Semantic Kernel) and memory labels May 26, 2025
@@ -27,8 +27,8 @@
WeaviateCollection,
)
from semantic_kernel.data import VectorStoreRecordCollection, vectorstoremodel
from semantic_kernel.data.definitions import VectorStoreField
from semantic_kernel.data.vectors import SearchType, VectorSearch
from semantic_kernel.data._definitions import VectorStoreField
Contributor

Curious about the rationale to change it from .definitions -> ._definitions? From a "public API" perspective, it feels weird to have the underscore.

Member Author

rethought that part, I really like it now!

@markwallace-microsoft
Member

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| __init__.py | 4 | 0 | 100% | |
| const.py | 7 | 0 | 100% | |
| connectors/memory/azure_ai_search.py | 304 | 27 | 91% | 159–160, 208, 210–213, 215, 244, 246, 248, 339, 361–362, 423, 425, 439–440, 451, 522, 544, 554, 566, 586–587, 781–782 |
| connectors/memory/azure_cosmos_db.py | 402 | 166 | 58% | 184, 201, 207–209, 319, 321, 396, 400, 416–419, 421–424, 426–427, 440–443, 446–448, 454–455, 457, 464–466, 470–472, 495–497, 503–505, 511–512, 514, 519, 536, 624, 627–628, 636, 646–650, 705, 708–709, 745, 747, 760–761, 765–767, 778–781, 784–785, 787–788, 793–797, 800–803, 805–810, 812–813, 820–825, 844–846, 848–851, 859, 866–871, 873–887, 889–900, 903, 905–906, 909–910, 913, 915–920, 926, 930, 954, 982–983, 987–992, 1081–1086 |
| connectors/memory/chroma.py | 256 | 89 | 65% | 104, 107–111, 126–127, 135–136, 165, 167, 175, 187–190, 197, 216, 220, 241, 249, 253, 266, 270, 275–276, 285–288, 304–305, 313, 332, 343, 345, 349, 367–369, 371–386, 388–398, 400–406, 408, 410, 412–413, 416–417, 419–420, 423–425, 447–448, 450 |
| connectors/memory/faiss.py | 120 | 12 | 90% | 48, 58–59, 63–64, 121, 129, 133, 170, 217, 220, 240 |
| connectors/memory/in_memory.py | 170 | 23 | 86% | 69–70, 74, 78–81, 116, 128, 130, 171, 175, 179, 194, 211, 226, 266–267, 269, 276–277, 282, 317 |
| connectors/memory/mongodb.py | 259 | 128 | 50% | 100–101, 105, 117–120, 125–126, 133–134, 139–140, 149, 200, 254, 282, 284, 294–296, 302–304, 311, 315, 327–328, 336, 347–351, 360–363, 366–368, 374–375, 377, 384–386, 390–392, 404–407, 410–412, 418–419, 421, 428–430, 434–436, 444–446, 448–463, 465–475, 477–483, 485–490, 492, 494–495, 498–499, 501–502, 505–507, 512, 516, 577–578, 580, 622–623, 627–628 |
| connectors/memory/pinecone.py | 334 | 79 | 76% | 130, 144–147, 152, 157, 183, 216, 220–221, 228, 250, 254, 282, 310–313, 329, 375, 377, 382, 389, 400, 402, 404, 406, 409, 418, 420, 424, 439, 441, 443, 448, 455, 467, 474, 476, 501–507, 513, 515, 519–528, 531–536, 539–541, 543, 549–551, 556, 561, 567–568, 571, 595, 664 |
| connectors/memory/postgres.py | 372 | 101 | 72% | 139, 157, 184, 297–298, 383, 389–391, 396–397, 399–400, 407, 429, 485, 487, 489, 506, 518, 568, 577, 627, 647, 667, 672, 678, 690, 720, 724, 726, 748, 775, 780, 797, 854–856, 858–884, 886–892, 894–899, 901, 903–904, 907–908, 910–911, 914–923, 925, 930, 934, 961–963 |
| connectors/memory/qdrant.py | 252 | 57 | 77% | 249, 251, 287, 306–310, 316–317, 320–321, 327–330, 332, 334, 351, 375–381, 388, 391, 394–403, 406–411, 414–418, 423, 429–430, 433, 436–437, 495, 509 |
| connectors/memory/redis.py | 415 | 170 | 59% | 102, 107, 120, 123, 127–129, 133–134, 137, 147–151, 164–165, 299, 304–305, 316–321, 324–325, 333–335, 337, 344–348, 350, 352–354, 363, 369, 371, 375–385, 387–389, 391–401, 403–423, 425, 427–446, 449–461, 463–468, 470, 472–473, 476–477, 479–480, 483–485, 501–506, 510, 566, 568, 631–638, 694, 696, 747–754, 837, 844–845 |
| connectors/memory/sql_server.py | 502 | 88 | 82% | 152, 349–351, 371, 390, 401, 403, 426, 472–475, 491, 503–506, 528, 530, 574–580, 586–587, 589–590, 594–608, 611–616, 619–623, 628, 634–635, 638, 643, 645–648, 787, 806–808, 814, 820, 825, 830–833, 839, 873, 915, 922–924, 1090–1091, 1093, 1095, 1116 |
| connectors/memory/weaviate.py | 336 | 142 | 57% | 110, 315–318, 327–328, 330–332, 337, 341–342, 353–355, 364–372, 375–376, 380–381, 384–386, 389–390, 395–396, 399–400, 402, 411–413, 415–424, 426, 428–443, 447–453, 455, 457, 459–460, 463–464, 466–467, 470–472, 482–483, 486–487, 493–494, 497–498, 502, 505–509, 524, 539, 548–549, 563, 567–570, 573–574, 580–581, 585, 588–589, 592, 595, 615, 634, 650, 659–660, 665–666, 671, 765, 776–781, 786–791, 796–797 |
| connectors/search/brave.py | 143 | 23 | 83% | 23, 166, 172, 182, 201–204, 240, 245, 247, 265–276 |
| connectors/search/google.py | 146 | 21 | 85% | 24, 212, 222, 244–249, 264–265, 298–307 |
| data/__init__.py | 0 | 0 | 100% | |
| data/_search.py | 62 | 9 | 85% | 141, 147, 173, 177–178, 180, 183–184, 186 |
| data/text_search.py | 69 | 4 | 94% | 229, 240, 277, 279 |
| data/vectors.py | 667 | 118 | 82% | 83, 413, 579, 698, 896–899, 905, 907–908, 1026, 1064–1065, 1068, 1075, 1095–1102, 1104, 1106–1107, 1109, 1111, 1114–1119, 1121–1126, 1148, 1171, 1251–1253, 1333, 1484, 1499, 1505–1506, 1508, 1535, 1609–1610, 1622–1623, 1652, 1737, 1742–1745, 1859, 1886–1887, 1926, 1946, 1954–1955, 1967, 1974, 2019, 2022, 2076–2078, 2081, 2090, 2113, 2115–2120, 2122–2127, 2131–2137, 2141–2146, 2149–2151, 2153, 2181, 2192, 2206, 2214, 2253, 2277, 2291, 2337, 2377 |
| TOTAL | 26283 | 4431 | 83% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 3600 | 18 💤 | 0 ❌ | 0 🔥 | 1m 49s ⏱️ |

@eavanvalkenburg merged commit aad3d50 into microsoft:feature-python-vector-stores-preb May 27, 2025
19 checks passed
@eavanvalkenburg deleted the vector_store_cleanup branch May 27, 2025 10:49
eavanvalkenburg added a commit that referenced this pull request Jun 16, 2025
### Motivation and Context

<!-- Thank you for your contribution to the semantic-kernel repo!
Please help reviewers and future users, providing the following
information:
  1. Why is this change required?
  2. What problem does it solve?
  3. What scenario does it contribute to?
  4. If it fixes an open issue, please link to the issue here.
-->
Simplified the orderby clause for key-less get
renamed some files and removed const

Moved some things around again :)

now we have data.text_search for all things related to text_search
and data.vectors for all things vector collections and vector search
No dependencies on each other, although they both use _search assets,
but they reexport what they use, so that developers can always count on
getting all related stuff from data.vectors or data.text_search


### Description

<!-- Describe your changes, the overall approach, the underlying design.
These notes will help understanding how your code works. Thanks! -->

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
eavanvalkenburg added a commit that referenced this pull request Jun 17, 2025