Missing Test Coverage #375

evanmschultz opened this issue Apr 17, 2025 · 2 comments

Comments

@evanmschultz
Contributor

Add Test Coverage and Expand Test Suite

Context

I've noticed that the tests folder is currently minimal, and I'd like to help improve test coverage. Adding tests would:

  • Improve code quality and reliability
  • Help me better understand the project's inner workings
  • Make it easier for me to contribute more effectively in the future

Proposed Changes

  1. Add pytest-cov to track test coverage
  2. Begin expanding the test suite incrementally

Questions for Maintainers

  • Are there specific areas of the codebase that need test coverage most urgently?
  • Are there any testing patterns or conventions I should follow that wouldn't be obvious from simply looking at the existing tests?
  • Are there any existing test utilities or fixtures I should be aware of, or any you think I should add before I begin?

I'm happy to discuss the approach and get feedback before starting the implementation.

evanmschultz mentioned this issue Apr 18, 2025
@prasmussen15
Collaborator

The codebase for sure needs more tests. I'll dive a bit into how we approach different kinds of tests, though we aren't big fans of packages that measure test coverage.

All code comes with costs (writing and maintenance), so if the test code is more complicated than the function it is testing, that test is not really doing much to reduce errors in the codebase when working with that function.

We also make extensive use of type hints and run MyPy, which eliminates a huge class of bugs that often plague dynamically typed codebases.

It also doesn't make sense to write unit tests for functions whose main logical component is an LLM call or a Cypher query. An example is EntityNode.get_by_uuid(): if you mock the Cypher query call, you're basically just checking whether Python's list comprehension feature works.
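
To make that concrete, here's a rough sketch of what such a mocked test ends up looking like (the import path, driver interface, and record shape are assumptions here, not the exact API). Once the query is stubbed out, the only thing left to assert on is trivial data shuffling:

```python
from unittest.mock import AsyncMock

import pytest

# Assumed import path -- adjust to wherever EntityNode actually lives.
from graphiti_core.nodes import EntityNode


@pytest.mark.asyncio
async def test_get_by_uuid_with_mocked_driver():
    # Fake driver whose query call just returns a canned record.
    # The real Cypher result is richer than this; the shape here is made up.
    driver = AsyncMock()
    driver.execute_query.return_value = ([{"uuid": "abc-123", "name": "Alice"}], None, None)

    node = await EntityNode.get_by_uuid(driver, "abc-123")

    # All this really verifies is that the record was copied into the model,
    # i.e. that Python's comprehension/constructor machinery works --
    # the Cypher query itself is never exercised.
    assert node.uuid == "abc-123"
```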

There are definitely functions in the codebase that need unit tests, though. resolve_extracted_edge() in edge_operations.py has some complicated post-processing logic; it already has some unit tests, but I don't think they cover all of the possible scenarios, so that might be a good place to look.
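
If you go down that road, one way to make the scenario coverage explicit is a parametrized test per post-processing branch. The scenario names below are placeholders, since the real cases depend on resolve_extracted_edge()'s actual signature:

```python
import pytest

# Placeholder scenarios -- fill in real extracted/existing edge fixtures
# for each post-processing branch of resolve_extracted_edge().
SCENARIOS = [
    pytest.param("brand_new_edge", id="no-duplicates"),
    pytest.param("duplicate_of_existing_edge", id="duplicate"),
    pytest.param("contradicts_existing_edge", id="contradiction"),
]


@pytest.mark.parametrize("scenario", SCENARIOS)
def test_resolve_extracted_edge_covers_each_branch(scenario):
    # Build the extracted edge plus the existing edges for this scenario,
    # call resolve_extracted_edge(), and assert on the resolved result
    # (deduplicated edge, invalidated edges, etc.).
    ...
```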

I see you're also adding unit tests for the llm_client and embedder_client integrations; that is also a good place to focus, since we don't necessarily use all of them very often.
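
For those, I'd keep them at roughly this level (the client class, constructor, and response shape below are assumptions about the embedder interface, not the exact code): stub the provider SDK call and check that the client unwraps the response into a plain vector.

```python
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

# Assumed import path and interface -- adjust to the real embedder client.
from graphiti_core.embedder import OpenAIEmbedder


@pytest.mark.asyncio
async def test_embedder_unwraps_provider_response():
    # The real constructor may require a config/API key; treat this as a sketch.
    embedder = OpenAIEmbedder()

    # Fake an OpenAI-style embeddings response: response.data[0].embedding.
    fake_response = MagicMock()
    fake_response.data = [MagicMock(embedding=[0.1, 0.2, 0.3])]

    with patch.object(embedder.client.embeddings, "create", new=AsyncMock(return_value=fake_response)):
        vector = await embedder.create("hello world")

    assert vector == [0.1, 0.2, 0.3]
```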

Integration tests should be the main set of tests for functions that have a Cypher query component. The CRUD functions in node.py and edge.py could use some integration tests, especially since some of them aren't directly called anywhere else in the codebase, so it is easier for a bug to go unnoticed there. Integration tests for those would involve adding a test file for each of them, seeding some data into neo4j, and making sure all of the CRUD operations can find the proper data. That would be the biggest help in terms of test coverage.
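
As a rough shape for one of those files (the fixture wiring, required fields, and CRUD method names are assumptions here rather than the exact API), each test would take a real driver, seed a node, and round-trip it through the CRUD methods:

```python
import os

import pytest
import pytest_asyncio
from neo4j import AsyncGraphDatabase

# Assumed import path and field/method names -- adjust to the real EntityNode API.
from graphiti_core.nodes import EntityNode


@pytest_asyncio.fixture
async def driver():
    # Points at a local test database; URI and credentials are placeholders.
    driver = AsyncGraphDatabase.driver(
        os.environ.get("NEO4J_URI", "bolt://localhost:7687"),
        auth=(os.environ.get("NEO4J_USER", "neo4j"), os.environ.get("NEO4J_PASSWORD", "password")),
    )
    yield driver
    # Clean up whatever the test created so runs stay independent.
    await driver.execute_query("MATCH (n {group_id: 'test'}) DETACH DELETE n")
    await driver.close()


@pytest.mark.asyncio
async def test_entity_node_crud_round_trip(driver):
    node = EntityNode(name="Alice", group_id="test", labels=["Entity"])
    await node.save(driver)

    fetched = await EntityNode.get_by_uuid(driver, node.uuid)
    assert fetched.name == "Alice"

    await node.delete(driver)
    with pytest.raises(Exception):  # the real code may raise a more specific "not found" error
        await EntityNode.get_by_uuid(driver, node.uuid)
```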

Finally, for LLM and search testing we are moving towards evals (you'll see an evals folder in the tests folder). The LLM calls are the most brittle part of the application, and changing prompts can have a significant effect on quality without actually breaking the flow. Over time we have also collected a lot of data to help us improve the system. As such, it makes sense for us to test our LLM calls and search results through evals. However, this is a slow process: it takes a long time to create and annotate a dataset, let alone to build the framework, so it's something we dedicate time to whenever the team is free.

We also have some internal e2e evals we run before releases to make sure graph building and searching are working properly. Evals that exercise our graph building and search functionality also double as tests of the underlying "units", and most of the codebase is either data ingestion or search.

That is just some off-the-cuff rambling about tests, though. I would say the integration tests for node.py and edge.py are the most important basic tests that we need. In general, our evals confirm things are working properly e2e, and they will become more granular as we annotate our data and build out the framework.

@evanmschultz
Contributor Author

That makes sense. Thank you very much for the information. I will delete my Add Tests PR #377 since it added coverage, and then redo the Embedder tests so they are more in line with what you are looking for. One question about that: should I still use pytest-mock? Also, my Even Better TOML extension wants to reformat pyproject.toml to add indentation and reorder things alphabetically; would you like me to turn that off if I do add pytest-mock as a dev dependency? I can imagine looking over that diff in a PR would be annoying.

Next, about my PR #372: it seems like that one would still fit with what you described. Should I update anything on it before it can be merged?

Lastly, once I've looked through everything you described and started working on it, should I message you directly on Discord if I have any questions, or contact you some other way?

Thank you again for the information and help. It makes things much clearer and will make contributing much easier.
