Skip to content

Fix Document model: bad test, attribute bug, and field cleanup #1113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nnethercott
Copy link
Contributor

@nnethercott nnethercott commented Jun 9, 2025

Pull Request

Small bug fix & improvement.

The Document model stores twice as many fields needed through Document.__doc and doesn't properly implement __getattr__.

An example of an assert that fails :

doc = Document({"foo": "bar"})
assert getattr(doc, "foo") == "bar" # passes
assert doc.__getattr__("foo") == "bar" # fails, current implementation gives 'foo' again

Duplicated data

Currently dict(Document) gives this:

  {
    "_Document__doc": {
      "id": 5,
      "title": "Hard Times",
      "genres": [
        "Classics",
        "Fiction",
        "Victorian",
        "Literature"
      ],
      "publisher": "Penguin Classics",
      "language": "English",
      "author": "Charles Dickens",
      "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
      "format": "Hardcover",
      "rating": 3
    },
    "id": 5,
    "title": "Hard Times",
    "genres": [
      "Classics",
      "Fiction",
      "Victorian",
      "Literature"
    ],
    "publisher": "Penguin Classics",
    "language": "English",
    "author": "Charles Dickens",
    "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
    "format": "Hardcover",
    "rating": 3
  }

This PR removes the __doc field so that it now looks like:

  {
    "id": 5,
    "title": "Hard Times",
    "genres": [
      "Classics",
      "Fiction",
      "Victorian",
      "Literature"
    ],
    "publisher": "Penguin Classics",
    "language": "English",
    "author": "Charles Dickens",
    "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
    "format": "Hardcover",
    "rating": 3
  }

What does this PR do?

  • Fixes the __getattr__ implementation for Document
  • Fixes erroneous test

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • Bug Fixes

    • Improved attribute access in documents to return actual values instead of attribute names.
    • Iterating over documents now correctly yields key-value pairs from the document data.
  • Tests

    • Added a new test to verify correct document initialization and conversion.
    • Updated existing tests to reflect improved attribute access and iteration behavior.

Copy link
Contributor

coderabbitai bot commented Jun 9, 2025

Walkthrough

The Document class in the Meilisearch models was refactored to store its data directly in the instance's __dict__ instead of a private dictionary. Attribute access and iteration logic were updated accordingly. Corresponding tests were added and adjusted to validate the new behavior and ensure correct attribute handling and iteration.

Changes

File(s) Change Summary
meilisearch/models/document.py Refactored Document to store data in __dict__, updated __getattr__ to return attribute values, and adjusted iteration logic.
tests/models/test_document.py Added test for initialization, corrected attribute access test, and updated iteration test for new logic.

Poem

A hop and a skip, the Document’s new trick—
No more secrets in a private dict!
Now attributes shine, clear and direct,
With tests to ensure all works as we expect.
In fields of code, a bunny leaps,
Where data flows and logic keeps! 🐇


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2030e1b and 0fdd5b8.

📒 Files selected for processing (2)
  • meilisearch/models/document.py (1 hunks)
  • tests/models/test_document.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/models/test_document.py
  • meilisearch/models/document.py
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Member

@brunoocasali brunoocasali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nnethercott thanks for your PR!

I'm not sure if I got the idea behind this part:

This PR removes the __doc field so that it now looks like:

So it means this introduces a breaking-change?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pyproject.toml (1)

117-117: Consider improving exception handling instead of disabling the warning.

While disabling the "raise-missing-from" warning is acceptable, consider whether the exception handling in the Document class could be improved to follow Python best practices for exception chaining.

tests/models/test_document.py (1)

28-28: LGTM: Verifies correct iteration behavior.

The updated assertion correctly checks that iterating over a Document yields key-value pairs from the internal dictionary.

Minor nitpick: There's a typo in the test data - "fiels2" should probably be "field2". If this is intentional for testing purposes, consider adding a comment to clarify.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01483a3 and 2030e1b.

📒 Files selected for processing (3)
  • meilisearch/models/document.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/models/test_document.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/models/test_document.py (2)
meilisearch/models/document.py (1)
  • Document (4-15)
tests/models/test_index.py (1)
  • test_getattr (8-10)
🪛 Ruff (0.11.9)
meilisearch/models/document.py

12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (3)
meilisearch/models/document.py (1)

6-6: LGTM: Eliminates duplicate data storage.

This change correctly addresses the duplicate data storage issue by updating the instance's __dict__ directly instead of maintaining a separate __doc attribute.

tests/models/test_document.py (2)

9-12: LGTM: Good test coverage for the refactored initialization.

This new test properly verifies that the Document class correctly initializes from a dictionary and can be converted back to a dictionary, ensuring the refactoring works as expected.


17-17: LGTM: Fixes the erroneous assertion.

This change correctly fixes the test to assert the actual attribute value ("test 1") instead of the attribute name ("field1"), aligning with the corrected __getattr__ implementation.

Comment on lines 8 to 12
def __getattr__(self, attr: str) -> str:
if attr in self.__doc.keys():
return attr
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
try:
return self.__dict__[attr]
except Exception as _:
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve type annotation and exception handling.

The __getattr__ method fixes the core issue but has room for improvement:

  1. Return type should be Any since attributes can be of any type
  2. Exception handling should be more specific and follow best practices
-    def __getattr__(self, attr: str) -> str:
+    def __getattr__(self, attr: str) -> Any:
         try:
             return self.__dict__[attr]
-        except Exception as _:
-            raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
+        except KeyError:
+            raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}") from None

This addresses the static analysis hint about exception chaining and makes the exception handling more precise.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __getattr__(self, attr: str) -> str:
if attr in self.__doc.keys():
return attr
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
try:
return self.__dict__[attr]
except Exception as _:
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
def __getattr__(self, attr: str) -> Any:
try:
return self.__dict__[attr]
except KeyError:
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}") from None
🧰 Tools
🪛 Ruff (0.11.9)

12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In meilisearch/models/document.py around lines 8 to 12, update the __getattr__
method to have a return type of Any instead of str to reflect that attributes
can be any type. Also, replace the broad except Exception clause with a more
specific KeyError exception to catch only missing dictionary keys. When raising
the AttributeError, use "from None" to suppress exception chaining and follow
best practices for clearer error handling.

@nnethercott
Copy link
Contributor Author

nnethercott commented Jun 9, 2025

So it means this introduces a breaking-change?

Kind of a grey area tbh.

__doc was a private attribute before so users would get an AttributeError if they did a document.__doc. The issue though is that this contract wasn't really respected by the previous Document.__iter__ implementation since iter(document) returns the contents of __doc along with the other fields :(

Since this PR changes the dict representation I guess it's safer to label it as a breaking-change !

@brunoocasali
Copy link
Member

brunoocasali commented Jun 9, 2025

Thanks a lot for your answer @nnethercott 🙏

Can you create a small migration guide, so I can inform other users within the release changelog? You can update the PR description and I copy from there.

@nnethercott
Copy link
Contributor Author

@brunoocasali sure thing !

🛠 migration guide

what changed

  • The internal __doc field in Document has been removed to avoid data duplication.
  • Document.__getattr__ now correctly returns attribute values rather than the attribute name itself.

impact

  • Code relying on internal access to document._Document__doc (e.g. through iterators) will break.
  • Behavior of doc.getattr("field") is now consistent and correct.

what you need to do:

  • If you were manually accessing document._Document__doc, switch to using direct attribute access or casting with dict(doc).

example updated usage

doc = index.get_document("foo")
doc_dict = dict(doc) 
doc_dict.pop("_Document__doc")  # ❌ Not needed anymore

@nnethercott nnethercott requested a review from brunoocasali June 10, 2025 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants