Fix Document model: bad test, attribute bug, and field cleanup #1113

nnethercott · 2025-06-09T14:06:14Z

Pull Request

Small bug fix & improvement.

The Document model stores twice as many fields needed through Document.__doc and doesn't properly implement __getattr__.

An example of an assert that fails :

doc = Document({"foo": "bar"})
assert getattr(doc, "foo") == "bar" # passes
assert doc.__getattr__("foo") == "bar" # fails, current implementation gives 'foo' again

Duplicated data

Currently dict(Document) gives this:

  {
    "_Document__doc": {
      "id": 5,
      "title": "Hard Times",
      "genres": [
        "Classics",
        "Fiction",
        "Victorian",
        "Literature"
      ],
      "publisher": "Penguin Classics",
      "language": "English",
      "author": "Charles Dickens",
      "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
      "format": "Hardcover",
      "rating": 3
    },
    "id": 5,
    "title": "Hard Times",
    "genres": [
      "Classics",
      "Fiction",
      "Victorian",
      "Literature"
    ],
    "publisher": "Penguin Classics",
    "language": "English",
    "author": "Charles Dickens",
    "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
    "format": "Hardcover",
    "rating": 3
  }

This PR removes the __doc field so that it now looks like:

  {
    "id": 5,
    "title": "Hard Times",
    "genres": [
      "Classics",
      "Fiction",
      "Victorian",
      "Literature"
    ],
    "publisher": "Penguin Classics",
    "language": "English",
    "author": "Charles Dickens",
    "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
    "format": "Hardcover",
    "rating": 3
  }

What does this PR do?

Fixes the __getattr__ implementation for Document
Fixes erroneous test

PR checklist

Please check if your PR fulfills the following requirements:

Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
Have you read the contributing guidelines?
Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

Bug Fixes
- Improved attribute access in documents to return actual values instead of attribute names.
- Iterating over documents now correctly yields key-value pairs from the document data.
Tests
- Added a new test to verify correct document initialization and conversion.
- Updated existing tests to reflect improved attribute access and iteration behavior.

coderabbitai · 2025-06-09T14:06:21Z

Walkthrough

The Document class in the Meilisearch models was refactored to store its data directly in the instance's __dict__ instead of a private dictionary. Attribute access and iteration logic were updated accordingly. Corresponding tests were added and adjusted to validate the new behavior and ensure correct attribute handling and iteration.

Changes

File(s)	Change Summary
meilisearch/models/document.py	Refactored `Document` to store data in `__dict__`, updated `__getattr__` to return attribute values, and adjusted iteration logic.
tests/models/test_document.py	Added test for initialization, corrected attribute access test, and updated iteration test for new logic.

Poem

A hop and a skip, the Document’s new trick—
No more secrets in a private dict!
Now attributes shine, clear and direct,
With tests to ensure all works as we expect.
In fields of code, a bunny leaps,
Where data flows and logic keeps! 🐇

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2030e1b and 0fdd5b8.

📒 Files selected for processing (2)

meilisearch/models/document.py (1 hunks)
tests/models/test_document.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

tests/models/test_document.py
meilisearch/models/document.py

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

brunoocasali

Hey @nnethercott thanks for your PR!

I'm not sure if I got the idea behind this part:

This PR removes the __doc field so that it now looks like:

So it means this introduces a breaking-change?

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

pyproject.toml (1)

117-117: Consider improving exception handling instead of disabling the warning.

While disabling the "raise-missing-from" warning is acceptable, consider whether the exception handling in the Document class could be improved to follow Python best practices for exception chaining.

tests/models/test_document.py (1)

28-28: LGTM: Verifies correct iteration behavior.

The updated assertion correctly checks that iterating over a Document yields key-value pairs from the internal dictionary.

Minor nitpick: There's a typo in the test data - "fiels2" should probably be "field2". If this is intentional for testing purposes, consider adding a comment to clarify.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01483a3 and 2030e1b.

📒 Files selected for processing (3)

meilisearch/models/document.py (1 hunks)
pyproject.toml (1 hunks)
tests/models/test_document.py (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

tests/models/test_document.py (2)

meilisearch/models/document.py (1)

Document (4-15)

tests/models/test_index.py (1)

test_getattr (8-10)

🪛 Ruff (0.11.9)

meilisearch/models/document.py

12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (3)

meilisearch/models/document.py (1)

6-6: LGTM: Eliminates duplicate data storage.

This change correctly addresses the duplicate data storage issue by updating the instance's __dict__ directly instead of maintaining a separate __doc attribute.

tests/models/test_document.py (2)

9-12: LGTM: Good test coverage for the refactored initialization.

This new test properly verifies that the Document class correctly initializes from a dictionary and can be converted back to a dictionary, ensuring the refactoring works as expected.

17-17: LGTM: Fixes the erroneous assertion.

This change correctly fixes the test to assert the actual attribute value ("test 1") instead of the attribute name ("field1"), aligning with the corrected __getattr__ implementation.

coderabbitai · 2025-06-09T14:13:07Z

meilisearch/models/document.py

    def __getattr__(self, attr: str) -> str:
-        if attr in self.__doc.keys():
-            return attr
-        raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
+        try:
+            return self.__dict__[attr]
+        except Exception as _:
+            raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")


🛠️ Refactor suggestion

Improve type annotation and exception handling.

The __getattr__ method fixes the core issue but has room for improvement:

Return type should be Any since attributes can be of any type

Exception handling should be more specific and follow best practices

- def __getattr__(self, attr: str) -> str: + def __getattr__(self, attr: str) -> Any: try: return self.__dict__[attr] - except Exception as _: - raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}") + except KeyError: + raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}") from None

This addresses the static analysis hint about exception chaining and makes the exception handling more precise.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def __getattr__(self, attr: str) -> str:

if attr in self.__doc.keys():

return attr

raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")

try:

return self.__dict__[attr]

except Exception as _:

raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")

def __getattr__(self, attr: str) -> Any:

try:

return self.__dict__[attr]

except KeyError:

raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}") from None

🧰 Tools

🪛 Ruff (0.11.9)

12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents

In meilisearch/models/document.py around lines 8 to 12, update the __getattr__ method to have a return type of Any instead of str to reflect that attributes can be any type. Also, replace the broad except Exception clause with a more specific KeyError exception to catch only missing dictionary keys. When raising the AttributeError, use "from None" to suppress exception chaining and follow best practices for clearer error handling.

pyproject.toml

nnethercott · 2025-06-09T14:45:08Z

So it means this introduces a breaking-change?

Kind of a grey area tbh.

__doc was a private attribute before so users would get an AttributeError if they did a document.__doc. The issue though is that this contract wasn't really respected by the previous Document.__iter__ implementation since iter(document) returns the contents of __doc along with the other fields :(

Since this PR changes the dict representation I guess it's safer to label it as a breaking-change !

brunoocasali · 2025-06-09T16:41:12Z

Thanks a lot for your answer @nnethercott 🙏

Can you create a small migration guide, so I can inform other users within the release changelog? You can update the PR description and I copy from there.

nnethercott · 2025-06-10T08:05:30Z

@brunoocasali sure thing !

🛠 migration guide

what changed

The internal __doc field in Document has been removed to avoid data duplication.
Document.__getattr__ now correctly returns attribute values rather than the attribute name itself.

impact

Code relying on internal access to document._Document__doc (e.g. through iterators) will break.
Behavior of doc.getattr("field") is now consistent and correct.

what you need to do:

If you were manually accessing document._Document__doc, switch to using direct attribute access or casting with dict(doc).

example updated usage

doc = index.get_document("foo")
doc_dict = dict(doc) 
doc_dict.pop("_Document__doc")  # ❌ Not needed anymore

Update Document model and fix tests

2030e1b

brunoocasali reviewed Jun 9, 2025

View reviewed changes

coderabbitai bot reviewed Jun 9, 2025

View reviewed changes

brunoocasali reviewed Jun 9, 2025

View reviewed changes

pyproject.toml Outdated Show resolved Hide resolved

apply coderabbit suggestions

0fdd5b8

nnethercott requested a review from brunoocasali June 10, 2025 08:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix Document model: bad test, attribute bug, and field cleanup #1113

Fix Document model: bad test, attribute bug, and field cleanup #1113

Uh oh!

nnethercott commented Jun 9, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jun 9, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

brunoocasali left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jun 9, 2025

Uh oh!

Uh oh!

nnethercott commented Jun 9, 2025 •

edited

Loading

Uh oh!

brunoocasali commented Jun 9, 2025 •

edited

Loading

Uh oh!

nnethercott commented Jun 10, 2025

Uh oh!

Uh oh!

Fix Document model: bad test, attribute bug, and field cleanup #1113

Are you sure you want to change the base?

Fix Document model: bad test, attribute bug, and field cleanup #1113

Uh oh!

Conversation

nnethercott commented Jun 9, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Duplicated data

What does this PR do?

PR checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

brunoocasali left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 9, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

nnethercott commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brunoocasali commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nnethercott commented Jun 10, 2025

🛠 migration guide

what changed

impact

what you need to do:

example updated usage

Uh oh!

Uh oh!

nnethercott commented Jun 9, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jun 9, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)

nnethercott commented Jun 9, 2025 •

edited

Loading

brunoocasali commented Jun 9, 2025 •

edited

Loading