-
Notifications
You must be signed in to change notification settings - Fork 82
refactor(clp-package): Simplify StrEnum and Path serialization via Annotated serializers. #1417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…s and update serialization.
…t directly to model_dump.
…drop custom port validators.
…ost type and removing duplicated validators.
…type and adjust serialization accordingly.
…red NonEmptyStr type.
…n scheduler configs.
…heduler configs and update references.
…ndant validators.
…remove its validator.
…el type and remove validation helpers.
…ng about plan for replacement.
WalkthroughAdds string-serialized enum and Path type aliases, new serialization utilities, replaces enum and pathlib.Path fields across config/storage models with annotated aliases, removes many custom primitive-dump methods, and updates validation/initialization and enum comparison logic. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Model as PydanticModel
participant Serializer as serialization_utils
rect rgba(135,206,250,0.08)
User->>Model: instantiate(raw input with enums/paths)
activate Model
Model->>Model: validate/normalize fields -> StrEnum / SerializablePath
Model-->>User: constructed instance
deactivate Model
end
rect rgba(144,238,144,0.07)
User->>Model: request serialization
activate Model
Model->>Serializer: serialize_str_enum / serialize_path
Serializer-->>User: primitive strings
deactivate Model
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)components/clp-py-utils/clp_py_utils/clp_config.py (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (8)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
672-672
: Fix broken enum comparison after type change.After changing
AwsAuthentication.type
toAwsAuthTypeStr
(an enum with string serialization), the comparison at Line 672 is now broken. The code comparesAwsAuthType.profile.value
(a string) withauth.type
(anAwsAuthType
enum object), which will always evaluate toFalse
. This breaks the validation logic that checks whether profile authentication is used.Apply this diff to fix the comparison:
- if AwsAuthType.profile.value == auth.type: + if AwsAuthType.profile == auth.type:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-py-utils/clp_py_utils/clp_config.py
(9 hunks)components/clp-py-utils/clp_py_utils/pydantic_serialization_utils.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (8)
components/clp-py-utils/clp_py_utils/pydantic_serialization_utils.py (2)
6-6
: LGTM!The
StrEnumSerializer
correctly extracts the string value from enum instances for serialization, enabling clean enum-to-string conversion in Pydantic models.
8-8
: LGTM!The
PathStr
type correctly serializespathlib.Path
objects as strings, eliminating the need for custom serialization methods in models.components/clp-py-utils/clp_py_utils/clp_config.py (6)
24-24
: LGTM!The import correctly brings in the new serialization utilities that enable cleaner, more declarative model serialization.
118-118
: LGTM!The string-serialized enum type aliases correctly wrap the base enum types with
StrEnumSerializer
, ensuring enums are serialized as their string values rather than enum objects.Also applies to: 126-126, 135-135, 150-150
154-155
: LGTM!The field type updates correctly use the string-serialized enum aliases, maintaining enum type safety at runtime while enabling automatic string serialization.
Also applies to: 181-181, 369-369
373-398
: LGTM!The pre-validation hook correctly validates the authentication configuration, ensuring that required fields are set based on the authentication type and preventing invalid combinations.
415-415
: LGTM!The path fields correctly use
PathStr
to enable automatic string serialization, eliminating the need for custom serialization methods.Also applies to: 430-430, 453-453, 457-457, 461-461, 465-465, 469-469
571-571
: LGTM!The
CLPConfig
path fields correctly usePathStr
for automatic string serialization, including the private attributes.Also applies to: 577-582
import pathlib | ||
from typing import Annotated | ||
|
||
from pydantic import PlainSerializer | ||
|
||
StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value) | ||
|
||
PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider adding docstrings for public utilities.
Adding a module docstring and docstrings for StrEnumSerializer
and PathStr
would improve discoverability and maintainability. Consider also adding an __all__
export list to explicitly declare the public API.
Apply this diff to add documentation:
+"""
+Pydantic serialization utilities for common types.
+
+Provides annotated types and serializers to handle enum and path serialization
+without requiring custom model serialization methods.
+"""
+
import pathlib
from typing import Annotated
from pydantic import PlainSerializer
+# Serializer that converts enum instances to their string values
StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value)
+# Annotated type for pathlib.Path that serializes to string
PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))]
+
+__all__ = ["StrEnumSerializer", "PathStr"]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import pathlib | |
from typing import Annotated | |
from pydantic import PlainSerializer | |
StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value) | |
PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))] | |
""" | |
Pydantic serialization utilities for common types. | |
Provides annotated types and serializers to handle enum and path serialization | |
without requiring custom model serialization methods. | |
""" | |
import pathlib | |
from typing import Annotated | |
from pydantic import PlainSerializer | |
# Serializer that converts enum instances to their string values | |
StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value) | |
# Annotated type for pathlib.Path that serializes to string | |
PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))] | |
__all__ = ["StrEnumSerializer", "PathStr"] |
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/pydantic_serialization_utils.py around
lines 1 to 8, add a module-level docstring describing the purpose (helpers to
create Pydantic PlainSerializers for common types), add concise
docstrings/comments for the StrEnumSerializer and PathStr explaining what they
serialize and expected input/output (e.g., StrEnumSerializer converts Enum
instances to their .value string; PathStr serializes pathlib.Path to str), and
add an __all__ list exporting ("StrEnumSerializer", "PathStr") so the public API
is explicit; keep wording short and place docstrings immediately above the
definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the file name, how about just file_utils
? Though I understand the utils here likely are only compatible with Pydantic, the name "Pydantic" is still a bit implementation-specific which i feel can be omitted from the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, can we also:
- Only export the lambda functions in
serialization_utils.py
:serialize_path = lambda ...
- Create the Pydantic types in
clp_config.py
(or move them into a separatetypes.py
source file. maybe in a future refactoring PR) e.g.,
# in clp_config.py
PathStr = Annotated[
pathlib.Path,
PlainSerializer(serialize_path)
]
so that we can decouples the serialization utils from Pydantic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel TypeStr
is not a very intuitive name. Is there a better suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with another name, but file_utils
is confusing as it is not even close to what this file is doing.
_container_image_id_path: PathStr = PrivateAttr(default=CLP_PACKAGE_CONTAINER_IMAGE_ID_PATH) | ||
_version_file_path: PathStr = PrivateAttr(default=CLP_VERSION_FILE_PATH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just to show we have considered -)
Since these are PrivateAttr
, they won't serialize by default anyway. The type change is harmless though.
|
||
StrEnumSerializer = PlainSerializer(lambda enum_value: enum_value.value) | ||
|
||
PathStr = Annotated[pathlib.Path, PlainSerializer(lambda path_value: str(path_value))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i'm not sure if this is worth doing but we shall investigate)
can we embed the validator _validate_directory
with BeforeValidator()
into the Annotation
as well?
https://docs.pydantic.dev/latest/api/functional_validators/#pydantic.functional_validators.BeforeValidator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried. Somehow annotated BeforeValidator
is not triggered. I haven't figured out why. Maybe we can ask coderabbit to open an issue to track this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-py-utils/clp_py_utils/clp_config.py
(10 hunks)components/clp-py-utils/clp_py_utils/serialization_utils.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
components/clp-py-utils/clp_py_utils/serialization_utils.py (2)
serialize_enum
(5-12)serialize_path
(15-22)components/clp-py-utils/clp_py_utils/core.py (1)
make_config_path_absolute
(42-53)
🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/clp_config.py
119-119: Missing return type annotation for private function _validate_directory
(ANN202)
119-119: Dynamically typed expressions (typing.Any) are disallowed in value
(ANN401)
129-129: Prefer TypeError
exception for invalid type
(TRY004)
129-129: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (14)
components/clp-py-utils/clp_py_utils/serialization_utils.py (1)
15-22
: LGTM!The implementation correctly converts a
pathlib.Path
to its string representation.components/clp-py-utils/clp_py_utils/clp_config.py (13)
9-9
: LGTM!The new imports support the Pydantic annotated serializers approach for path and enum serialization.
Also applies to: 14-14, 26-26
114-116
: Clean type alias design.The
StrEnumSerializer
andPathStr
type aliases provide a consistent, reusable pattern for serialization. Note that these depend on theserialize_enum
function being fixed (see the critical bug identified inserialization_utils.py
).
147-147
: LGTM!The enum type aliases consistently apply the
StrEnumSerializer
for automatic string serialization across all enum types.Also applies to: 155-155, 164-164, 179-179
183-184
: LGTM!The Package model correctly uses the string-serialized enum type aliases for automatic serialization.
210-210
: LGTM!The Database model correctly uses
DatabaseEngineStr
for automatic string serialization.
398-398
: LGTM!The AwsAuthentication model correctly uses
AwsAuthTypeStr
for automatic string serialization.
444-444
: LGTM!Using
ValidatedPathStr
ensures the directory path is validated before being converted topathlib.Path
.
453-453
: LGTM!Using
ValidatedPathStr
ensures the staging directory path is validated before being converted topathlib.Path
.
470-470
: Verify intentional removal of directory validation.This line changes from
ValidatedPathStr
(inherited from parentFsStorage
) toPathStr
, which removes the validation that ensures the value is a non-empty string. Given the default value ispathlib.Path("/")
, this change may be intentional to allow root directory ingestion without validation constraints.Please confirm this is intentional. If validation should be retained, consider using
ValidatedPathStr
instead:- directory: PathStr = pathlib.Path("/") + directory: ValidatedPathStr = pathlib.Path("/")
474-474
: LGTM!Using
PathStr
without validation is appropriate here since these fields have sensible defaults and validation happens later viamake_config_paths_absolute
and related methods.Also applies to: 478-478, 482-482, 486-486
588-588
: LGTM!Changing these fields to
PathStr
ensures consistent path serialization across the configuration model.Also applies to: 594-595
598-599
: Type change is harmless for private attributes.As noted in previous reviews, since these are
PrivateAttr
, they won't serialize by default anyway. The type change toPathStr
is harmless and maintains consistency.Based on learnings
689-689
: LGTM!The comparison
auth.type == AwsAuthType.profile
works correctly even thoughauth.type
is nowAwsAuthTypeStr
, as Pydantic preserves the underlying enum for comparisons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
765-777
: Address static analysis hints for better code quality.The _validate_directory function should be improved per the static analysis hints already flagged in previous reviews:
- Add return type annotation (-> str)
- Use TypeError instead of ValueError when validating the type at line 774
- Consider extracting exception messages into constants
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-py-utils/clp_py_utils/clp_config.py
(11 hunks)components/clp-py-utils/clp_py_utils/serialization_utils.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/serialization_utils.py (2)
serialize_enum
(5-12)serialize_path
(15-22)
🪛 GitHub Actions: clp-lint
components/clp-py-utils/clp_py_utils/serialization_utils.py
[error] 1-1: ruff: Import block is un-sorted or un-formatted (I001). 1 fixable with --fix.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: package-image
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (8)
components/clp-py-utils/clp_py_utils/serialization_utils.py (2)
5-12
: LGTM! Bug fix applied correctly.The function now correctly accesses
enum_value.value
as a property rather than calling it as a method.
15-22
: LGTM!The implementation correctly converts pathlib.Path to its string representation.
components/clp-py-utils/clp_py_utils/clp_config.py (6)
13-13
: LGTM!The imports are correctly added to support the annotated serializer refactoring.
Also applies to: 25-25
113-115
: LGTM!The reusable type aliases for StrEnumSerializer and PathStr are well-defined and will reduce duplication across model fields.
123-124
: LGTM!The string-serialized enum type aliases follow a consistent pattern and will ensure enums are properly serialized to their string values.
Also applies to: 131-132, 140-141, 155-156
159-160
: LGTM!The model fields are correctly updated to use string-serialized enum types, which will automatically handle serialization without custom dump_to_primitive_dict methods.
Also applies to: 186-186, 374-374
420-420
: LGTM!The path fields are correctly updated to use PathStr, which will automatically serialize paths to strings via the PlainSerializer.
Also applies to: 435-435, 458-458, 462-462, 466-466, 470-470, 474-474, 576-576, 582-583
677-677
: LGTM!The enum comparison is still correct. The AwsAuthTypeStr annotation only affects serialization; at runtime, auth.type remains an AwsAuthType instance, so the comparison works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm
for the title, how about:
refactor(clp-package): Simplify StrEnum and Path serialization via Annotated serializers.
The "A" in "Annotated" is intentionally capitalized because it's a Python feature
return connection_params_and_type | ||
|
||
def dump_to_primitive_dict(self): | ||
d = self.model_dump(exclude={"username", "password"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a separate PR, we shall investigate
specifying private attributes like username
as https://docs.pydantic.dev/latest/api/fields/#pydantic.fields.PrivateAttr
and
specifying secrets like password
as https://docs.pydantic.dev/2.2/usage/types/secrets/
so we can completely get rid of such dump_to_primitive_dict()
methods.
@coderabbitai create an issue to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the investigation
gh issue create \
--repo y-scope/clp \
--title "Investigate using Pydantic PrivateAttr and SecretStr to eliminate dump_to_primitive_dict methods" \
--body "## Context
This issue tracks a follow-up refactoring task identified during PR #1417.
## Task
Investigate using Pydantic features to eliminate the need for custom \`dump_to_primitive_dict()\` methods:
1. Use [PrivateAttr](https://docs.pydantic.dev/latest/api/fields/#pydantic.fields.PrivateAttr) for private attributes like \`username\`
2. Use [SecretStr](https://docs.pydantic.dev/2.2/usage/types/secrets/) for sensitive fields like \`password\`
This would allow complete removal of \`dump_to_primitive_dict()\` methods throughout the codebase.
## References
- PR: https://github.com/y-scope/clp/pull/1417
- Comment: https://github.com/y-scope/clp/pull/1417#discussion_r2442428522
- Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao I've created issue #1440 to track this investigation: #1440
The issue covers exploring Pydantic's PrivateAttr
for private attributes like username
and SecretStr
for sensitive fields like password
, which would allow the removal of dump_to_primitive_dict()
methods throughout the codebase.
🧬 Code Graph Analysis Results
Filename: components/clp-py-utils/clp_py_utils/clp_config.py
Line numbers: N/A (summary)
-
Summary: Private attributes used for config paths
- _container_image_id_path: PrivateAttr with default path CLP_PACKAGE_CONTAINER_IMAGE_ID_PATH
- _version_file_path: PrivateAttr with default path CLP_VERSION_FILE_PATH
- Purpose: Conceal internal paths from model serialization; supports lazy/controlled access for container image and version resolution.
-
Summary: dump_to_primitive_dict methods (exclusions to avoid leaking sensitive data)
- Database.dump_to_primitive_dict
- Returns: dict representation of the Database model excluding username and password
- Behavior: Ensures credentials are not dumped to primitive form
- Redis.dump_to_primitive_dict
- Returns: dict representation of the Redis model excluding password
- Queue.dump_to_primitive_dict
- Returns: dict representation of the Queue model excluding username and password
- CLPConfig.dump_to_primitive_dict
- Returns: composite dict where database, queue, and redis are dumped via their respective dump_to_primitive_dict methods
- Also converts aws_config_directory to a primitive value (string or None)
- Purpose: Centralized primitive-dict export that omits sensitive credentials
- Database.dump_to_primitive_dict
-
Summary: Related credential loading methods (context for removal of dump methods)
- Database.load_credentials_from_file/load_credentials_from_env
- Redis.load_credentials_from_file/load_credentials_from_env
- Queue.load_credentials_from_file/load_credentials_from_env
- Note: These methods indicate sensitivity of credential data, motivating the move to private attributes and excluding credentials in primitive dumps.
_version_file_path: PathStr = PrivateAttr(default=CLP_VERSION_FILE_PATH) | ||
|
||
@field_validator("aws_config_directory") | ||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we changed the above aws_config_directory
type as Optional[SerializablePath]
, we shall update below value
type as Optional[SerializablePath]
in below validator prototype for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked other references of pathlib.Path
and believe they are fine to be left as pathlib.Path
because those are used in public interfaces and the callers do not need to make the paths serializable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we have to manually serialize them in the dump_to_primitive_dict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Sorry if i may have confused you, i meant to say
I checked other references of
pathlib.Path
and believe "all the other references" are fine to be left aspathlib.Path
-> the value
argument in the validator of aws_config_directory
is the ONLY argument whose type should be updated as SerializablePath
-> that is, let's not change the type from pathlib.Path
to SerializablePath
in any def make_config_paths_absolute(self, clp_home: pathlib.Path)
or def load_credentials_from_file(self, credentials_file_path: pathlib.Path)
prototypes
Co-authored-by: Junhao Liao <junhao@junhao.ca>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
590-594
: Update validator parameter type for consistency.If
aws_config_directory
is changed toPathStr
(as required to fix line 584), consider updating the validator'svalue
parameter type fromOptional[pathlib.Path]
toOptional[PathStr]
for consistency.Based on learnings
@field_validator("aws_config_directory") @classmethod - def expand_profile_user_home(cls, value: Optional[pathlib.Path]): + def expand_profile_user_home(cls, value: Optional[PathStr]): if value is not None: value = value.expanduser() return value
717-733
: Consider removing redundant custom serialization.If
aws_config_directory
is changed toPathStr
as suggested above, lines 728-731 become redundant because the field will already serialize to a string automatically via thePlainSerializer
. The custom serialization could then be removed.Based on learnings
After fixing the type at line 584, apply this diff to remove the redundant serialization:
def dump_to_primitive_dict(self): custom_serialized_fields = { "database", "queue", "redis", } d = self.model_dump(exclude=custom_serialized_fields) for key in custom_serialized_fields: d[key] = getattr(self, key).dump_to_primitive_dict() - - # Turn paths into primitive strings - if self.aws_config_directory is not None: - d["aws_config_directory"] = str(self.aws_config_directory) - else: - d["aws_config_directory"] = None return d
♻️ Duplicate comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
113-115
: Consider renamingPathStr
toSerializablePath
for clarity.The name
PathStr
is potentially misleading since the annotated type ispathlib.Path
, notstr
. ThePlainSerializer
only affects serialization behaviour, not the runtime type. Consider renaming toSerializablePath
to better reflect that it's a serializable Path type.Based on learnings
Apply this diff if you agree with the renaming:
-StrEnumSerializer = PlainSerializer(serialize_enum) - -PathStr = Annotated[pathlib.Path, PlainSerializer(serialize_path)] +StrEnumSerializer = PlainSerializer(serialize_enum) + +SerializablePath = Annotated[pathlib.Path, PlainSerializer(serialize_path)]Note: This would require updating all usages of
PathStr
throughout the file toSerializablePath
.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-py-utils/clp_py_utils/clp_config.py
(11 hunks)components/clp-py-utils/clp_py_utils/serialization_utils.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-24T17:10:44.565Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: taskfiles/lint.yaml:780-781
Timestamp: 2025-08-24T17:10:44.565Z
Learning: The CLP repository is gradually migrating from Black to Ruff for Python formatting. Some modules haven't been set up for Ruff yet, so there's intentional mixed usage of Black and Ruff formatters during this transition period. This should not be flagged as an inconsistency in PR reviews.
Applied to files:
components/clp-py-utils/clp_py_utils/serialization_utils.py
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/serialization_utils.py (2)
serialize_enum
(6-13)serialize_path
(16-23)
🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/clp_config.py
584-584: Undefined name SerializablePath
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
🔇 Additional comments (11)
components/clp-py-utils/clp_py_utils/clp_config.py (9)
13-13
: LGTM!The
PlainSerializer
import is correctly added to support the new annotated serializers.
25-25
: LGTM!The import of serialization utilities is correct and necessary for the new type aliases.
123-123
: LGTM!The string-serialized enum type aliases are correctly defined using the
StrEnumSerializer
and follow a consistent naming pattern.Also applies to: 131-131, 140-140, 155-155
159-160
: LGTM!The field type changes to use the new string-serialized enum aliases are correct and will enable automatic serialization without custom methods.
Also applies to: 186-186, 374-374
420-420
: LGTM!The path fields are correctly updated to use the
PathStr
alias, enabling automatic string serialization.Also applies to: 435-435, 458-458, 462-462, 466-466, 470-470, 474-474
576-576
: LGTM!The CLPConfig path fields are correctly updated to use the
PathStr
alias for automatic serialization.Also applies to: 582-583
586-587
: LGTM!The private attributes are correctly typed with
PathStr
. As noted in past comments, since these arePrivateAttr
, they won't serialize by default anyway, making the type change harmless.Based on learnings
677-677
: LGTM!Direct access to
auth.type
is correct. Sincetype
is nowAwsAuthTypeStr
(which serializes to string), the comparison withAwsAuthType.profile
will work correctly as the underlying value is compared.
749-749
: Consider updating field type toSerializablePath
.Based on past review comments, consider whether
data_directory
inWorkerConfig
should also useSerializablePath
(orPathStr
) for consistency with the changes inCLPConfig
.Based on learnings
Should this field be updated to match the serialization approach used in
CLPConfig
?components/clp-py-utils/clp_py_utils/serialization_utils.py (2)
1-3
: LGTM!The imports are correctly formatted with proper spacing between standard library and third-party imports.
16-23
: LGTM!The
serialize_path
function is correctly implemented with accurate docstring documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/clp-py-utils/clp_py_utils/clp_config.py
(12 hunks)components/clp-py-utils/clp_py_utils/serialization_utils.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-24T17:10:44.565Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: taskfiles/lint.yaml:780-781
Timestamp: 2025-08-24T17:10:44.565Z
Learning: The CLP repository is gradually migrating from Black to Ruff for Python formatting. Some modules haven't been set up for Ruff yet, so there's intentional mixed usage of Black and Ruff formatters during this transition period. This should not be flagged as an inconsistency in PR reviews.
Applied to files:
components/clp-py-utils/clp_py_utils/serialization_utils.py
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/serialization_utils.py (2)
serialize_path
(16-23)serialize_str_enum
(6-13)
🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/clp_config.py
592-592: Missing return type annotation for classmethod expand_profile_user_home
(ANN206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/clp-py-utils/clp_py_utils/clp_config.py (4)
13-13
: LGTM: Serialization infrastructure properly set up.The imports and serializer definitions are correct. The
StrEnumSerializer
andSerializablePath
type aliases properly leverage the new serialization utilities to enable automatic string serialization of StrEnum and Path fields in Pydantic models.Also applies to: 25-25, 103-114
122-122
: LGTM: String-serialized enum aliases are well-defined.The enum aliases (
StorageEngineStr
,DatabaseEngineStr
,QueryEngineStr
,AwsAuthTypeStr
) follow a consistent naming convention and properly apply theStrEnumSerializer
to enable automatic string serialization whilst preserving enum type safety during validation.Also applies to: 130-130, 139-139, 154-154
158-159
: LGTM: Model field types correctly updated to use serializable aliases.The field type updates are consistent and correct:
- Enum fields now use string-serialized aliases (
StorageEngineStr
,DatabaseEngineStr
,QueryEngineStr
,AwsAuthTypeStr
)- Path fields now use
SerializablePath
alias for automatic string serialization- Private attributes also properly use
SerializablePath
This enables automatic serialization without requiring custom
dump_to_primitive_dict
methods for these fields.Also applies to: 185-185, 373-373, 419-419, 434-434, 457-457, 461-461, 465-465, 469-469, 473-473, 575-575, 581-583, 585-588
666-694
: LGTM: Validation logic correctly uses enum object comparison.The validation logic at line 678 correctly compares enum objects directly (
AwsAuthType.profile == auth.type
) rather than comparing string values. This is the proper approach when using string-serialized enum aliases, as the runtime type remains an enum object with automatic serialization behavior.
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Description
This PR adds pydantic annotated serializer for
pathlib.Path
andStrEnum
to avoid adding custom model serialization for fields of these types.Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Improvements
New Features
Refactor