Skip to content

Conversation

Bill-hbrhbr
Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr commented Jul 14, 2025

Description

As the CLP codebase grows, C++ unit tests alone are no longer sufficient. We now need integration tests that validate the correctness of the entire codebase and to ensure components work correctly together. This not only helps developers catch issues earlier but also reduces review overhead.

This PR establishes a foundation for future, more comprehensive integration testing. Specifically, it introduces:

  • Integration test setup with pytest
  • Linting via black, ruff, and mypy
  • Environment management with uv
  • Taskfile workflows specific to integration tests

Within the pytest setup, this PR adds:

  • A pytest.ini with default options, logging, and markers
  • Facilities for downloading and storing test logs
  • Configuration classes for CLP packages and binaries
  • Example tests for CLP core binaries
  • Utility functions to support future tests

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Integration tests pass with the current main branch codebase and selected test logs.

Summary by CodeRabbit

  • New Features

    • Added an integration-tests project with end-to-end identity‑transformation tests, fixtures, frozen config objects and test utilities.
  • Tests

    • Added pytest configuration, pinned Python 3.9 for the test workspace, test tasks/entries and markers, plus helpers to run commands, assert results, and compare directories/JSON.
  • Documentation

    • Added testing docs and an integration‑tests README; documented uv as a dependency.
  • Chores

    • Enhanced lint workflow with per‑project linting and task additions; updated .gitignore to exclude pycache/.

Copy link
Contributor

coderabbitai bot commented Jul 14, 2025

Walkthrough

Adds a new integration-tests Python project (pytest config, fixtures, utils, and tests), taskfile and lint-task updates for per-project linting and integration test orchestration, documentation for testing, and a .gitignore entry for __pycache__/. All changes are non-API except new test helpers and frozen dataclasses used by tests.

Changes

Cohort / File(s) Summary of changes
VCS ignore update
\.gitignore
Added __pycache__/ ignore entry.
Integration project metadata & config
integration-tests/pyproject.toml, integration-tests/.pytest.ini, integration-tests/.python-version, integration-tests/README.md
New Python project config (hatchling), pinned Python 3.9, dev deps (mypy/ruff/pytest/pytest-env), mypy/ruff settings, pytest logging/markers/env, and README.
Package & test package inits
integration-tests/src/integration_tests/__init__.py, integration-tests/tests/__init__.py, integration-tests/tests/fixtures/__init__.py, integration-tests/tests/utils/__init__.py
Added module/package docstrings to establish namespaces.
Pytest wiring
integration-tests/tests/conftest.py
Registered pytest plugins via pytest_plugins.
Fixtures: config & datasets
integration-tests/tests/fixtures/integration_test_config.py, integration-tests/tests/fixtures/integration_test_logs.py
Added session-scoped integration_test_config fixture (core/package/build paths) and dataset fixtures (hive_24hr, postgresql) that download/extract tarballs and use pytest cache.
Test utilities — configs
integration-tests/tests/utils/config.py
Added frozen dataclasses: CoreConfig, PackageConfig, IntegrationTestConfig, IntegrationTestLogs, CompressionTestConfig with validation, computed paths, and clear_test_outputs.
Test utilities — helpers
integration-tests/tests/utils/utils.py, integration-tests/tests/utils/asserting_utils.py
Added helpers: get_env_var, validate_dir_exists, unlink, is_dir_tree_content_equal, is_json_file_structurally_equal (uses jq + sort), and run_and_assert (subprocess wrapper that fails tests on error).
Tests
integration-tests/tests/test_identity_transformation.py
Added identity-transformation tests exercising clp (text) and clp-s (JSON) compress/decompress flows, including consolidation and structural JSON comparison.
Taskfile includes & test tasks
taskfile.yaml, taskfiles/tests/main.yaml, taskfiles/tests/integration.yaml
Included test taskfile and added integration test tasks (default, core, cache-clear) and G_INTEGRATION_TESTS_DIR; clean now calls tests:integration:cache-clear.
Lint task updates
taskfiles/lint.yaml
Split RUFF_FLAGS into RUFF_CHECK_FLAGS and RUFF_FORMAT_FLAGS; added UV_PYTHON_PROJECTS (includes integration-tests); per-project ruff format/check and mypy runs; adjusted fix-py invocation.
Documentation & navigation
docs/src/dev-docs/building-package.md, docs/src/dev-docs/components-core/index.md, docs/src/dev-docs/index.md, docs/src/dev-docs/testing/index.md, docs/src/dev-docs/testing/integration-tests.md, docs/src/dev-docs/testing/unit-tests.md
Added testing docs, navigation card and toctree, noted uv as a dependency, and documented running integration tests and related environment variables.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer
  participant UV as uv (task runner)
  participant Pytest as pytest
  participant Fixtures as Fixtures
  participant Utils as TestUtils
  participant Bins as CLP binaries

  Dev->>UV: task tests:integration
  UV->>Pytest: run pytest in integration-tests (env vars set)
  Pytest->>Fixtures: load integration_test_config (session)
  Fixtures->>Utils: get_env_var(), validate_dir_exists()
  Pytest->>Fixtures: load dataset fixture (download & extract if needed)
  Fixtures->>Utils: unlink / prepare test dirs
  Pytest->>Bins: invoke clp / clp-s (compress/decompress)
  Bins-->>Pytest: outputs (archives, decompressed dirs)
  Pytest->>Utils: is_dir_tree_content_equal / is_json_file_structurally_equal
  Utils-->>Pytest: comparison result
  Pytest-->>Dev: test results
  note right of Pytest: clp-s flow includes consolidation before structural JSON compare
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately and succinctly summarizes the primary change: adding integration-test infrastructure and basic tests for the clp and clp-s compressors, and it follows Conventional Commits style; it is specific rather than generic so a reviewer scanning history will understand the main intent.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31e69e3 and f424f21.

📒 Files selected for processing (1)
  • integration-tests/tests/fixtures/integration_test_logs.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
PR: y-scope/clp#1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-20T09:01:25.825Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:60-71
Timestamp: 2025-08-20T09:01:25.825Z
Learning: For integration tests using curl downloads in PR #1100, curl's built-in timeout and retry mechanisms (with --fail and --show-error flags) are sufficient, and additional subprocess timeout parameters are not needed.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
🧬 Code graph analysis (1)
integration-tests/tests/fixtures/integration_test_logs.py (3)
integration-tests/tests/utils/config.py (2)
  • IntegrationTestConfig (74-95)
  • IntegrationTestLogs (99-123)
integration-tests/tests/utils/utils.py (1)
  • unlink (54-75)
integration-tests/tests/fixtures/integration_test_config.py (1)
  • integration_test_config (16-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (4)
integration-tests/tests/fixtures/integration_test_logs.py (4)

18-29: Fixture looks good (session-scoped, typed, consistent naming).

No issues spotted; this cleanly delegates to the helper and follows the “logs” naming convention.


32-43: Second dataset fixture is consistent and correct.

Matches the first fixture’s pattern; scoped and typed appropriately.


61-65: Good fail‑fast check for curl availability.

Resolving the binary with shutil.which and raising a clear error is the right approach.


79-79: Confirm unlink’s sudo fallback won’t break CI/Windows.

tests.utils.utils.unlink escalates to “sudo rm -rf” on PermissionError. If Windows runners or non‑sudo CI are in scope, this can fail.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.12.2)
integration-tests/tests/fixtures/integration_test_logs.py

�[1;31mruff failed�[0m
�[1mCause:�[0m Failed to load configuration /ruff.toml
�[1mCause:�[0m Failed to parse /ruff.toml
�[1mCause:�[0m TOML parse error at line 26, column 3
|
26 | "RSE100", # Use of assert detected
| ^^^^^^^^
Unknown rule selector: RSE100

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Bill-hbrhbr Bill-hbrhbr changed the title feat(integration-tests): Add CLP package integration tests boilerplate. feat(integration-tests): Add CLP package integration tests boilerplate; Add task workflow to for running the integration test module. Jul 14, 2025
@Bill-hbrhbr Bill-hbrhbr marked this pull request as ready for review July 14, 2025 17:51
@Bill-hbrhbr Bill-hbrhbr requested a review from a team as a code owner July 14, 2025 17:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe78f4f and 6880a0f.

⛔ Files ignored due to path filters (1)
  • integration-tests/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • .gitignore (1 hunks)
  • docs/src/dev-guide/building-package.md (3 hunks)
  • integration-tests/.pytest.ini (1 hunks)
  • integration-tests/.python-version (1 hunks)
  • integration-tests/pyproject.toml (1 hunks)
  • integration-tests/src/integration_tests/__init__.py (1 hunks)
  • integration-tests/tests/conftest.py (1 hunks)
  • integration-tests/tests/fixtures/base_config.py (1 hunks)
  • integration-tests/tests/fixtures/dataset_logs.py (1 hunks)
  • integration-tests/tests/test_identity_transformation.py (1 hunks)
  • integration-tests/tests/utils/config.py (1 hunks)
  • integration-tests/tests/utils/utils.py (1 hunks)
  • taskfile.yaml (2 hunks)
  • taskfiles/lint.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
docs/src/dev-guide/building-package.md (6)
Learnt from: jackluo923
PR: y-scope/clp#718
File: components/core/tools/scripts/utils/create-debian-package.py:41-41
Timestamp: 2025-02-12T22:24:17.723Z
Learning: For the clp-core Debian package creation script, strict version format validation is considered unnecessary complexity and should be avoided.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: quinntaylormitchell
PR: y-scope/clp#1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
integration-tests/.pytest.ini (2)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
integration-tests/tests/test_identity_transformation.py (5)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
🧬 Code Graph Analysis (1)
integration-tests/tests/conftest.py (2)
integration-tests/tests/fixtures/base_config.py (1)
  • base_config (9-22)
integration-tests/tests/fixtures/dataset_logs.py (4)
  • download_and_extract_dataset (44-71)
  • hive_24hr (12-16)
  • postgresql (36-40)
  • spark_event_logs (28-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (20)
.gitignore (1)

5-5: LGTM! Standard Python cache directory exclusion.

The recursive **/__pycache__/ pattern is appropriate for the new integration tests infrastructure and follows Python project conventions.

integration-tests/tests/utils/config.py (1)

1-18: LGTM! Well-structured configuration classes.

The frozen dataclasses provide clean, immutable configuration structures with proper type hints. The use of Path objects for directory paths and clear field names makes the code maintainable.

integration-tests/tests/utils/utils.py (2)

16-20: LGTM! Clean environment variable retrieval.

The function provides proper error handling with a descriptive assertion message.


22-26: LGTM! Simple and effective command execution wrapper.

The function provides a clean abstraction for running commands with assertion-based error handling.

integration-tests/.python-version (1)

1-1: LGTM! Appropriate Python version specification.

Python 3.10 is a good choice for the integration tests and aligns with the project's requirements.

docs/src/dev-guide/building-package.md (3)

18-18: LGTM! Proper dependency documentation.

The uv requirement is clearly documented with the appropriate version constraint.


65-72: LGTM! Clear testing instructions.

The Test section provides concise, actionable instructions for running integration tests that align with the new infrastructure.


83-83: LGTM! Proper reference link.

The uv reference link is correctly formatted and provides users with additional documentation.

taskfiles/lint.yaml (1)

779-779: LGTM!

Adding the integration-tests directory to the linting process is correct and maintains code quality consistency across the project.

integration-tests/tests/conftest.py (1)

1-7: LGTM!

The conftest.py file correctly aggregates pytest fixtures following standard pytest practices. The imports are consistent with the fixture modules described in the context.

integration-tests/.pytest.ini (2)

10-11: Verify that relative paths work consistently across different execution contexts.

The environment variables use relative paths (../build). Ensure these paths resolve correctly regardless of where the tests are executed from (e.g., from the root directory, from the integration-tests directory, or from CI/CD pipelines).


12-15: Well-defined test markers.

The custom markers provide clear categorization for different types of integration tests, which will be helpful for selective test execution.

integration-tests/tests/fixtures/base_config.py (2)

8-9: Appropriate fixture scope.

Using session scope for the base_config fixture is correct since this configuration should be shared across all tests in the session.


20-21: Robust directory creation.

The directory creation logic correctly uses parents=True and exist_ok=True to handle cases where directories may already exist or need to be created recursively.

integration-tests/pyproject.toml (1)

1-23: LGTM! Well-structured Python project configuration.

The pyproject.toml file is correctly configured with appropriate metadata, build system, and development dependencies for the integration test suite.

taskfile.yaml (2)

18-18: LGTM! Consistent variable naming convention.

The new variable follows the established naming pattern with G_ prefix for global variables.


166-172: LGTM! Well-structured integration test task.

The task properly:

  • Sets required environment variables for the test suite
  • Uses uv run for isolated Python execution
  • Runs from the correct directory context

The environment variables CLP_BUILD_DIR and CLP_PACKAGE_DIR will be available to the tests for locating binaries and packages.

integration-tests/tests/fixtures/dataset_logs.py (1)

11-40: LGTM! Well-structured dataset fixtures.

The fixtures correctly define session-scoped datasets with clear naming and URLs pointing to Zenodo archives.

integration-tests/tests/test_identity_transformation.py (2)

36-67: LGTM! Well-structured identity transformation test for CLP.

The test correctly implements the identity transformation pattern:

  • Proper cleanup before and after test execution
  • Uses appropriate CLI flags (--progress, --remove-path-prefix)
  • Validates that decompressed output matches original input
  • Follows established cleanup patterns from learned context

69-108: LGTM! Comprehensive identity transformation test for CLP-S.

The test properly handles CLP-S specific requirements:

  • Implements double compression/decompression cycle for consistency verification
  • Accounts for JSON key and row order limitations as documented
  • Includes appropriate TODO comment referencing official documentation
  • Proper cleanup of all temporary directories

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6880a0f and c24cdaf.

📒 Files selected for processing (2)
  • taskfile.yaml (2 hunks)
  • taskfiles/lint.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (2)
taskfiles/lint.yaml (1)

777-778: Integration-tests directory correctly added to Python lint scope

Including integration-tests ensures Black/Ruff run over the new test code. No issues spotted.

taskfile.yaml (1)

18-18: Variable addition looks good

G_INTEGRATION_TESTS_DIR follows existing naming conventions and will simplify future path changes.

Copy link
Contributor

@haiqi96 haiqi96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comment. I will need to take a closer look at uv and pytest to fully review this.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
integration-tests/tests/fixtures/dataset_logs.py (1)

44-73: Address resource cleanup and error handling improvements.

The download function has several areas that could be improved based on past review feedback:

  1. Resource cleanup: Downloaded archives are not cleaned up after extraction
  2. Error handling: The broad except Exception clause could mask specific issues
  3. Security: Consider validating HTTPS URLs for the datasets

Consider this improvement:

    try:
        download_path_str = str(package_config.uncompressed_logs_dir / f"{dataset_name}.tar.gz")
        # fmt: off
        curl_cmds = [
            "curl",
            "--fail",
            "--location",
            "--output", download_path_str,
            "--show-error",
            dataset_config.tar_url,
        ]
        # fmt: on
        subprocess.run(curl_cmds, check=True)

        extract_path_str = str(package_config.uncompressed_logs_dir / dataset_name)
        shutil.unpack_archive(download_path_str, extract_path_str)
+        # Clean up downloaded archive
+        os.unlink(download_path_str)
-    except Exception as e:
+    except (subprocess.CalledProcessError, shutil.ReadError, OSError) as e:
        raise RuntimeError(f"Failed to download and extract dataset `{dataset_name}`: {e}")

This addresses resource cleanup and provides more specific error handling.

integration-tests/tests/test_identity_transformation.py (2)

126-126: Remove duplicate assignment that causes resource leak

Line 126 reassigns keys_and_rows_sorted, overwriting the context-managed temporary file created on line 125. This causes a resource leak and breaks the intended context management.

Remove this line:

-        keys_and_rows_sorted = NamedTemporaryFile(mode="w+", delete=True)

96-99: Inconsistent naming: "single-file" directories without single-file-archive mode

The variable names single_file_archives_dir and single_file_extract_dir suggest single-file archive mode, but the compression commands don't use the --single-file-archive flag. This creates confusion about the test's intent.

Either rename the variables to remove "single-file" references or add the flag if single-file mode is intended.

Also applies to: 104-105

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9587f99 and 6d73f31.

📒 Files selected for processing (6)
  • integration-tests/.pytest.ini (1 hunks)
  • integration-tests/tests/conftest.py (1 hunks)
  • integration-tests/tests/fixtures/dataset_logs.py (1 hunks)
  • integration-tests/tests/test_identity_transformation.py (1 hunks)
  • integration-tests/tests/utils/utils.py (1 hunks)
  • taskfile.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: jackluo923
PR: y-scope/clp#718
File: components/core/tools/scripts/utils/create-debian-package.py:41-41
Timestamp: 2025-02-12T22:24:17.723Z
Learning: For the clp-core Debian package creation script, strict version format validation is considered unnecessary complexity and should be avoided.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
integration-tests/.pytest.ini (2)

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In components/core/tests/test-clp_s-end_to_end.cpp, the success of constructor.store() is verified through REQUIRE statements and subsequent comparisons.

taskfile.yaml (7)

Learnt from: anlowee
PR: #925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like apt-get install -f) when installing packages to give users more control over their system.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

integration-tests/tests/utils/utils.py (1)

Learnt from: anlowee
PR: #925
File: taskfile.yaml:132-142
Timestamp: 2025-05-27T19:54:46.662Z
Learning: When checking for uncommitted changes in generated code directories, git status --porcelain is more comprehensive than git diff --quiet because it detects both modified tracked files and new untracked files, while git diff only detects changes to already-tracked files.

integration-tests/tests/test_identity_transformation.py (20)

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.

Learnt from: Bill-hbrhbr
PR: #1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.

Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.

Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add, apt update && apt install, yum update && yum install) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In components/core/tests/test-clp_s-end_to_end.cpp, the success of constructor.store() is verified through REQUIRE statements and subsequent comparisons.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: haiqi96
PR: #634
File: components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py:0-0
Timestamp: 2024-12-16T21:49:06.483Z
Learning: In fs_compression_task.py, when refactoring the archive processing loop in Python, the src_archive_file.unlink() operation should remain outside of the process_archive function.

Learnt from: haiqi96
PR: #619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ FileDecompressor implementations at components/core/src/clp/clp/FileDecompressor.cpp and components/core/src/glt/glt/FileDecompressor.cpp, the temp_output_path variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Learnt from: haiqi96
PR: #651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/ directory, keep them simple and avoid adding additional validation code.

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.

integration-tests/tests/fixtures/dataset_logs.py (2)

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.

🧬 Code Graph Analysis (2)
integration-tests/tests/conftest.py (2)
integration-tests/tests/fixtures/dataset_logs.py (4)
  • download_and_extract_dataset (45-73)
  • hive_24hr (13-17)
  • postgresql (37-41)
  • spark_event_logs (29-33)
integration-tests/tests/fixtures/package_config.py (1)
  • package_config (9-22)
integration-tests/tests/test_identity_transformation.py (4)
integration-tests/tests/utils/config.py (2)
  • DatasetLogs (15-17)
  • PackageConfig (6-11)
integration-tests/tests/utils/utils.py (2)
  • is_dir_tree_content_equal (7-14)
  • run_and_assert (23-26)
integration-tests/tests/fixtures/package_config.py (1)
  • package_config (9-22)
integration-tests/tests/fixtures/dataset_logs.py (1)
  • download_and_extract_dataset (45-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (10)
integration-tests/tests/utils/utils.py (1)

17-20: Clean implementation with helpful error messaging.

The function properly handles missing environment variables with a clear assertion message.

integration-tests/tests/conftest.py (1)

1-7: Standard pytest fixture aggregation pattern.

Clean implementation that properly imports and exposes fixtures for the integration test suite.

integration-tests/.pytest.ini (1)

1-15: Well-structured pytest configuration for integration tests.

The configuration properly sets up environment variables, custom markers, and test execution options. The markers for binaries, clp, and clp_s provide good test categorization capabilities.

taskfile.yaml (3)

18-18: Integration tests directory variable follows existing naming convention.

The variable name and path structure are consistent with other directory variables in the file.


166-174: Well-implemented test task with proper dependency.

The task correctly depends on the package task and properly sets environment variables that match the pytest configuration. This addresses the dependency issue mentioned in past reviews.


176-181: Useful cache clearing task for test maintenance.

The cache clearing task provides a clean way to reset test state, which is valuable for debugging and ensuring clean test runs.

integration-tests/tests/fixtures/dataset_logs.py (4)

12-17: Dataset fixture follows consistent pattern.

The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.


20-25: Dataset fixture follows consistent pattern.

The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.


28-33: Dataset fixture follows consistent pattern.

The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.


36-41: Dataset fixture follows consistent pattern.

The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.

@Bill-hbrhbr Bill-hbrhbr requested a review from haiqi96 July 24, 2025 02:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
integration-tests/tests/fixtures/dataset_logs.py (1)

52-84: Improve error handling specificity and add archive cleanup.

The download and extraction logic has areas for improvement:

  1. The bare except Exception clause is too broad and could mask unexpected errors
  2. Downloaded archives are not cleaned up after extraction, potentially wasting disk space

Apply this diff to improve error handling and add cleanup:

-    except Exception as e:
-        raise RuntimeError(f"Failed to download and extract dataset `{dataset_name}`: {e}")
+    except subprocess.CalledProcessError as e:
+        raise RuntimeError(f"Failed to download dataset `{dataset_name}`: {e}")
+    except shutil.ReadError as e:
+        raise RuntimeError(f"Failed to extract dataset `{dataset_name}`: {e}")
+    finally:
+        # Clean up downloaded archive to save disk space
+        download_path = package_config.uncompressed_logs_dir / f"{dataset_name}.tar.gz"
+        download_path.unlink(missing_ok=True)
integration-tests/tests/test_identity_transformation.py (1)

127-139: Fix redundant NamedTemporaryFile creation causing resource leak.

Line 132 creates a new NamedTemporaryFile that shadows the one created in the with statement on line 131, causing a resource leak and incorrect behavior.

Remove the redundant line:

 @contextmanager
 def _sort_json_keys_and_rows(json_fp: Path) -> IO[str]:
     with NamedTemporaryFile(mode="w+", delete=True) as keys_sorted, NamedTemporaryFile(
         mode="w+", delete=True
     ) as flattened, NamedTemporaryFile(mode="w+", delete=True) as keys_and_rows_sorted:
-        keys_and_rows_sorted = NamedTemporaryFile(mode="w+", delete=True)
         subprocess.run(["jq", "--sort-keys", ".", str(json_fp)], check=True, stdout=keys_sorted)
         keys_sorted.flush()
         subprocess.run(["jq", ".", keys_sorted.name], check=True, stdout=flattened)
         flattened.flush()
         subprocess.run(["sort", flattened.name], check=True, stdout=keys_and_rows_sorted)
         keys_and_rows_sorted.flush()
         yield keys_and_rows_sorted
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d73f31 and 6216e68.

📒 Files selected for processing (3)
  • integration-tests/tests/conftest.py (1 hunks)
  • integration-tests/tests/fixtures/dataset_logs.py (1 hunks)
  • integration-tests/tests/test_identity_transformation.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: jackluo923
PR: y-scope/clp#718
File: components/core/tools/scripts/utils/create-debian-package.py:41-41
Timestamp: 2025-02-12T22:24:17.723Z
Learning: For the clp-core Debian package creation script, strict version format validation is considered unnecessary complexity and should be avoided.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
integration-tests/tests/fixtures/dataset_logs.py (2)

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.

integration-tests/tests/test_identity_transformation.py (20)

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.

Learnt from: haiqi96
PR: #651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add, apt update && apt install, yum update && yum install) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Learnt from: Bill-hbrhbr
PR: #1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.

Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In components/core/tests/test-clp_s-end_to_end.cpp, the success of constructor.store() is verified through REQUIRE statements and subsequent comparisons.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp, the run_serializer function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: haiqi96
PR: #634
File: components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py:0-0
Timestamp: 2024-12-16T21:49:06.483Z
Learning: In fs_compression_task.py, when refactoring the archive processing loop in Python, the src_archive_file.unlink() operation should remain outside of the process_archive function.

Learnt from: haiqi96
PR: #619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ FileDecompressor implementations at components/core/src/clp/clp/FileDecompressor.cpp and components/core/src/glt/glt/FileDecompressor.cpp, the temp_output_path variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.

Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/ directory, keep them simple and avoid adding additional validation code.

Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.

Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.

Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.

🧬 Code Graph Analysis (1)
integration-tests/tests/conftest.py (2)
integration-tests/tests/fixtures/dataset_logs.py (4)
  • elasticsearch (23-29)
  • hive_24hr (13-19)
  • postgresql (43-49)
  • spark_event_logs (33-39)
integration-tests/tests/fixtures/package_config.py (1)
  • package_config (9-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
🔇 Additional comments (1)
integration-tests/tests/conftest.py (1)

1-7: LGTM! Standard pytest fixture aggregation pattern.

The conftest.py correctly aggregates fixtures for use across integration tests.

Bill-hbrhbr and others added 2 commits July 24, 2025 16:24
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
kirkrodrigues and others added 2 commits September 8, 2025 06:44
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/src/dev-docs/building-package.md (1)

11-12: Fix grammar in Requirements section

Replace “a some” and avoid the comma splice.

-  * It should be possible to build a package for a different environment, it just requires a some
-    extra configuration.
+  * It should be possible to build a package for a different environment; it just requires
+    some extra configuration.
♻️ Duplicate comments (1)
docs/src/dev-docs/building-package.md (1)

16-18: Add a short rationale for Task pin and uv lower bound

Pre-empt reader confusion by noting why Task is pinned to 3.44.0 and uv uses a lower bound.

 * [Task] 3.44.0
 * [uv] >= 0.8
+  - Note: Task is pinned to 3.44.0 due to a known issue in 3.44.1; uv uses a lower bound (>= 0.8) to ensure minimum required functionality across platforms.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2073cd and 1fcc997.

📒 Files selected for processing (2)
  • docs/src/dev-docs/building-package.md (2 hunks)
  • taskfile.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • taskfile.yaml
📚 Learning: 2025-08-29T07:31:24.032Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1271
File: components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh:41-43
Timestamp: 2025-08-29T07:31:24.032Z
Learning: In CLP's build tool installation scripts, uv version constraints should use lower bound constraints (uv>=0.8) rather than exact version pinning, following the same philosophy used for other build tools to accommodate platform differences while ensuring minimum required functionality.

Applied to files:

  • docs/src/dev-docs/building-package.md
📚 Learning: 2025-08-25T00:45:05.464Z
Learnt from: junhaoliao
PR: y-scope/clp#1242
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:39-41
Timestamp: 2025-08-25T00:45:05.464Z
Learning: Task v3.44.1 causes issues in the clp-ffi-js project (issue #110), so CLP should avoid upgrading to v3.44.1 and stick with v3.44.0 for now.

Applied to files:

  • docs/src/dev-docs/building-package.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (2)
docs/src/dev-docs/building-package.md (2)

17-17: uv requirement — LGTM

Lower-bound constraint (>= 0.8) matches our tooling policy and avoids over-pinning.


84-84: uv reference link — LGTM

Link is correct and consistent with other docs.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (7)
taskfiles/lint.yaml (1)

780-786: Gate mypy and add uv preflight (prior feedback)

  • Consider running mypy only when a config exists to avoid false negatives on untyped projects.
  • Add a quick command -v uv guard so failures are explicit when uv isn’t installed.

These were suggested earlier; re-stating for this block’s final form.

Example gating (inline in this cmd block):

 uv run --directory "{{.ITEM}}" ruff format {{.RUFF_FORMAT_FLAGS}} .
 uv run --directory "{{.ITEM}}" ruff check {{.RUFF_CHECK_FLAGS}} .
-uv run --directory "{{.ITEM}}" mypy .
+if [ -f "{{.ITEM}}/mypy.ini" ] || \
+   ( [ -f "{{.ITEM}}/setup.cfg" ] && grep -q '^\[mypy\]' "{{.ITEM}}/setup.cfg" ) || \
+   ( [ -f "{{.ITEM}}/pyproject.toml" ] && grep -q '^\[tool\.mypy\]' "{{.ITEM}}/pyproject.toml" ); then
+  uv run --directory "{{.ITEM}}" mypy .
+fi

And at the start of the task (one-time check):

+command -v uv >/dev/null 2>&1 || { echo "uv not found on PATH; install uv to run Python linting." >&2; exit 1; }
taskfiles/tests/integration.yaml (6)

21-25: Ensure ::package is resolvable from this Taskfile

If not already included via a parent Taskfile, add an includes: to expose package.

 version: "3"
 
+includes:
+  - path: ../../taskfile.yaml
+    prefix: ""
+
 tasks:

Run to verify resolution:

#!/bin/bash
rg -nP -C2 '^\s*package:\s*$' -g 'Taskfile*.yml' -g 'Taskfile*.yaml' || true

3-7: Scope variables to the task; avoid global leakage and indirection

Make these values task-local and reference the global G_* directly in env.

 vars:
-  CLP_BUILD_DIR: "{{.G_BUILD_DIR}}"
-  CLP_CORE_BINS_DIR: "{{.G_CORE_COMPONENT_BUILD_DIR}}"
-  CLP_PACKAGE_DIR: "{{.G_PACKAGE_BUILD_DIR}}"
-  INTEGRATION_TESTS_DIR: "{{.ROOT_DIR}}/integration-tests"
+  INTEGRATION_TESTS_DIR: "{{.ROOT_DIR}}/integration-tests"
@@
   core:
@@
-    env:
-      CLP_BUILD_DIR: "{{.CLP_BUILD_DIR}}"
-      CLP_CORE_BINS_DIR: "{{.CLP_CORE_BINS_DIR}}"
-      CLP_PACKAGE_DIR: "{{.CLP_PACKAGE_DIR}}"
+    env:
+      CLP_BUILD_DIR: "{{.G_BUILD_DIR}}"
+      CLP_CORE_BINS_DIR: "{{.G_CORE_COMPONENT_BUILD_DIR}}"
+      CLP_PACKAGE_DIR: "{{.G_PACKAGE_BUILD_DIR}}"

Also applies to: 26-30


10-13: Make default an explicit alias to core

Slightly clearer intent than a deps-only task.

   default:
-    deps:
-      - task: "core"
+    cmds:
+      - task: "core"

14-19: Don’t discard pytest collection output

Removing the redirection keeps helpful diagnostics.

   cache-clear:
     dir: "{{.INTEGRATION_TESTS_DIR}}"
     cmds:
       - >-
-        uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet
-        > /dev/null
+        uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet

14-19: Add preconditions to fail fast when uv/pytest are missing (cache-clear)

Improves DX and CI errors.

   cache-clear:
     dir: "{{.INTEGRATION_TESTS_DIR}}"
+    preconditions:
+      - sh: 'command -v uv >/dev/null'
+        msg: 'uv is required; see dev env setup in docs.'
+      - sh: 'uv run python -c "import pytest" >/dev/null 2>&1'
+        msg: 'pytest not available in the uv environment.'
     cmds:
       - >-
         uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet

21-30: Add preconditions for core task (uv/pytest presence)

Prevents opaque failures before running tests.

   core:
     deps:
       # TODO: https://github.com/y-scope/clp/issues/1281
       - task: "::package"
     dir: "{{.INTEGRATION_TESTS_DIR}}"
+    preconditions:
+      - sh: 'command -v uv >/dev/null'
+        msg: 'uv is required; see dev env setup in docs.'
+      - sh: 'uv run python -c "import pytest" >/dev/null 2>&1'
+        msg: 'pytest not available in the uv environment.'
     env:
       CLP_BUILD_DIR: "{{.CLP_BUILD_DIR}}"
       CLP_CORE_BINS_DIR: "{{.CLP_CORE_BINS_DIR}}"
       CLP_PACKAGE_DIR: "{{.CLP_PACKAGE_DIR}}"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fcc997 and fb274d1.

📒 Files selected for processing (2)
  • taskfiles/lint.yaml (3 hunks)
  • taskfiles/tests/integration.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • taskfiles/lint.yaml
  • taskfiles/tests/integration.yaml
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.

Applied to files:

  • taskfiles/lint.yaml
  • taskfiles/tests/integration.yaml
📚 Learning: 2025-08-24T17:10:44.565Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: taskfiles/lint.yaml:780-781
Timestamp: 2025-08-24T17:10:44.565Z
Learning: The CLP repository is gradually migrating from Black to Ruff for Python formatting. Some modules haven't been set up for Ruff yet, so there's intentional mixed usage of Black and Ruff formatters during this transition period. This should not be flagged as an inconsistency in PR reviews.

Applied to files:

  • taskfiles/lint.yaml
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • taskfiles/tests/integration.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • taskfiles/tests/integration.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (3)
taskfiles/lint.yaml (3)

81-83: Confirm desired behaviour: ruff format with only --diff won’t fail CI

ruff format --diff prints diffs but exits 0. If you want check-py to fail on formatting drift for UV projects, add --check (or --exit-code). If non-blocking is intentional, ignore.


89-91: LGTM: fix flags align with auto-fixing

RUFF_CHECK_FLAGS="--fix" with empty format flags will apply changes as intended.


761-766: LGTM: per-project UV loop setup

Declaring UV_PYTHON_PROJECTS and requiring the flag vars matches the per-project linting model for integration-tests.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
integration-tests/pyproject.toml (3)

11-13: Remove console script entry point — tests aren’t meant to be shipped.

Per prior agreement for this PR, we’re not distributing an installable package here. Drop the script to avoid confusion and broken wheels.

-[project.scripts]
-integration-tests = "integration_tests:main"
-

61-61: Fix Ruff isort configuration location — dotted key won’t be read.

Move order-by-type under [tool.ruff.lint.isort]; the current isort.order-by-type key is ignored.

-isort.order-by-type = false
+
+[tool.ruff.lint.isort]
+order-by-type = false

19-24: Alphabetize dev dependency group for consistency.

Keeps diffs clean and matches existing repo guidance.

 dev = [
-    "mypy>=1.16.0",
-    "ruff>=0.11.12",
-    "pytest>=8.4.1",
-    "pytest-env>=1.1.5",
+    "mypy>=1.16.0",
+    "pytest>=8.4.1",
+    "pytest-env>=1.1.5",
+    "ruff>=0.11.12",
 ]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66f61a7 and 47ccb1c.

📒 Files selected for processing (1)
  • integration-tests/pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.

Applied to files:

  • integration-tests/pyproject.toml
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • integration-tests/pyproject.toml
📚 Learning: 2025-08-24T17:10:44.565Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: taskfiles/lint.yaml:780-781
Timestamp: 2025-08-24T17:10:44.565Z
Learning: The CLP repository is gradually migrating from Black to Ruff for Python formatting. Some modules haven't been set up for Ruff yet, so there's intentional mixed usage of Black and Ruff formatters during this transition period. This should not be flagged as an inconsistency in PR reviews.

Applied to files:

  • integration-tests/pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
🔇 Additional comments (1)
integration-tests/pyproject.toml (1)

9-9: Python 3.9 minimum looks correct.

Matches CLP’s stated minimum; ensure CI runs the integration suite on 3.9 to keep it honest.

ignore = [
"ANN401", # Allow using `Any` type for function signatures
"COM812", # Redundant and conflicts with ruff format
"D203", # No blank line before docstrings (D211)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Clarify D203 comment to avoid contradiction with D211.

D203 conflicts with D211; we’re preferring D211. Update the note.

-    "D203",  # No blank line before docstrings (D211)
+    "D203",  # Conflicts with D211; prefer no blank line before class docstring
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"D203", # No blank line before docstrings (D211)
"D203", # Conflicts with D211; prefer no blank line before class docstring
🤖 Prompt for AI Agents
In integration-tests/pyproject.toml around line 43, the inline comment for
"D203" incorrectly states "No blank line before docstrings (D211)" causing a
contradiction with D211; update the comment to clearly state that D203 conflicts
with D211 and that the project prefers D211 (e.g., "D203 conflicts with D211; we
prefer D211 — ensure blank line before multi-line docstrings") or remove/replace
the D203 note so it unambiguously reflects the prefered rule.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to update the PR title.

Comment on lines +75 to +77
shutil.unpack_archive(
integration_test_logs.tarball_path, integration_test_logs.extraction_dir
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also chmod -R gu+w the files? Since the hive-24hr dataset is read-only by default, task clean will always fail after running the integration tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Let me know if you need to open an issue for changing the implementation to use the os library.

@Bill-hbrhbr Bill-hbrhbr changed the title feat(integration-tests): Add CLP package integration tests boilerplate. feat(integration-tests): Add infrastructure for running integration tests; Add basic integration tests for clp & clp-s compression. Sep 14, 2025
Bill-hbrhbr and others added 2 commits September 15, 2025 00:03
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
taskfiles/tests/integration.yaml (5)

11-16: Don’t suppress pytest collection output; it hides useful failures

Drop the stdout redirection to /dev/null.

-      - >-
-        uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet
-        > /dev/null
+      - >-
+        uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet

7-9: Prefer explicit alias over deps-only default

Make default a thin alias to core for clarity.

   default:
-    deps:
-      - task: "core"
+    cmds:
+      - task: "core"

27-27: Use cmds list for consistency

Align with cache-clear’s cmds array.

-    cmd: "uv run python -m pytest -m core"
+    cmds:
+      - uv run python -m pytest -m core

18-27: Add preconditions to fail fast when uv/pytest are missing (core)

Mirror cache-clear checks here.

   core:
     deps:
       # TODO: https://github.com/y-scope/clp/issues/1281
       - task: "::package"
     dir: "{{.G_INTEGRATION_TESTS_DIR}}"
+    preconditions:
+      - sh: 'command -v uv >/dev/null'
+        msg: 'uv is required; see dev env setup in docs.'
+      - sh: 'uv run python -c "import pytest" >/dev/null 2>&1'
+        msg: 'pytest not available in the uv environment.'
     env:

11-16: Add preconditions to fail fast when uv/pytest are missing (cache-clear)

Early checks improve CI/dev ergonomics.

   cache-clear:
     dir: "{{.G_INTEGRATION_TESTS_DIR}}"
+    preconditions:
+      - sh: 'command -v uv >/dev/null'
+        msg: 'uv is required; see dev env setup in docs.'
+      - sh: 'uv run python -c "import pytest" >/dev/null 2>&1'
+        msg: 'pytest not available in the uv environment.'
     cmds:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47ccb1c and ba35f88.

📒 Files selected for processing (1)
  • taskfiles/tests/integration.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.

Applied to files:

  • taskfiles/tests/integration.yaml
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • taskfiles/tests/integration.yaml
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.

Applied to files:

  • taskfiles/tests/integration.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • taskfiles/tests/integration.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: package-image
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (2)
taskfiles/tests/integration.yaml (2)

3-5: Good: global var follows naming convention and centralises path

G_INTEGRATION_TESTS_DIR is appropriately G_-prefixed and reused by tasks.


23-26: Good: task-scoped env over inline exports

Scoped CLP_* env keeps globals clean and matches house style.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
taskfile.yaml (1)

11-11: Include wiring: verify that tests:integration:cache-clear is actually exposed under the tests namespace

The include path and namespacing look right, but please confirm that:

  • taskfiles/tests/main.yaml exists and has version: "3";
  • it (or its transitive includes) defines a task named integration:cache-clear so the root can call tests:integration:cache-clear.

Run this to verify:

#!/bin/bash
set -euo pipefail

# 1) File exists and is Task v3
test -f taskfiles/tests/main.yaml || { echo "Missing: taskfiles/tests/main.yaml"; exit 1; }
rg -nP '^\s*version:\s*"?3"?\s*$' taskfiles/tests/main.yaml || { echo "taskfiles/tests/main.yaml: version != 3"; exit 1; }

# 2) Check that some included file defines integration:cache-clear
rg -nP '^\s{2,}integration:cache-clear\s*:' taskfiles/tests || {
  echo "No task named 'integration:cache-clear' found under taskfiles/tests/*"; exit 1;
}

# 3) If go-task is available, ensure the namespaced task is discoverable
if command -v task >/dev/null 2>&1; then
  task -a | rg -n '^\s*tests:integration:cache-clear\b' || {
    echo "'tests:integration:cache-clear' not discoverable by go-task"; exit 1;
  }
else
  echo "Note: go-task not installed; skipped discovery check."
fi
integration-tests/tests/fixtures/integration_test_logs.py (1)

79-81: Best-effort cleanup of partial artefacts on failure

Re-raising is good; also delete any partial tarball to avoid corrupt leftovers.

@@
-    except Exception as e:
-        err_msg = f"Failed to download and extract dataset `{name}`."
-        raise RuntimeError(err_msg) from e
+    except Exception as e:
+        # Best-effort cleanup of partial artefacts
+        try:
+            if integration_test_logs.tarball_path.exists():
+                integration_test_logs.tarball_path.unlink()
+        except Exception:
+            pass
+        err_msg = f"Failed to download and extract dataset `{name}`."
+        raise RuntimeError(err_msg) from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba35f88 and 8c98e4d.

📒 Files selected for processing (2)
  • integration-tests/tests/fixtures/integration_test_logs.py (1 hunks)
  • taskfile.yaml (3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
  • taskfile.yaml
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
PR: y-scope/clp#1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-20T09:01:25.825Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:60-71
Timestamp: 2025-08-20T09:01:25.825Z
Learning: For integration tests using curl downloads in PR #1100, curl's built-in timeout and retry mechanisms (with --fail and --show-error flags) are sufficient, and additional subprocess timeout parameters are not needed.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.

Applied to files:

  • taskfile.yaml
🧬 Code graph analysis (1)
integration-tests/tests/fixtures/integration_test_logs.py (2)
integration-tests/tests/utils/config.py (2)
  • IntegrationTestConfig (74-95)
  • IntegrationTestLogs (99-123)
integration-tests/tests/utils/utils.py (1)
  • unlink (54-75)
🪛 GitHub Actions: clp-lint
integration-tests/tests/fixtures/integration_test_logs.py

[error] 75-75: lint:check-no-cpp failed: Ruff format would reformat 1 file (tests/fixtures/integration_test_logs.py). Run 'ruff format' to apply formatting changes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
🔇 Additional comments (4)
taskfile.yaml (1)

610-610: No action needed.

integration-tests/tests/fixtures/integration_test_logs.py (3)

18-29: Good fixture: hive_24hr

Session-scoped, typed, and delegates to the shared helper cleanly.


32-43: Good fixture: postgresql

Mirrors hive_24hr; consistent and reusable.


46-60: Helper structure reads well

Instantiation + early return via pytest cache is straightforward; logging is clear.

Comment on lines 74 to 78
unlink(integration_test_logs.extraction_dir)
shutil.unpack_archive(
integration_test_logs.tarball_path, integration_test_logs.extraction_dir
)
subprocess.run(["chmod","-R", "gu+w", integration_test_logs.extraction_dir], check=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Make chmod cross‑platform (or guard by OS)

Directly invoking chmod will fail on Windows runners. Either gate by OS, or switch to a Python walk that sets write bits. Minimal guard shown below.

@@
-        shutil.unpack_archive(
-            integration_test_logs.tarball_path, integration_test_logs.extraction_dir
-        )
-        subprocess.run(["chmod","-R", "gu+w", integration_test_logs.extraction_dir], check=True)
+        shutil.unpack_archive(
+            integration_test_logs.tarball_path, integration_test_logs.extraction_dir
+        )
+        # Avoid POSIX-only chmod on Windows
+        if os.name != "nt":
+            subprocess.run(
+                ["chmod", "-R", "gu+w", str(integration_test_logs.extraction_dir)],
+                check=True,
+            )

Add import (outside this hunk):

import os

🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="integration-tests/tests/fixtures/integration_test_logs.py"

echo "== existence =="
if [ ! -f "$file" ]; then
  echo "MISSING: $file"
  exit 0
fi

echo "== file (with line numbers) =="
nl -ba "$file" | sed -n '1,240p'

echo "== grep for relevant tokens =="
rg -n --hidden --line-number "chmod" || true
rg -n --hidden --line-number "unpack_archive" || true
rg -n --hidden --line-number "subprocess.run" || true
rg -n --hidden --line-number "import os" || true

Length of output: 88


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="integration-tests/tests/fixtures/integration_test_logs.py"

echo "== existence =="
if [ ! -f "$file" ]; then
  echo "MISSING: $file"
  exit 0
fi

echo "== file (with line numbers 1..240) =="
awk 'NR<=240 {printf "%4d: %s\n", NR, $0}' "$file"

echo "== search for relevant tokens =="
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden --line-number "chmod|unpack_archive|subprocess.run|import os" "$file" || true
else
  grep -n -E "chmod|unpack_archive|subprocess.run|import os" "$file" || true
fi

Length of output: 3420


Guard POSIX chmod (avoid calling chmod on Windows)

subprocess.run(["chmod","-R","gu+w", ...]) at integration-tests/tests/fixtures/integration_test_logs.py:78 is POSIX-only and will fail on Windows CI; gate the call or set write bits with a Python os.walk.

@@
-        shutil.unpack_archive(
-            integration_test_logs.tarball_path, integration_test_logs.extraction_dir
-        )
-        subprocess.run(["chmod","-R", "gu+w", integration_test_logs.extraction_dir], check=True)
+        shutil.unpack_archive(
+            integration_test_logs.tarball_path, integration_test_logs.extraction_dir
+        )
+        # Avoid POSIX-only chmod on Windows
+        if os.name != "nt":
+            subprocess.run(
+                ["chmod", "-R", "gu+w", str(integration_test_logs.extraction_dir)],
+                check=True,
+            )

Add import (outside this hunk):

import os
🧰 Tools
🪛 GitHub Actions: clp-lint

[error] 75-75: lint:check-no-cpp failed: Ruff format would reformat 1 file (tests/fixtures/integration_test_logs.py). Run 'ruff format' to apply formatting changes.

vars:
COMPONENT: "job-orchestration"
- task: "clean-webui"
- task: "tests:integration:cache-clear"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Avoid coupling clean to Python/pytest; move cache-clear to an opt-in task

Making clean invoke tests infra can fail on machines without uv/pytest and slows a common housekeeping command. Suggest splitting to a separate aggregate task and keeping clean lightweight.

Apply:

   clean:
     cmds:
       - "rm -rf '{{.G_BUILD_DIR}}'"
       - task: "clean-python-component"
         vars:
           COMPONENT: "clp-package-utils"
       - task: "clean-python-component"
         vars:
           COMPONENT: "clp-py-utils"
       - task: "clean-python-component"
         vars:
           COMPONENT: "job-orchestration"
       - task: "clean-webui"
-      - task: "tests:integration:cache-clear"
+
+  clean-all:
+    desc: "Clean build artifacts and integration-test caches"
+    cmds:
+      - task: "clean"
+      - task: "tests:integration:cache-clear"

If you must keep it under clean, at least guard it as a stopgap:

-      - task: "tests:integration:cache-clear"
+      - task: "tests:integration:cache-clear"
+        ignore_error: true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- task: "tests:integration:cache-clear"
clean:
cmds:
- "rm -rf '{{.G_BUILD_DIR}}'"
- task: "clean-python-component"
vars:
COMPONENT: "clp-package-utils"
- task: "clean-python-component"
vars:
COMPONENT: "clp-py-utils"
- task: "clean-python-component"
vars:
COMPONENT: "job-orchestration"
- task: "clean-webui"
clean-all:
desc: "Clean build artifacts and integration-test caches"
cmds:
- task: "clean"
- task: "tests:integration:cache-clear"
🤖 Prompt for AI Agents
In taskfile.yaml around line 67, the clean task currently runs the tests
integration cache-clear task which couples clean to Python/pytest and slows a
common housekeeping command; remove "tests:integration:cache-clear" from the
clean aggregate and create a new opt-in aggregate task (e.g.,
"clean:with-cache-clear" or "maintenance:cache-clear") that includes the
tests:integration:cache-clear entry, or if you must keep it under clean
temporarily, wrap the cache-clear invocation behind an explicit flag/environment
check so it only runs when requested (e.g., a parameter or env var) and update
task descriptions accordingly.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
integration-tests/tests/fixtures/integration_test_logs.py (2)

82-88: Make permission fix cross‑platform; avoid hard‑failing when chmod is absent

Don’t error out on Windows or environments without chmod. Gate by OS and provide a Python fallback to set write bits.

Apply this diff:

@@
-    # Allow the extracted content to be deletable or overwritable
-    chmod_bin = shutil.which("chmod")
-    if chmod_bin is None:
-        err_msg = "chmod executable not found"
-        raise RuntimeError(err_msg)
-    subprocess.run([chmod_bin, "-R", "gu+w", integration_test_logs.extraction_dir], check=True)
+    # Allow the extracted content to be deletable or overwritable
+    chmod_bin = shutil.which("chmod")
+    if os.name != "nt" and chmod_bin is not None:
+        subprocess.run(
+            [chmod_bin, "-R", "u+w,g+w", str(integration_test_logs.extraction_dir)],
+            check=True,
+        )
+    else:
+        # Python fallback: ensure write bit for user (and group where supported)
+        for p in [integration_test_logs.extraction_dir, *integration_test_logs.extraction_dir.rglob("*")]:
+            try:
+                mode = os.stat(p).st_mode
+                os.chmod(p, mode | stat.S_IWUSR | getattr(stat, "S_IWGRP", 0))
+            except Exception:
+                pass

Add imports at the top of the file:

import os
import stat

78-80: Clean up partial artefacts on failure

Best‑effort remove a partially downloaded tarball and any partially extracted directory before re‑raising.

@@
-    except Exception as e:
-        err_msg = f"Failed to download and extract dataset `{name}`."
-        raise RuntimeError(err_msg) from e
+    except Exception as e:
+        # Best-effort cleanup of partial artefacts
+        try:
+            if integration_test_logs.tarball_path.exists():
+                integration_test_logs.tarball_path.unlink()
+            unlink(integration_test_logs.extraction_dir)
+        except Exception:
+            pass
+        err_msg = f"Failed to download and extract dataset `{name}`."
+        raise RuntimeError(err_msg) from e
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60501f1 and 31e69e3.

📒 Files selected for processing (1)
  • integration-tests/tests/fixtures/integration_test_logs.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
PR: y-scope/clp#1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-20T09:01:25.825Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:60-71
Timestamp: 2025-08-20T09:01:25.825Z
Learning: For integration tests using curl downloads in PR #1100, curl's built-in timeout and retry mechanisms (with --fail and --show-error flags) are sufficient, and additional subprocess timeout parameters are not needed.

Applied to files:

  • integration-tests/tests/fixtures/integration_test_logs.py
🧬 Code graph analysis (1)
integration-tests/tests/fixtures/integration_test_logs.py (3)
integration-tests/tests/utils/config.py (2)
  • IntegrationTestConfig (74-95)
  • IntegrationTestLogs (99-123)
integration-tests/tests/utils/utils.py (1)
  • unlink (54-75)
integration-tests/tests/fixtures/integration_test_config.py (1)
  • integration_test_config (16-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: package-image
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-static-linked-bins
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
integration-tests/tests/fixtures/integration_test_logs.py (1)

18-30: Fixtures look good

Scopes, typing, and naming are consistent. No issues spotted here.

Also applies to: 32-44

Comment on lines +83 to +86
chmod_bin = shutil.which("chmod")
if chmod_bin is None:
err_msg = "chmod executable not found"
raise RuntimeError(err_msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this for chmod but not for curl? Won't subprocess.run fail anyway if the command can't be found?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess the linter didn't complain because I turned off the formatting for the curl block.

@Bill-hbrhbr Bill-hbrhbr merged commit 9aee596 into y-scope:main Sep 15, 2025
25 checks passed
@Bill-hbrhbr Bill-hbrhbr deleted the integration-tests-boilerplate branch September 15, 2025 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants