
Developed an end-to-end catalog ingestion flow from Petpooja (POS) to Meta #1932


Closed
wants to merge 5 commits

Conversation

himanshugt16
Contributor

@himanshugt16 himanshugt16 commented Apr 24, 2025

Summary by CodeRabbit

  • New Features

    • Introduced catalog synchronization and POS integration capabilities, including support for Petpooja as a provider.
    • Added endpoints for catalog sync, POS integration configuration, and rerunning syncs.
    • Implemented catalog sync status notifications via customizable email templates.
    • Enabled AI and Meta (Facebook) catalog sync options with detailed logging and status tracking for sync events.
    • Added bot-level sync configuration management and integration token-based endpoint generation.
    • Added batch processing for catalog data upserts and embedding synchronization.
    • Introduced MetaProcessor for managing product catalog synchronization with Meta's API.
  • Bug Fixes

    • Enhanced validation for catalog sync payloads, including item toggle requests and image configurations.
    • Improved error handling for missing or invalid catalog sync data and sync type validations.
  • Tests

    • Added comprehensive integration and unit tests covering catalog sync workflows, POS integration, data preprocessing, image handling, and error scenarios.
    • Included tests for sync rerun functionality and sync limit enforcement.
  • Documentation

    • Added new email template for catalog sync status updates.
  • Chores

    • Added configuration files and mappings to support provider-specific catalog data extraction and synchronization.
    • Refactored sync type terminology and updated related constants and models.


coderabbitai bot commented Apr 24, 2025

Walkthrough

This update introduces a comprehensive catalog synchronization and POS integration framework. It adds new API endpoints for catalog sync and POS integration configuration, implements provider-specific sync logic (with Petpooja as an initial provider), and integrates catalog data with both knowledge vault (AI) and Meta (Facebook) catalog systems. The change includes new data models, logging mechanisms, provider mapping configurations, and email notification templates for sync status. Extensive validation, error handling, and batch processing are implemented for catalog data operations. The update is accompanied by robust unit and integration test coverage for all major workflows.
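The provider plumbing described above boils down to an abstract base class plus a factory keyed on a provider enum. A minimal, self-contained sketch (class names approximate the real kairon definitions; method bodies are stubs, not the actual implementation):

```python
from abc import ABC, abstractmethod
from enum import Enum


class CatalogSyncClass(str, Enum):
    # Petpooja is the initial provider; further providers register here
    petpooja = "petpooja"


class CatalogSyncBase(ABC):
    """Contract every catalog sync provider must fulfil."""

    @abstractmethod
    def validate(self): ...

    @abstractmethod
    def preprocess(self): ...

    @abstractmethod
    def execute(self, **kwargs): ...


class PetpoojaSync(CatalogSyncBase):
    # Stub implementations for illustration only
    def validate(self):
        return True

    def preprocess(self):
        return {}

    def execute(self, **kwargs):
        return "sync scheduled"


class CatalogSyncFactory:
    __providers = {CatalogSyncClass.petpooja: PetpoojaSync}

    @classmethod
    def get_instance(cls, provider: str):
        # Normalize and validate the provider name before dispatch
        try:
            provider_enum = CatalogSyncClass(provider.lower())
        except ValueError as err:
            valid = [sync.value for sync in CatalogSyncClass]
            raise ValueError(
                f"'{provider}' is not a valid catalog sync provider. Accepted types: {valid}"
            ) from err
        return cls.__providers[provider_enum]
```

The factory returns the provider class, leaving instantiation (with bot, user, payload, etc.) to the caller, which mirrors how the event factory in this PR is described.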

Changes

File(s) / Group Change Summary
kairon/api/app/main.py, kairon/api/app/routers/bot/integrations.py Added new router for catalog sync integrations; registered /api/bot/integration endpoints for catalog sync and rerun sync operations.
kairon/api/app/routers/bot/data.py Renamed event_type to sync_type in knowledge vault sync; added POST endpoint for POS integration config; updated imports.
kairon/catalog_sync/definitions/base.py, kairon/catalog_sync/definitions/factory.py Introduced abstract base and factory classes for catalog sync providers.
kairon/events/definitions/catalog_sync.py, kairon/events/definitions/petpooja_sync.py Added new event and provider-specific classes for catalog sync, including validation, preprocessing, execution, and email notification logic.
kairon/events/definitions/factory.py Registered new CatalogSync event class in event factory.
kairon/meta/processor.py Added MetaProcessor class for Meta catalog API integration, supporting push, update, and delete operations.
kairon/shared/account/data_objects.py, kairon/shared/account/processor.py Added new email template field and logic for catalog sync status notifications.
kairon/shared/catalog_sync/catalog_sync_log_processor.py, kairon/shared/catalog_sync/data_objects.py Introduced logging and mapping models/processors for catalog sync events and provider mappings.
kairon/shared/cognition/processor.py Refactored event type to sync type; implemented batch upsert, provider mapping loading, POS integration config, and catalog sync logic.
kairon/shared/constants.py, kairon/shared/data/constant.py Added new enums for catalog sync event, provider, sync status, and sync types.
kairon/shared/data/data_models.py, kairon/shared/data/data_objects.py Added Pydantic and MongoEngine models for POS integrations, bot sync config, and catalog sync logs.
kairon/shared/data/utils.py Added method to generate integration endpoint URLs for catalog sync.
kairon/shared/llm/processor.py Added async method for deleting collection points in Qdrant.
kairon/shared/models.py Renamed VaultSyncEventType to VaultSyncType and member field_update to item_toggle.
kairon/shared/utils.py Added email handler for catalog sync status notifications.
metadata/catalog_provider_mappings.json Added provider mapping config for Petpooja catalog fields.
template/emails/catalog_sync_status.html Added new HTML email template for catalog sync status updates.
tests/integration_test/services_test.py Added extensive integration tests for POS integration and catalog sync workflows, with various scenarios and payloads.
tests/testing_data/catalog_sync/… Added multiple JSON payloads for catalog sync test cases (push menu, item toggle, invalid scenarios, etc.).
tests/unit_test/data_processor/catalog_sync_log_processor_test.py, tests/unit_test/data_processor/data_processor_test.py Added comprehensive unit tests for catalog sync log processor and cognition data processor, covering validation, preprocessing, image resolution, and sync config logic.
tests/unit_test/action/action_test.py Added catalog_sync_limit_per_day setting to bot settings test data.
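The integration-endpoint generation added to kairon/shared/data/utils.py presumably produces URLs in the shape shown in the sequence diagram; a hedged sketch (function name and base-URL handling are assumptions, not the real utility):

```python
def generate_integration_endpoint(base_url: str, provider: str, sync_type: str,
                                  bot: str, token: str) -> str:
    """Build a catalog-sync webhook URL of the form
    {base}/api/bot/integration/{provider}/{sync_type}/{bot}/{token}."""
    return f"{base_url.rstrip('/')}/api/bot/integration/{provider}/{sync_type}/{bot}/{token}"
```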

Sequence Diagram(s)

sequenceDiagram
    participant Provider as Catalog Provider (e.g., Petpooja)
    participant API as Kairon API
    participant Event as CatalogSync Event
    participant Log as CatalogSyncLogProcessor
    participant AI as CognitionDataProcessor
    participant Meta as MetaProcessor
    participant Mail as MailUtility

    Provider->>API: POST /api/bot/integration/{provider}/{sync_type}/{bot}/{token}
    API->>Event: Validate payload and sync status
    Event->>Log: Log sync initiation
    Event->>Event: Preprocess data (provider-specific)
    Event->>AI: Validate and upsert data (if AI enabled)
    Event->>Meta: Sync with Meta catalog (if meta enabled)
    Event->>Log: Log sync completion/failure
    Event->>Mail: Send sync status email
    API-->>Provider: Respond with sync in progress message
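The steps above can be condensed into an orchestrator sketch (names and the trace mechanism are illustrative; the real CatalogSync event also handles rerun semantics and per-step log detail):

```python
import asyncio


async def run_catalog_sync(payload: dict, ai_enabled: bool, meta_enabled: bool,
                           trace: list) -> str:
    """Mirror the sequence diagram: log initiation, preprocess the
    provider payload, fan out to the enabled targets, log completion,
    then send the status email."""
    trace.append("log:initiated")
    items = payload.get("items", [])          # provider-specific preprocessing stub
    trace.append(f"preprocess:{len(items)}")
    if ai_enabled:
        trace.append("ai:upsert")             # knowledge vault upsert
    if meta_enabled:
        trace.append("meta:sync")             # Meta catalog push
    trace.append("log:completed")
    trace.append("mail:status")
    return "Sync in progress"


trace: list = []
result = asyncio.run(run_catalog_sync({"items": [1, 2]}, ai_enabled=True,
                                      meta_enabled=False, trace=trace))
```

The API responds immediately with an "in progress" message while the heavy lifting runs inside the event, matching the final arrow of the diagram.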

Possibly related PRs

Suggested reviewers

  • sushantpatade

Poem

In the warren where data flows and hops,
A catalog sync leaps—no stops!
With Petpooja mapped and Meta in sight,
Logs and emails keep things right.
Tests abound, endpoints bloom,
Rabbits cheer in every room!
🐇✨



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 28

🔭 Outside diff range comments (2)
kairon/shared/data/data_objects.py (1)

1106-1120: 🛠️ Refactor suggestion

BotSyncConfig lacks validation ensuring at least one of AI or Meta sync is enabled.

An empty config will silently pass and block downstream checks. Consider:

def validate(self, clean=True):
    if clean:
        self.clean()
    if not self.ai_enabled and not self.meta_enabled:
        raise ValidationError("Either ai_enabled or meta_enabled must be True")
kairon/shared/cognition/processor.py (1)

523-539: ⚠️ Potential issue

Fix column-validation for item_toggle – current logic rejects perfectly valid payloads

When sync_type == VaultSyncType.item_toggle.name we should allow any subset of columns as long as the primary key is present, and we must not fail on a column-length mismatch.
However, the current code still:

  1. Performs the column_length_mismatch check (lines 524-529).
  2. Uses set(row.keys()).issubset(column_dict.keys()), which does not explicitly guarantee the presence of the primary key.

As a result, legitimate partial-update rows will be reported as invalid.

@@
-            if "column_length_mismatch" in event_validations:
-                if len(row.keys()) != len(column_dict.keys()):
-                    row_errors.append({
-                        "status": "Column length mismatch",
-                        "expected_columns": list(column_dict.keys()),
-                        "actual_columns": list(row.keys())
-                    })
+            # In item-toggle we expect a sparse update – skip strict length check
+            if "column_length_mismatch" in event_validations and sync_type != VaultSyncType.item_toggle.name:
+                if len(row.keys()) != len(column_dict.keys()):
+                    row_errors.append({
+                        "status": "Column length mismatch",
+                        "expected_columns": list(column_dict.keys()),
+                        "actual_columns": list(row.keys())
+                    })
@@
-                if not set(row.keys()).issubset(set(column_dict.keys())):
+                valid_columns = {primary_key_col} | set(column_dict.keys())
+                if not set(row.keys()).issubset(valid_columns) or primary_key_col not in row:
                     row_errors.append({
                         "status": "Invalid columns in input data",
-                        "expected_columns": expected_columns,
+                        "expected_columns": expected_columns,
                         "actual_columns": list(row.keys())
                     })
🧰 Tools
🪛 Ruff (0.8.2)

523-524: Use a single if statement instead of nested if statements

(SIM102)


533-533: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
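The intent of the suggested fix can be distilled into a standalone, DB-free check (simplified; the real validator accumulates structured per-row error dicts):

```python
def validate_row_columns(row: dict, column_dict: dict, primary_key_col: str,
                         sync_type: str) -> list:
    """Sparse updates (item_toggle) skip the strict length check, but every
    row must still carry the primary key and only known columns."""
    errors = []
    # Strict length check only applies to full-row sync types
    if sync_type != "item_toggle" and len(row) != len(column_dict):
        errors.append("Column length mismatch")
    valid_columns = {primary_key_col} | set(column_dict)
    if not set(row).issubset(valid_columns) or primary_key_col not in row:
        errors.append("Invalid columns in input data")
    return errors
```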

♻️ Duplicate comments (1)
kairon/shared/catalog_sync/catalog_sync_log_processor.py (1)

76-85: 🛠️ Refactor suggestion

Same MultipleObjectsReturned risk in is_sync_in_progress

Replicate the defensive pattern suggested for add_log; otherwise the “already in progress” guard can itself raise MultipleObjectsReturned.
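The defensive pattern amounts to never assuming a single matching document: select the newest candidate rather than calling .get(). A DB-free sketch of the selection logic (field names and status strings are illustrative):

```python
TERMINAL_STATUSES = {"Completed", "Failed", "Aborted"}


def latest_in_progress(sync_logs: list):
    """Return the newest non-terminal log entry, or None.

    Taking max(...) over the candidates tolerates duplicate in-progress
    rows, unlike mongoengine's .get(), which raises MultipleObjectsReturned
    when more than one document matches."""
    candidates = [log for log in sync_logs
                  if log["sync_status"] not in TERMINAL_STATUSES]
    return max(candidates, key=lambda log: log["start_timestamp"], default=None)
```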

🧹 Nitpick comments (31)
kairon/shared/utils.py (1)

2670-2683: Email handler method for catalog sync status notifications correctly implemented.

The __handle_catalog_sync_status method follows the same pattern as other email handlers in the MailUtility class, replacing template placeholders with actual values from the kwargs.

Consider adding a return type annotation for consistency with other methods in the class:

-    def __handle_catalog_sync_status(**kwargs):
+    def __handle_catalog_sync_status(**kwargs) -> tuple[str, str]:
kairon/catalog_sync/definitions/base.py (1)

4-18: Abstract base class provides a clear interface for implementing catalog sync providers.

The class establishes a standard contract with three required methods: validate, preprocess, and execute. This ensures consistency across different catalog sync implementations.

Consider making the error message more specific to indicate which method is not implemented:

-    def validate(self):
-        raise NotImplementedError("Provider not implemented")
+    def validate(self):
+        raise NotImplementedError("validate() method not implemented for this provider")

-    def preprocess(self):
-        raise NotImplementedError("Provider not implemented")
+    def preprocess(self):
+        raise NotImplementedError("preprocess() method not implemented for this provider")

-    def execute(self, **kwargs):
-        raise NotImplementedError("Provider not implemented")
+    def execute(self, **kwargs):
+        raise NotImplementedError("execute() method not implemented for this provider")
kairon/catalog_sync/definitions/factory.py (1)

6-29: Clean factory implementation with proper error handling.

The factory provides a clean way to instantiate the correct catalog sync implementation based on provider name.

Improve exception chaining for better debugging by using from syntax:

try:
    provider_enum = CatalogSyncClass(provider.lower())
except ValueError as err:
    valid_syncs = [sync.value for sync in CatalogSyncClass]
-   raise AppException(f"'{provider}' is not a valid catalog sync provider. Accepted types: {valid_syncs}")
+   raise AppException(f"'{provider}' is not a valid catalog sync provider. Accepted types: {valid_syncs}") from err
🧰 Tools
🪛 Ruff (0.8.2)

23-23: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

kairon/shared/data/constant.py (1)

114-130: Consider relationship with existing EVENT_STATUS enum

The new SYNC_STATUS enum has several values that overlap with the existing EVENT_STATUS enum (lines 93-111), such as "INITIATED", "VALIDATING", "SAVE", "COMPLETED", "FAILED", and "ABORTED". Consider whether these could be consolidated or if there's a specific reason to maintain them as separate enumerations.

If there's a strong reason to keep them separate, consider adding a class docstring to both enums explaining their distinct purposes. If they could be consolidated, consider refactoring to use a single shared event status enum with additional catalog-specific statuses as needed.
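One possible consolidation, assuming the shared statuses keep their current string values (member names and values here are illustrative):

```python
from enum import Enum


class EVENT_STATUS(str, Enum):
    """Lifecycle statuses shared by all long-running events."""
    INITIATED = "Initiated"
    VALIDATING = "Validating"
    SAVE = "Save"
    COMPLETED = "Completed"
    FAILED = "Failed"
    ABORTED = "Aborted"


class SYNC_STATUS(str, Enum):
    """Catalog-sync statuses: shared lifecycle values reused verbatim,
    plus catalog-specific additions."""
    INITIATED = EVENT_STATUS.INITIATED.value
    VALIDATING = EVENT_STATUS.VALIDATING.value
    COMPLETED = EVENT_STATUS.COMPLETED.value
    FAILED = EVENT_STATUS.FAILED.value
    ABORTED = EVENT_STATUS.ABORTED.value
    PREPROCESSING = "Preprocessing"  # catalog-specific
```

Reusing the shared members' values keeps status strings identical across both enums without enum subclassing (which Python forbids once members exist).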

metadata/catalog_provider_mappings.json (2)

25-31: Consider making default URL and brand configurable per implementation

The default URL ("https://www.kairon.com/") and brand ("Sattva") values are hardcoded. Consider making these configurable per implementation or bot instance, as they may not be appropriate for all users of the system.

This could be implemented by adding bot-specific configuration options in the database or allowing these values to be specified during integration setup.


50-53: Make facebook_product_category configurable based on business type

The default value for facebook_product_category is set to "Food and drink > General", which is appropriate for restaurants but might not suit other business types. Consider making this configurable based on the business category.

You could implement a mapping of business types to appropriate Facebook product categories, or allow this to be specified during integration setup.
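A minimal sketch of such a mapping (business types and category strings are illustrative, not Meta's official taxonomy):

```python
from typing import Optional

# Hypothetical mapping from a bot's business type to a Facebook product
# category; keys and values shown are examples only.
BUSINESS_CATEGORY_MAP = {
    "restaurant": "Food and drink > General",
    "grocery": "Food and drink > Groceries",
    "fashion": "Clothing and accessories > General",
}

DEFAULT_CATEGORY = "Food and drink > General"


def resolve_product_category(business_type: Optional[str]) -> str:
    """Fall back to the current hard-coded default when the business
    type is unknown or unset."""
    if not business_type:
        return DEFAULT_CATEGORY
    return BUSINESS_CATEGORY_MAP.get(business_type.lower(), DEFAULT_CATEGORY)
```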

kairon/shared/catalog_sync/data_objects.py (1)

3-4: Remove unused imports

The imports BooleanField and ListField are declared but not used in the file.

-from mongoengine import StringField, BooleanField, DateTimeField, DynamicDocument, DictField, ListField
+from mongoengine import StringField, DateTimeField, DynamicDocument, DictField
🧰 Tools
🪛 Ruff (0.8.2)

3-3: mongoengine.BooleanField imported but unused

Remove unused import

(F401)


3-3: mongoengine.ListField imported but unused

Remove unused import

(F401)

kairon/api/app/routers/bot/data.py (3)

2-2: Remove unused import.

The Text import from typing is not being used in this file.

-from typing import List, Text
+from typing import List
🧰 Tools
🪛 Ruff (0.8.2)

2-2: typing.Text imported but unused

Remove unused import: typing.Text

(F401)


4-4: Remove unused import.

The Path import from fastapi is not being used in this file.

-from fastapi import UploadFile, File, Security, APIRouter, Query, HTTPException, Path
+from fastapi import UploadFile, File, Security, APIRouter, Query, HTTPException
🧰 Tools
🪛 Ruff (0.8.2)

4-4: fastapi.Path imported but unused

Remove unused import: fastapi.Path

(F401)


374-395: Well-implemented POS integration endpoint.

This new endpoint for adding POS integration configurations properly validates the provider against the allowed catalog sync classes, loads provider mappings, and returns the integration endpoint URL.

A few minor improvements to consider:

  1. Add more descriptive docstring beyond "Add data integration config"
  2. Avoid using Security function in argument defaults
@router.post("/integrations/add", response_model=Response)
async def add_pos_integration_config(
    request_data: POSIntegrationRequest,
    sync_type: str,
-   current_user: User = Security(Authentication.get_current_user_and_bot, scopes=DESIGNER_ACCESS),
+   current_user: User = Depends(lambda: Authentication.get_current_user_and_bot(scopes=DESIGNER_ACCESS)),
):
    """
-   Add data integration config
+   Add Point of Sale (POS) integration configuration.
+   
+   This endpoint validates the provider, adds bot sync configuration,
+   and saves the POS integration configuration.
+   
+   Parameters:
+       request_data: POS integration request data
+       sync_type: Type of synchronization
+       current_user: Authenticated user with bot access
+   
+   Returns:
+       Integration endpoint URL
    """
🧰 Tools
🪛 Ruff (0.8.2)

379-379: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

template/emails/catalog_sync_status.html (1)

115-115: Update preheader text for catalog sync emails.

The preheader text (hidden text shown in email previews) is not specific to catalog sync status emails and appears to be a generic welcome message.

-        We're thrilled to have you on kAIron! Get ready to dive into your first bot.
+        Catalog synchronization status update for your kAIron bot.
kairon/api/app/routers/bot/integrations.py (1)

10-15: Remove unused imports & variables to silence Ruff F401 and cut dead code

MailUtility (l. 15) and the cognition_processor instance (l. 18) are never referenced in this module.
Please delete them (or use them) to avoid lint failures and keep the router lean.

-from kairon.shared.utils import MailUtility
-
-cognition_processor = CognitionDataProcessor()

Also applies to: 17-18

🧰 Tools
🪛 Ruff (0.8.2)

15-15: kairon.shared.utils.MailUtility imported but unused

Remove unused import: kairon.shared.utils.MailUtility

(F401)

tests/unit_test/data_processor/catalog_sync_log_processor_test.py (2)

7-15: Prune unused imports to keep the test suite tidy

uni_escape_re, AppException, and CognitionDataProcessor are never used and trigger Ruff F401.

-from babel.messages.jslexer import uni_escape_re
-from kairon.exceptions import AppException
-from kairon.shared.cognition.processor import CognitionDataProcessor
🧰 Tools
🪛 Ruff (0.8.2)

7-7: babel.messages.jslexer.uni_escape_re imported but unused

Remove unused import: babel.messages.jslexer.uni_escape_re

(F401)


11-11: kairon.exceptions.AppException imported but unused

Remove unused import: kairon.exceptions.AppException

(F401)


15-15: kairon.shared.cognition.processor.CognitionDataProcessor imported but unused

Remove unused import: kairon.shared.cognition.processor.CognitionDataProcessor

(F401)


24-27: Environment variable should be upper-snake-case

Ruff SIM112 warns on system_file; rename to SYSTEM_FILE to follow convention and avoid conflicts.

-os.environ["system_file"] = "./tests/testing_data/system.yaml"
+os.environ["SYSTEM_FILE"] = "./tests/testing_data/system.yaml"
🧰 Tools
🪛 Ruff (0.8.2)

24-24: Use capitalized environment variable SYSTEM_FILE instead of system_file

Replace system_file with SYSTEM_FILE

(SIM112)

kairon/meta/processor.py (3)

4-10: Remove unused imports to keep the module clean and avoid Ruff F401.

urljoin, Utility, and AioRestClient are never referenced.

-from urllib.parse import urljoin
-from kairon import Utility
-from kairon.shared.rest_client import AioRestClient
🧰 Tools
🪛 Ruff (0.8.2)

4-4: urllib.parse.urljoin imported but unused

Remove unused import: urllib.parse.urljoin

(F401)


7-7: kairon.Utility imported but unused

Remove unused import: kairon.Utility

(F401)


9-9: kairon.shared.rest_client.AioRestClient imported but unused

Remove unused import: kairon.shared.rest_client.AioRestClient

(F401)


59-60: Shadowing built-in id in list-comprehension.

Not breaking, but avoid overriding built-ins for readability:

-return [{"retailer_id": id, "method": "DELETE"} for id in remaining_ids]
+return [{"retailer_id": rid, "method": "DELETE"} for rid in remaining_ids]

115-119: Replace print with structured logging.

loguru is already imported; use it for consistency and traceability.

-print("Response JSON:", response.json())
-print("Successfully deleted data from meta.")
+logger.info("Meta delete response: {}", response.json())
kairon/shared/data/data_objects.py (1)

931-935: catalog_sync_limit_per_day added – update validation or docs to reflect the new quota.

There is no validation ensuring the value is positive/non-zero; consider:

if self.catalog_sync_limit_per_day <= 0:
    raise ValidationError("catalog_sync_limit_per_day must be positive")
kairon/events/definitions/petpooja_sync.py (3)

1-14: Prune unused imports (Ruff F401).

set_key, Utility, EventClass, BotSyncConfig, and AppException aren’t used.

-from dotenv import set_key
-from kairon import Utility
-from kairon.exceptions import AppException
-from kairon.shared.constants import EventClass
-from kairon.shared.data.data_objects import BotSyncConfig
🧰 Tools
🪛 Ruff (0.8.2)

3-3: dotenv.set_key imported but unused

Remove unused import: dotenv.set_key

(F401)


5-5: kairon.Utility imported but unused

Remove unused import: kairon.Utility

(F401)


6-6: loguru.logger imported but unused

Remove unused import: loguru.logger

(F401)


9-9: kairon.exceptions.AppException imported but unused

Remove unused import: kairon.exceptions.AppException

(F401)


12-12: kairon.shared.constants.EventClass imported but unused

Remove unused import: kairon.shared.constants.EventClass

(F401)


14-14: kairon.shared.data.data_objects.BotSyncConfig imported but unused

Remove unused import: kairon.shared.data.data_objects.BotSyncConfig

(F401)


166-173: Blocking HTTP calls inside the event loop – wrap MetaProcessor methods with await but ensure they are non-blocking.

MetaProcessor.*_meta_catalog uses asyncio.to_thread, which is fine, but if the processor itself grows, consider making these calls through an aiohttp client instead of threads.


175-181: Use logger instead of print for success notifications.

Keeps log aggregation consistent.

-print(str(e))
+logger.exception(str(e))
tests/integration_test/services_test.py (3)

1-4: Remove unused std-lib imports
os, re, and shutil are imported but never referenced in this module. Unused imports slow down import time and add cognitive overhead.

-import asyncio
-import os
-import re
-import shutil
+import asyncio

3808-3811: token assigned but never used – clean up or use
Multiple tests store token = sync_url.split(str(pytest.bot) + "/")[1] but never reference it afterwards, triggering Ruff F841.

Either remove the assignment:

-    token = sync_url.split(str(pytest.bot) + "/")[1]

or use token in subsequent assertions to validate the rerun URL construction.

Also applies to: 3936-3938, 4570-4573

🧰 Tools
🪛 Ruff (0.8.2)

3810-3810: Local variable token is assigned to but never used

Remove assignment to unused variable token

(F841)


3157-5039: High duplication – consider parameterised fixtures
Nearly every catalog-sync test block repeats the same setup:

  • Saving the same LLMSecret objects
  • Creating identical Petpooja payloads
  • Posting /integrations/add and extracting sync_url
  • Mocking the same methods

This inflates the test file (~5 k lines) and slows execution. Using

@pytest.fixture
def petpooja_setup(...):
    ...
    yield sync_url

and @pytest.mark.parametrize over (sync_type, ai_enabled, meta_enabled, delete_data, images_mode, expect_failure, …) would reduce code duplication dramatically and make maintenance easier.

🧰 Tools
🪛 Ruff (0.8.2)

3810-3810: Local variable token is assigned to but never used

Remove assignment to unused variable token

(F841)


3937-3937: Local variable token is assigned to but never used

Remove assignment to unused variable token

(F841)


4572-4572: Local variable token is assigned to but never used

Remove assignment to unused variable token

(F841)

kairon/shared/catalog_sync/catalog_sync_log_processor.py (2)

1-17: Remove unused imports to satisfy Ruff and avoid dead code

json, List, ColumnMetadata, ContentValidationLogs, and MongoProcessor are brought in but never used. Ruff (F401) already flags these; leaving them will break CI or at least keep warnings around.

-import json
-from typing import List
-...
-from kairon.shared.cognition.data_objects import CognitionSchema, ColumnMetadata, CollectionData
+from kairon.shared.cognition.data_objects import CognitionSchema, CollectionData
-from kairon.shared.content_importer.data_objects import ContentValidationLogs
-...
-from kairon.shared.data.processor import MongoProcessor

Cleaning them out keeps the module lean and the import graph accurate.

🧰 Tools
🪛 Ruff (0.8.2)

1-1: json imported but unused

Remove unused import: json

(F401)


3-3: typing.List imported but unused

Remove unused import: typing.List

(F401)


9-9: kairon.shared.cognition.data_objects.ColumnMetadata imported but unused

Remove unused import: kairon.shared.cognition.data_objects.ColumnMetadata

(F401)


11-11: kairon.shared.content_importer.data_objects.ContentValidationLogs imported but unused

Remove unused import: kairon.shared.content_importer.data_objects.ContentValidationLogs

(F401)


15-15: kairon.shared.data.processor.MongoProcessor imported but unused

Remove unused import: kairon.shared.data.processor.MongoProcessor

(F401)


309-311: Remove extraneous f prefix

Line 310: f"Global fallback image URL not found" has no interpolation.

-            raise Exception(
-                f"Global fallback image URL not found")
+            raise Exception("Global fallback image URL not found")

Silences Ruff F541.

🧰 Tools
🪛 Ruff (0.8.2)

311-311: f-string without any placeholders

Remove extraneous f prefix

(F541)

tests/unit_test/data_processor/data_processor_test.py (2)

12-12: Remove unused import.

The urllib.parse.urljoin import is not used in this file.

- from urllib.parse import urljoin
🧰 Tools
🪛 Ruff (0.8.2)

12-12: urllib.parse.urljoin imported but unused

Remove unused import: urllib.parse.urljoin

(F401)


1287-1413: Consider using pytest fixtures for image resolution tests.

The image resolution tests have duplicated setup and teardown code. Consider using pytest fixtures to reduce code duplication and ensure cleanup even if tests fail.

@pytest.fixture
def setup_bot_sync_and_images(request):
    bot = "test_bot"
    user = "test_user"
    
    bot_sync_config = BotSyncConfig(
        parent_bot="parent_bot",
        restaurant_name="TestRestaurant",
        provider="some_provider",
        branch_name="TestBranch",
        branch_bot=bot,
        ai_enabled=True,
        meta_enabled=True,
        user=user,
        timestamp=datetime.utcnow()
    )
    bot_sync_config.save()
    
    restaurant_name, branch_name = CognitionDataProcessor.get_restaurant_and_branch_name(bot)
    catalog_images_collection = f"{restaurant_name}_{branch_name}_catalog_images"
    
    def teardown():
        BotSyncConfig.objects.delete()
        CollectionData.objects(collection_name=catalog_images_collection).delete()
    
    request.addfinalizer(teardown)
    
    return bot, user, catalog_images_collection
kairon/shared/cognition/processor.py (3)

880-895: Bulk write defaults to ordered=True – switch to unordered for large batches

With ordered=True a single bad document aborts the whole batch and hurts throughput. Unless ordering is required, use ordered=False.

-                CognitionData._get_collection().bulk_write(update_operations)
+                CognitionData._get_collection().bulk_write(update_operations, ordered=False)

919-921: Typo in error message – “fro” ➜ “from”

-                await llm_processor.__delete_collection_points__(qdrant_collection, vector_ids, "Cannot delete stale points fro Qdrant!")
+                await llm_processor.__delete_collection_points__(qdrant_collection, vector_ids, "Cannot delete stale points from Qdrant!")

977-1013: Relative path for metadata file is fragile in deployments

Hard-coding "./metadata/catalog_provider_mappings.json" fails when the working directory differs (e.g., gunicorn, cron). Prefer importlib.resources or pkg_resources to load package data relative to module path.

from importlib import resources

mapping_path = resources.files("kairon.shared") / "metadata" / "catalog_provider_mappings.json"
mapping_data = json.loads(mapping_path.read_text())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b67f564 and ebe1dd1.

📒 Files selected for processing (35)
  • email.yaml (1 hunks)
  • kairon/api/app/main.py (2 hunks)
  • kairon/api/app/routers/bot/data.py (5 hunks)
  • kairon/api/app/routers/bot/integrations.py (1 hunks)
  • kairon/catalog_sync/definitions/base.py (1 hunks)
  • kairon/catalog_sync/definitions/factory.py (1 hunks)
  • kairon/events/definitions/catalog_sync.py (1 hunks)
  • kairon/events/definitions/factory.py (2 hunks)
  • kairon/events/definitions/petpooja_sync.py (1 hunks)
  • kairon/meta/processor.py (1 hunks)
  • kairon/shared/account/data_objects.py (1 hunks)
  • kairon/shared/account/processor.py (2 hunks)
  • kairon/shared/catalog_sync/catalog_sync_log_processor.py (1 hunks)
  • kairon/shared/catalog_sync/data_objects.py (1 hunks)
  • kairon/shared/cognition/processor.py (5 hunks)
  • kairon/shared/constants.py (2 hunks)
  • kairon/shared/data/constant.py (3 hunks)
  • kairon/shared/data/data_models.py (1 hunks)
  • kairon/shared/data/data_objects.py (2 hunks)
  • kairon/shared/data/utils.py (1 hunks)
  • kairon/shared/llm/processor.py (1 hunks)
  • kairon/shared/models.py (1 hunks)
  • kairon/shared/utils.py (2 hunks)
  • metadata/catalog_provider_mappings.json (1 hunks)
  • template/emails/catalog_sync_status.html (1 hunks)
  • tests/integration_test/services_test.py (23 hunks)
  • tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload.json (1 hunks)
  • tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload_invalid_missing_instock.json (1 hunks)
  • tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload_invalid_missing_itemid.json (1 hunks)
  • tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload_invalid_nonboolean_instock.json (1 hunks)
  • tests/testing_data/catalog_sync/catalog_sync_push_menu_payload.json (1 hunks)
  • tests/testing_data/catalog_sync/catalog_sync_push_menu_payload_invalid.json (1 hunks)
  • tests/testing_data/catalog_sync/catalog_sync_push_menu_payload_with_delete_data.json (1 hunks)
  • tests/unit_test/data_processor/catalog_sync_log_processor_test.py (1 hunks)
  • tests/unit_test/data_processor/data_processor_test.py (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
kairon/shared/data/utils.py (3)
kairon/shared/auth.py (2)
  • Authentication (30-465)
  • generate_integration_token (282-315)
kairon/shared/data/constant.py (2)
  • ACCESS_ROLES (177-184)
  • TOKEN_TYPE (200-206)
kairon/shared/utils.py (1)
  • Utility (92-2277)
kairon/shared/account/processor.py (1)
kairon/shared/utils.py (1)
  • Utility (92-2277)
kairon/shared/llm/processor.py (1)
kairon/shared/rest_client.py (3)
  • AioRestClient (26-141)
  • request (18-20)
  • request (51-69)
kairon/shared/catalog_sync/data_objects.py (2)
kairon/shared/utils.py (1)
  • push_notification (1514-1534)
kairon/shared/data/signals.py (2)
  • auditlogger (91-94)
  • apply (16-18)
kairon/meta/processor.py (2)
kairon/shared/catalog_sync/data_objects.py (1)
  • CatalogProviderMapping (29-35)
kairon/shared/rest_client.py (1)
  • AioRestClient (26-141)
tests/integration_test/services_test.py (6)
kairon/meta/processor.py (2)
  • MetaProcessor (13-122)
  • update_meta_catalog (82-101)
kairon/shared/catalog_sync/data_objects.py (2)
  • CatalogProviderMapping (29-35)
  • CatalogSyncLogs (9-24)
kairon/shared/cognition/processor.py (2)
  • CognitionDataProcessor (26-1041)
  • get_restaurant_and_branch_name (1035-1041)
kairon/shared/data/processor.py (1)
  • MongoProcessor (179-9012)
kairon/events/definitions/catalog_sync.py (1)
  • CatalogSync (13-73)
kairon/api/app/routers/bot/integrations.py (1)
  • sync_data (21-49)
🪛 Ruff (0.8.2)
kairon/shared/account/processor.py

962-962: Use a context manager for opening files

(SIM115)

kairon/catalog_sync/definitions/factory.py

23-23: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

tests/unit_test/data_processor/catalog_sync_log_processor_test.py

7-7: babel.messages.jslexer.uni_escape_re imported but unused

Remove unused import: babel.messages.jslexer.uni_escape_re

(F401)


11-11: kairon.exceptions.AppException imported but unused

Remove unused import: kairon.exceptions.AppException

(F401)


15-15: kairon.shared.cognition.processor.CognitionDataProcessor imported but unused

Remove unused import: kairon.shared.cognition.processor.CognitionDataProcessor

(F401)


24-24: Use capitalized environment variable SYSTEM_FILE instead of system_file

Replace system_file with SYSTEM_FILE

(SIM112)


116-116: pytest.raises(Exception) should be considered evil

(B017)


129-129: Do not use bare except

(E722)


132-132: pytest.raises(Exception) should be considered evil

(B017)


235-235: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


246-246: pytest.raises(Exception) should be considered evil

(B017)

kairon/shared/llm/processor.py

344-345: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

kairon/shared/catalog_sync/data_objects.py

3-3: mongoengine.BooleanField imported but unused

Remove unused import

(F401)


3-3: mongoengine.ListField imported but unused

Remove unused import

(F401)

kairon/api/app/routers/bot/data.py

2-2: typing.Text imported but unused

Remove unused import: typing.Text

(F401)


4-4: fastapi.Path imported but unused

Remove unused import: fastapi.Path

(F401)


379-379: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

kairon/meta/processor.py

4-4: urllib.parse.urljoin imported but unused

Remove unused import: urllib.parse.urljoin

(F401)


7-7: kairon.Utility imported but unused

Remove unused import: kairon.Utility

(F401)


9-9: kairon.shared.rest_client.AioRestClient imported but unused

Remove unused import: kairon.shared.rest_client.AioRestClient

(F401)

kairon/api/app/routers/bot/integrations.py

15-15: kairon.shared.utils.MailUtility imported but unused

Remove unused import: kairon.shared.utils.MailUtility

(F401)


23-24: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


25-25: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


26-26: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


27-27: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


55-56: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


57-57: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


58-58: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


59-59: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

kairon/shared/catalog_sync/catalog_sync_log_processor.py

1-1: json imported but unused

Remove unused import: json

(F401)


3-3: typing.List imported but unused

Remove unused import: typing.List

(F401)


9-9: kairon.shared.cognition.data_objects.ColumnMetadata imported but unused

Remove unused import: kairon.shared.cognition.data_objects.ColumnMetadata

(F401)


11-11: kairon.shared.content_importer.data_objects.ContentValidationLogs imported but unused

Remove unused import: kairon.shared.content_importer.data_objects.ContentValidationLogs

(F401)


15-15: kairon.shared.data.processor.MongoProcessor imported but unused

Remove unused import: kairon.shared.data.processor.MongoProcessor

(F401)


311-311: f-string without any placeholders

Remove extraneous f prefix

(F541)

kairon/events/definitions/petpooja_sync.py

3-3: dotenv.set_key imported but unused

Remove unused import: dotenv.set_key

(F401)


5-5: kairon.Utility imported but unused

Remove unused import: kairon.Utility

(F401)


6-6: loguru.logger imported but unused

Remove unused import: loguru.logger

(F401)


9-9: kairon.exceptions.AppException imported but unused

Remove unused import: kairon.exceptions.AppException

(F401)


12-12: kairon.shared.constants.EventClass imported but unused

Remove unused import: kairon.shared.constants.EventClass

(F401)


14-14: kairon.shared.data.data_objects.BotSyncConfig imported but unused

Remove unused import: kairon.shared.data.data_objects.BotSyncConfig

(F401)

tests/integration_test/services_test.py

82-82: kairon.shared.data.utils.DataUtility imported but unused

Remove unused import: kairon.shared.data.utils.DataUtility

(F401)


84-84: kairon.shared.models.StoryEventType imported but unused

Remove unused import: kairon.shared.models.StoryEventType

(F401)


3810-3810: Local variable token is assigned to but never used

Remove assignment to unused variable token

(F841)


3937-3937: Local variable token is assigned to but never used

Remove assignment to unused variable token

(F841)


4572-4572: Local variable token is assigned to but never used

Remove assignment to unused variable token

(F841)

kairon/shared/cognition/processor.py

668-668: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


671-671: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

tests/unit_test/data_processor/data_processor_test.py

12-12: urllib.parse.urljoin imported but unused

Remove unused import: urllib.parse.urljoin

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Analyze (python)
  • GitHub Check: Python CI
🔇 Additional comments (39)
kairon/shared/account/data_objects.py (1)

158-158: New mail template field added correctly

The addition of catalog_sync_status field to the MailTemplates embedded document class is appropriate and follows the existing pattern for other email templates. This field will support the new catalog synchronization status email notifications.

tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload_invalid_missing_instock.json (1)

1-13: Well-structured test data for validation testing

This test payload correctly models an invalid catalog sync item toggle request missing the inStock field. The data structure includes all other required fields (restID, itemID array, type, etc.) needed to thoroughly test validation failures in the catalog sync API.

email.yaml (1)

28-28: Email template configuration added correctly

The addition of the catalog_sync_status_subject configuration with a proper default value ("Catalog Sync Update") follows the existing pattern for other email templates. This configuration supports the catalog synchronization notification system.

kairon/api/app/main.py (2)

26-26: New integration module import looks good

The import of the integrations router from kairon.api.app.routers.bot is correctly added to support the new catalog synchronization API endpoints.


272-272: API router properly registered

The integration router is correctly included with the appropriate prefix /api/bot/integration and tagged as "Data Integrations", following the established pattern for other API endpoints.

kairon/shared/models.py (1)

120-122: Enum renamed to better reflect its purpose

The renaming from VaultSyncEventType to VaultSyncType and field_update to item_toggle improves semantic clarity, making the code more maintainable and better aligned with the catalog synchronization feature. This change enhances readability while preserving the existing functionality.

kairon/shared/data/data_models.py (1)

1394-1397: Well-structured Meta catalog configuration model.

The MetaConfig model provides a clean interface for Meta catalog API authentication and targeting. The model appropriately defines both the access token and catalog ID as string fields, which aligns with Meta's API requirements.

tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload.json (1)

1-14: Valid catalog item toggle test payload.

The test payload provides a complete example for testing the catalog sync item toggle functionality with all required fields:

  • Restaurant ID
  • Item availability status
  • Item ID array
  • Item type
  • Custom auto turn-on time configuration

This will properly test the item toggle feature of the catalog synchronization.

tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload_invalid_missing_itemid.json (1)

1-11: Appropriate negative test case for validation.

This test payload correctly omits the required itemID field to test validation logic. It's a good practice to include negative test cases to ensure proper error handling.

kairon/events/definitions/factory.py (2)

2-2: New import for catalog sync functionality.

Adding the import for CatalogSync is appropriate for the catalog integration feature.


28-29: Factory pattern extension for catalog sync.

The factory pattern is correctly extended to support the new catalog integration event type. The mapping between EventClass.catalog_integration and CatalogSync enables the event system to instantiate the appropriate handler for catalog synchronization events.

tests/testing_data/catalog_sync/catalog_sync_item_toggle_payload_invalid_nonboolean_instock.json (1)

1-14: Test payload correctly structured for negative validation testing.

This test payload properly defines an invalid value for the inStock field (string "invalid" instead of a boolean) which will help validate the system's ability to properly handle and reject incorrectly formatted catalog sync requests.

kairon/shared/constants.py (3)

86-86: LGTM! Appropriate catalog integration event type added.

The new catalog_integration value is correctly added to the EventClass enum to support the new catalog synchronization feature.


88-89: LGTM! Provider-specific catalog sync class added.

The CatalogSyncClass enum is appropriately defined with the Petpooja provider as the initial implementation, maintaining consistency with other enum definitions in the codebase.


127-128: LGTM! Catalog provider enum added consistently.

The CatalogProvider enum is correctly defined with the PETPOOJA member, following the existing pattern of provider-specific enumerations.

kairon/shared/account/processor.py (1)

1017-1018: LGTM! Email template correctly registered.

The catalog sync status email template is properly assigned to the email configuration dictionary following the established pattern.

kairon/shared/data/utils.py (1)

400-415: LGTM: Clear implementation for generating integration endpoints.

The implementation follows a similar pattern to the existing get_channel_endpoint method, adapting it for integration endpoints with appropriate access control.

kairon/shared/data/constant.py (2)

206-206: LGTM: DATA_INTEGRATION token type addition

The addition of DATA_INTEGRATION to the TOKEN_TYPE enum is appropriate for the new catalog synchronization feature.


283-286: LGTM: SyncType enum for catalog events

The new SyncType enum with values for push_menu and item_toggle properly defines the types of catalog synchronization events.

tests/testing_data/catalog_sync/catalog_sync_push_menu_payload_invalid.json (2)

159-185: Confirm intentional missing "itemid" field

Unlike other item objects in this JSON (such as the one at line 131), this item is missing an "itemid" field. This appears to be intentionally crafted to exercise validation of missing required fields; if so, document that intent (for example in the file name or an accompanying note) so other developers understand the purpose of this test data. If not, add the missing field.


1-269: LGTM: Comprehensive test data for validation scenarios

The test data provides a thorough representation of a restaurant menu structure with multiple entities (restaurants, categories, items, etc.) making it well-suited for testing complex validation scenarios.

kairon/shared/catalog_sync/data_objects.py (1)

29-36: LGTM: CatalogProviderMapping implementation

The CatalogProviderMapping class is well-structured with appropriate fields and a clear docstring explaining its purpose.

kairon/api/app/routers/bot/data.py (1)

346-346: Parameter renamed for consistency.

The parameter has been renamed from event_type to sync_type, which aligns with the broader refactoring of event types to sync types across the codebase.

Also applies to: 355-355, 365-365

template/emails/catalog_sync_status.html (2)

1-2: Good use of documentation comments.

The template clearly documents the required parameters and subject line, which helps with maintainability.


159-163: Well-structured email content section.

The email message body clearly presents the key information (Bot ID, Execution ID, Sync Status, Message) with proper formatting and styling.

tests/testing_data/catalog_sync/catalog_sync_push_menu_payload.json (1)

1-270: Well-structured test data for catalog sync.

This test data provides a complete representation of a restaurant's menu structure with three menu items, compared to the two items in the "with_delete_data" version. The data includes all necessary fields for testing catalog synchronization.

One observation is that the prices in this test data are quite high (8700.00, 3426.00, 3159.00) which seems unusual for restaurant items. While this doesn't affect functionality, you might want to use more realistic values in test data.

kairon/api/app/routers/bot/integrations.py (1)

36-50: Return your Response model, not a plain dict

Both handlers declare response_model=Response but return a dict.
If Response expects additional fields (success, data, …) FastAPI will coerce or fail silently.
Consider returning Response(message=..., success=True) (or whatever schema applies) to make the contract explicit.

Also applies to: 72-84

kairon/meta/processor.py (1)

38-44: Hard-coding "currency": "INR" in CREATE but omitting it in UPDATE can break Meta validation.

Pass currency consistently or rely on Meta defaults. Verify provider mapping instead of embedding a constant.

kairon/events/definitions/petpooja_sync.py (2)

25-34: Enum default may be wrong type – use .value or the enum itself consistently.

-self.sync_type = kwargs.get("sync_type", SyncType.item_toggle)
+# Accept both string and enum; default to enum member
+self.sync_type = kwargs.get("sync_type", SyncType.item_toggle.value)

Ensure subsequent comparisons (self.sync_type == SyncType.push_menu) use the same representation.
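Another way to make the two representations interchangeable is a str-backed enum, so members compare equal to their raw values (a sketch; whether kairon's SyncType already subclasses str should be verified):

```python
from enum import Enum


class SyncType(str, Enum):
    push_menu = "push_menu"
    item_toggle = "item_toggle"


# A str-backed enum compares equal to both the member and the raw string,
# so mixed defaults ("item_toggle" vs SyncType.item_toggle) stay comparable.
sync_type = SyncType.item_toggle
assert sync_type == "item_toggle"
assert sync_type == SyncType.item_toggle
assert SyncType("push_menu") is SyncType.push_menu  # round-trip from a raw value
```

With this definition, the default can stay a plain string without breaking later comparisons against enum members.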


154-160: ⚠️ Potential issue

Incorrect membership test on Document – will raise TypeError.

'meta_config' in integrations_doc checks attribute names, not fields. Use attribute access.

-if integrations_doc and 'meta_config' in integrations_doc:
+if integrations_doc and integrations_doc.meta_config:

Likely an incorrect or invalid review comment.

tests/integration_test/services_test.py (1)

31-41: Prune unused internal imports flagged by Ruff
DataUtility and StoryEventType (imported a bit lower, L84) aren’t used anywhere in this file. Please delete them or reference them to silence F401.

-from kairon.shared.data.utils import DataUtility
-from kairon.shared.models import StoryEventType

The rest of the added imports look relevant to the new tests.

tests/unit_test/data_processor/data_processor_test.py (5)

1096-1223: Good test implementation for menu preprocessing.

The test comprehensively verifies the catalog preprocessing functionality with proper setup and teardown.


1224-1233: LGTM: Good error handling test.

This test properly validates that the system handles missing provider mappings correctly.


1446-1487: Excellent preservation of existing configuration.

This test correctly verifies that when adding a configuration for a bot that already has one, the existing configuration values are preserved rather than overwritten.


1488-1517: Good normalization of restaurant and branch names.

The test verifies that restaurant and branch names are properly normalized (lowercase with underscores) for consistent collection naming.


2684-2691: Update of validation method name matches refactored API.

The test has been properly updated to use _validate_sync_type instead of _validate_event_type and to reference VaultSyncType instead of VaultSyncEventType.

kairon/shared/cognition/processor.py (3)

658-668: Preserve original traceback when re-raising exceptions

Raising a new AppException without from e loses the original stack trace, making debugging harder and triggering Ruff B904.

-        except Exception as e:
-            raise AppException(f"Failed to sync document with Qdrant: {str(e)}")
+        except Exception as e:
+            # keep the causal chain for easier debugging
+            raise AppException(f"Failed to sync document with Qdrant: {str(e)}") from e


🧰 Tools
🪛 Ruff (0.8.2)

668-668: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
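The effect of `from e` is visible on the exception's __cause__ attribute; a minimal sketch (AppException here is a local stand-in, not the kairon class):

```python
class AppException(Exception):  # stand-in for kairon.exceptions.AppException
    pass


def sync_document():
    try:
        raise ConnectionError("Qdrant unreachable")
    except Exception as e:
        # `from e` preserves the causal chain for tracebacks and debuggers
        raise AppException(f"Failed to sync document with Qdrant: {e}") from e


try:
    sync_document()
except AppException as err:
    assert isinstance(err.__cause__, ConnectionError)
    print(type(err.__cause__).__name__)  # → ConnectionError
```

Without `from e`, __cause__ stays None and the original traceback is reported only as "during handling of the above exception, another exception occurred".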


670-673: Remove unnecessary .keys() lookup

__members__ already behaves like a mapping; dropping .keys() is both clearer and satisfies Ruff SIM118.

-        if sync_type not in VaultSyncType.__members__.keys():
+        if sync_type not in VaultSyncType.__members__:
             raise AppException("Sync type does not exist")


🧰 Tools
🪛 Ruff (0.8.2)

671-671: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)
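Worth noting: __members__ is keyed by member name, not by value, so this guard accepts the enum's names only. A quick illustration with a hypothetical enum whose names and values differ (this is not VaultSyncType's actual definition):

```python
from enum import Enum


class VaultSyncType(Enum):  # illustrative stand-in
    push_menu = "push_menu_event"
    item_toggle = "item_toggle_event"


# __members__ maps *names* to members, so membership tests match names...
assert "push_menu" in VaultSyncType.__members__
# ...not values; a caller passing the value would be rejected.
assert "push_menu_event" not in VaultSyncType.__members__
```

If callers may pass either form, check names and values explicitly rather than relying on __members__ alone.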


777-803: Use domain-specific AppException instead of bare Exception

resolve_image_link currently throws a generic exception (lines 788-803). Uniform exception types simplify upstream handling and API responses.

-        else:
-            raise Exception(f"Image URL not found for {item_id} in {catalog_images_collection}")
+        else:
+            raise AppException(f"Image URL not found for {item_id} in {catalog_images_collection}")


Comment on lines +1398 to +1401
provider: str
config: dict
meta_config: Optional[MetaConfig]

💡 Verification agent

🧩 Analysis chain

Verify configuration validation for POS providers.

The POSIntegrationRequest model looks well-structured with appropriate fields for provider integration. However, there's no validation on the provider field to ensure it's a supported provider type.

Consider adding validation to ensure the provider is supported:

+@validator("provider")
+def validate_provider(cls, v, values, **kwargs):
+    from kairon.shared.utils import Utility
+    
+    # Assuming you have a list of supported providers
+    supported_providers = ["petpooja", "other_provider"]  # Update with actual providers
+    if v not in supported_providers:
+        raise ValueError(f"Unsupported provider. Supported providers: {supported_providers}")
+    return v

Let's check if there are supported providers defined:


🏁 Script executed:

#!/bin/bash
# Search for supported POS providers in the codebase
rg -t py "supported_providers|SUPPORTED_PROVIDERS|provider.*validation" --no-filename

Length of output: 201


I couldn't find an existing list of supported POS providers in the repo—let’s check if Utility already exposes anything for this before adding a new constant.


🏁 Script executed:

#!/bin/bash
# Locate the Utility class source
UTILITY_FILE=$(rg -l 'class Utility' -t py)
echo "Utility file: $UTILITY_FILE"

# Search for any provider- or supported-related methods/attributes
rg -n 'supported|provider|pos' "$UTILITY_FILE"

Length of output: 1024


Let’s inspect the business_providers block in Utility to see if a list of supported POS providers is already defined:


🏁 Script executed:

#!/bin/bash
# Show lines around where business_providers is referenced
sed -n '1570,1620p' kairon/shared/utils.py

Length of output: 2057


Let’s locate where system_metadata is initialized in Utility to see if POS providers are defined there:


🏁 Script executed:

#!/bin/bash
# Show the top of the Utility class (including system_metadata)
sed -n '1,200p' kairon/shared/utils.py

Length of output: 6642


Let’s locate where Utility.system_metadata gets populated and look for any POS provider lists or metadata files:

#!/bin/bash
# Find where system_metadata is assigned or loaded
rg -n 'Utility\.system_metadata' -t py

# Search for ConfigLoader usage to load any metadata
rg -n 'ConfigLoader' -t py

# Look for any YAML/JSON files that mention POS or providers
rg -l '\.ya?ml$|\.json$' -t yml -t json | xargs rg -n 'pos|provider'

To locate any existing provider lists or metadata definitions, let’s search where system_metadata is populated and inspect YAML/JSON configs:


🏁 Script executed:

#!/bin/bash
# 1. Find where system_metadata is assigned or loaded
rg -n 'system_metadata\s*=' -t py

# 2. See where ConfigLoader is used (likely to load metadata files)
rg -n 'ConfigLoader' -t py

# 3. Scan all YAML/JSON files for “pos” or “provider” entries
fd -e yml -e yaml -e json . | xargs rg -n 'pos|provider'

Length of output: 62328


Let’s inspect the POS provider definitions in the integrations metadata:


🏁 Script executed:

#!/bin/bash
# Show lines around the business_providers section in integrations metadata
rg -n 'business_providers' -C5 metadata/integrations.yml

Length of output: 332


Add provider validation to POSIntegrationRequest

The POSIntegrationRequest model currently accepts any string for provider, which may allow unsupported integrations. Since there’s no existing list of POS providers in the codebase, please:

  • Define a constant list of supported POS providers (for example, in a new kairon/shared/constants.py or at the top of data_models.py):

    # kairon/shared/constants.py
    SUPPORTED_POS_PROVIDERS = ["petpooja", "other_provider"]  # update with real values
  • Add a Pydantic validator to enforce this in kairon/shared/data/data_models.py (around lines 1398–1401):

     class POSIntegrationRequest(BaseModel):
         provider: str
         config: dict
         meta_config: Optional[MetaConfig]
    
     +   @validator("provider")
     +   def validate_provider(cls, v):
     +       from kairon.shared.constants import SUPPORTED_POS_PROVIDERS
     +
     +       if v not in SUPPORTED_POS_PROVIDERS:
     +           raise ValueError(
     +               f"Unsupported provider '{v}'. Supported providers: {SUPPORTED_POS_PROVIDERS}"
     +           )
     +       return v

This ensures only known POS integrations are accepted at the model level.

@@ -959,6 +959,7 @@ def load_system_properties():
).read(),
button_template=open("template/emails/button.html", "r").read(),
leave_bot_owner_notification=open("template/emails/leaveBotOwnerNotification.html", "r").read(),
catalog_sync_status=open("template/emails/catalog_sync_status.html", "r").read(),

🛠️ Refactor suggestion

Use context manager for file operations.

File operations should use context managers (with statement) to ensure proper resource cleanup even if exceptions occur.

Because the template is read inline as a keyword argument, hoist the read into a with block and pass the variable instead:

with open("template/emails/catalog_sync_status.html", "r") as f:
    catalog_sync_status_template = f.read()

# then, in the mail-templates mapping:
catalog_sync_status=catalog_sync_status_template,

Apply this pattern to all the file reads in load_system_properties for consistency.
🧰 Tools
🪛 Ruff (0.8.2)

962-962: Use a context manager for opening files

(SIM115)

Comment on lines +333 to +349
async def __delete_collection_points__(self, collection_name: Text, point_ids: List, err_msg: Text,
raise_err=True):
client = AioRestClient()
response = await client.request(http_url=urljoin(self.db_url, f"/collections/{collection_name}/points/delete"),
request_method="POST",
headers=self.headers,
request_body={"points": point_ids},
return_json=True,
timeout=5)
print(response)

if not response.get('result'):
if "status" in response:
logging.exception(response['status'].get('error'))
if raise_err:
raise AppException(err_msg)


🛠️ Refactor suggestion

Remove debug print statement.

The implementation for deleting collection points follows the error handling pattern used elsewhere, but includes a debug print statement.

Apply this diff to remove the debug print and simplify conditional logic:

async def __delete_collection_points__(self, collection_name: Text, point_ids: List, err_msg: Text,
                                       raise_err=True):
    client = AioRestClient()
    response = await client.request(http_url=urljoin(self.db_url, f"/collections/{collection_name}/points/delete"),
                                    request_method="POST",
                                    headers=self.headers,
                                    request_body={"points": point_ids},
                                    return_json=True,
                                    timeout=5)
-   print(response)

-   if not response.get('result'):
-       if "status" in response:
+   if not response.get('result') and "status" in response:
        logging.exception(response['status'].get('error'))
        if raise_err:
            raise AppException(err_msg)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    async def __delete_collection_points__(self, collection_name: Text, point_ids: List, err_msg: Text,
                                           raise_err=True):
        client = AioRestClient()
        response = await client.request(http_url=urljoin(self.db_url, f"/collections/{collection_name}/points/delete"),
                                        request_method="POST",
                                        headers=self.headers,
                                        request_body={"points": point_ids},
                                        return_json=True,
                                        timeout=5)
        print(response)

        if not response.get('result'):
            if "status" in response:
                logging.exception(response['status'].get('error'))
                if raise_err:
                    raise AppException(err_msg)

    async def __delete_collection_points__(self, collection_name: Text, point_ids: List, err_msg: Text,
                                           raise_err=True):
        client = AioRestClient()
        response = await client.request(
            http_url=urljoin(self.db_url, f"/collections/{collection_name}/points/delete"),
            request_method="POST",
            headers=self.headers,
            request_body={"points": point_ids},
            return_json=True,
            timeout=5,
        )
        if not response.get("result") and "status" in response:
            logging.exception(response["status"].get("error"))
            if raise_err:
                raise AppException(err_msg)
🧰 Tools
🪛 Ruff (0.8.2)

344-345: Use a single if statement instead of nested if statements

Combine if statements using and

(SIM102)

Comment on lines +22 to +23
"default": "https://www.kairon.com/default-image.jpg"
},
💡 Verification agent

🧩 Analysis chain

Verify default image URL exists and is accessible

The default image URL "https://www.kairon.com/default-image.jpg" may not exist or be publicly accessible. This could cause issues in production if the source is missing and the default is used.


🏁 Script executed:

#!/bin/bash
# Check if the default image URL returns a valid response
curl -s -I "https://www.kairon.com/default-image.jpg" | head -n 1

Length of output: 88


Default image URL is returning 404 – please update or host a valid default image

The configured default image URL (https://www.kairon.com/default-image.jpg) currently returns HTTP 404 and will result in broken images wherever it’s used. To resolve this:

• Confirm the correct URL (or path) for your intended default image.
• Host the image at a publicly accessible location (e.g., your CDN or asset server).
• Update metadata/catalog_provider_mappings.json (lines 22–23) with the working URL.

Once replaced, verify again:

curl -s -I "https://www.kairon.com/default-image.jpg" | head -n1
# Expect: HTTP/1.1 200 OK

Comment on lines +9 to +23
class CatalogSyncLogs(DynamicDocument):
    execution_id = StringField(required=True, unique=True)
    raw_payload = DictField(required=True)
    processed_payload = DictField(default=None)
    validation_errors = DictField(default={})
    exception = StringField(default="")
    bot = StringField(required=True)
    user = StringField(required=True)
    provider = StringField(required=True)
    sync_type = StringField(required=True)
    start_timestamp = DateTimeField(default=datetime.utcnow)
    end_timestamp = DateTimeField(default=None)
    status = StringField(default=None)
    sync_status = StringField(default="COMPLETED")

🛠️ Refactor suggestion

Add docstring and clarify status fields in CatalogSyncLogs

The CatalogSyncLogs class lacks a docstring explaining its purpose, and it has two potentially redundant fields: status and sync_status.

  1. Add a class docstring similar to the one in CatalogProviderMapping
  2. Clarify the distinction between status and sync_status either through field comments or by renaming one of them to better indicate its purpose

Example docstring:

"""
Stores logs and status information for catalog synchronization operations.
"""

Comment on lines +41 to +45
        try:
            doc = CatalogSyncLogs.objects(bot=bot).filter(
                Q(sync_status__ne=SYNC_STATUS.COMPLETED.value) &
                Q(sync_status__ne=SYNC_STATUS.FAILED.value)).get()
        except DoesNotExist:
🛠️ Refactor suggestion

⚠️ Potential issue

add_log may raise MultipleObjectsReturned and ignores ABORTED status

get() expects exactly one match; if two “in-progress” logs exist, it will raise MultipleObjectsReturned and crash.
Also, a sync in ABORTED state is considered “open” here but not in is_sync_in_progress, creating inconsistent behaviour.

-        doc = CatalogSyncLogs.objects(bot=bot).filter(
-            Q(sync_status__ne=SYNC_STATUS.COMPLETED.value) &
-            Q(sync_status__ne=SYNC_STATUS.FAILED.value)).get()
+        doc = (
+            CatalogSyncLogs.objects(bot=bot)
+            .filter(
+                Q(sync_status__ne=SYNC_STATUS.COMPLETED.value)
+                & Q(sync_status__ne=SYNC_STATUS.FAILED.value)
+                & Q(sync_status__ne=SYNC_STATUS.ABORTED.value)
+            )
+            .order_by("-start_timestamp")
+            .first()
+        )

first() makes the call tolerant to duplicates while still picking the latest.

Suggested change
        try:
            doc = (
                CatalogSyncLogs.objects(bot=bot)
                .filter(
                    Q(sync_status__ne=SYNC_STATUS.COMPLETED.value)
                    & Q(sync_status__ne=SYNC_STATUS.FAILED.value)
                    & Q(sync_status__ne=SYNC_STATUS.ABORTED.value)
                )
                .order_by("-start_timestamp")
                .first()
            )
        except DoesNotExist:

Comment on lines 924 to 975
    @staticmethod
    def save_ai_data(processed_data: dict, bot: str, user: str, sync_type: str):
        """
        Syncs a document with Qdrant vector database by generating embeddings and upserting them.
        Save each item in `kv` of the processed payload into CollectionData individually.
        """
        restaurant_name, branch_name = CognitionDataProcessor.get_restaurant_and_branch_name(bot)
        catalog_data_collection = f"{restaurant_name}_{branch_name}_catalog_data"

        kv_items = processed_data.get("kv", [])
        incoming_data_map = {item["id"]: item for item in kv_items}
        incoming_ids = set(incoming_data_map.keys())

        existing_docs = CollectionData.objects(
            collection_name=catalog_data_collection,
            bot=bot,
            status=True
        )
        existing_data_map = {doc.data.get("id"): doc for doc in existing_docs}
        existing_ids = set(existing_data_map.keys())

        for item_id, item in incoming_data_map.items():
            if item_id in existing_data_map:
                doc = existing_data_map[item_id]
                if sync_type == SyncType.item_toggle:
                    for key, value in item.items():
                        doc.data[key] = value
                else:
                    doc.data = item
                doc.timestamp = datetime.utcnow()
                doc.user = user
                doc.save()
            else:
                CollectionData(
                    collection_name=catalog_data_collection,
                    data=item,
                    user=user,
                    bot=bot,
                    timestamp=datetime.utcnow(),
                    status=True
                ).save()
        stale_ids = []
        if sync_type == SyncType.push_menu:
            stale_ids = list(existing_ids - incoming_ids)
        if stale_ids:
            CollectionData.objects(
                collection_name=catalog_data_collection,
                bot=bot,
                status=True,
                data__id__in=stale_ids
            ).delete()

        return stale_ids
⚠️ Potential issue

Enum misuse – save_ai_data compares sync_type to enum values incorrectly

The method expects sync_type: str but compares it against SyncType.item_toggle & SyncType.push_menu (enum members). This always fails, so item-toggle updates and stale-ID deletion logic won’t run.

Either:

-                if sync_type == SyncType.item_toggle:
+                if sync_type == SyncType.item_toggle.name:

and similarly for push_menu, or accept an actual SyncType enum instance.
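The failure mode is plain Python enum semantics; a stand-in enum (the real SyncType lives in kairon and may be declared differently) makes it concrete:

```python
from enum import Enum

# Hypothetical stand-ins for illustration only
class PlainSyncType(Enum):
    item_toggle = "item_toggle"
    push_menu = "push_menu"

class StrSyncType(str, Enum):
    item_toggle = "item_toggle"
    push_menu = "push_menu"

# A plain Enum member never compares equal to a bare string...
assert ("item_toggle" == PlainSyncType.item_toggle) is False
# ...but comparing against .name (or .value) works,
assert "item_toggle" == PlainSyncType.item_toggle.name
# and a str-backed Enum compares equal to its string directly.
assert "item_toggle" == StrSyncType.item_toggle
```

So either compare against `.name`/`.value` as suggested, or declare SyncType as a `str, Enum` mixin so string comparisons hold.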

Suggested change
    @staticmethod
    def save_ai_data(processed_data: dict, bot: str, user: str, sync_type: str):
        """
        Save each item in `kv` of the processed payload into CollectionData individually.
        """
        restaurant_name, branch_name = CognitionDataProcessor.get_restaurant_and_branch_name(bot)
        catalog_data_collection = f"{restaurant_name}_{branch_name}_catalog_data"
        kv_items = processed_data.get("kv", [])
        incoming_data_map = {item["id"]: item for item in kv_items}
        incoming_ids = set(incoming_data_map.keys())
        existing_docs = CollectionData.objects(
            collection_name=catalog_data_collection,
            bot=bot,
            status=True
        )
        existing_data_map = {doc.data.get("id"): doc for doc in existing_docs}
        existing_ids = set(existing_data_map.keys())
        for item_id, item in incoming_data_map.items():
            if item_id in existing_data_map:
                doc = existing_data_map[item_id]
                # use the enum member's name for comparison
                if sync_type == SyncType.item_toggle.name:
                    for key, value in item.items():
                        doc.data[key] = value
                else:
                    doc.data = item
                doc.timestamp = datetime.utcnow()
                doc.user = user
                doc.save()
            else:
                CollectionData(
                    collection_name=catalog_data_collection,
                    data=item,
                    user=user,
                    bot=bot,
                    timestamp=datetime.utcnow(),
                    status=True
                ).save()
        stale_ids = []
        if sync_type == SyncType.push_menu.name:
            stale_ids = list(existing_ids - incoming_ids)
        if stale_ids:
            CollectionData.objects(
                collection_name=catalog_data_collection,
                bot=bot,
                status=True,
                data__id__in=stale_ids
            ).delete()
        return stale_ids

Comment on lines 849 to 855
                vector_id = str(uuid.uuid4()) if not existing_document else existing_document.get("vector_id")

                merged_data = row
                if existing_document:
                    existing_data = existing_document.get("data", {})
                    merged_data = {**existing_data, **row}
                update_operations.append(UpdateOne(
⚠️ Potential issue

Handle documents missing vector_id to avoid Qdrant upsert failures

Legacy records may not contain vector_id; appending None (line 850) results in {'id': None, ...} and an HTTP 400 from Qdrant.

-                vector_id = str(uuid.uuid4()) if not existing_document else existing_document.get("vector_id")
+                if existing_document and existing_document.get("vector_id"):
+                    vector_id = existing_document["vector_id"]
+                else:
+                    vector_id = str(uuid.uuid4())
+                    # back-fill vector_id on the existing Mongo document if previously absent
+                    if existing_document:
+                        update_operations.append(
+                            UpdateOne({"_id": existing_document["_id"]},
+                                      {"$set": {"vector_id": vector_id}})
+                        )
Suggested change
                if existing_document and existing_document.get("vector_id"):
                    vector_id = existing_document["vector_id"]
                else:
                    vector_id = str(uuid.uuid4())
                    # back-fill vector_id on the existing Mongo document if previously absent
                    if existing_document:
                        update_operations.append(
                            UpdateOne({"_id": existing_document["_id"]},
                                      {"$set": {"vector_id": vector_id}})
                        )

                merged_data = row
                if existing_document:
                    existing_data = existing_document.get("data", {})
                    merged_data = {**existing_data, **row}
                update_operations.append(UpdateOne(
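The reuse-or-mint decision can be exercised in isolation; the helper name and return shape below are illustrative, not part of the PR:

```python
import uuid

def resolve_vector_id(existing_document):
    """Reuse a stored vector_id when present; mint a fresh one otherwise.

    Returns (vector_id, needs_backfill) where needs_backfill is True when a
    legacy document exists but lacks a vector_id.
    """
    if existing_document and existing_document.get("vector_id"):
        return existing_document["vector_id"], False
    return str(uuid.uuid4()), existing_document is not None

# New document: fresh id, nothing to back-fill
vid, backfill = resolve_vector_id(None)
assert uuid.UUID(vid) and backfill is False

# Legacy document without vector_id: fresh id, back-fill required
vid, backfill = resolve_vector_id({"_id": "abc", "data": {}})
assert backfill is True

# Document with a stored id: reuse it untouched
vid, backfill = resolve_vector_id({"vector_id": "keep-me"})
assert vid == "keep-me" and backfill is False
```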

Comment on lines +688 to +706
        self._validate_sync_type(sync_type)
        try:
            integration = POSIntegrations.objects(bot= bot, provider = configuration['provider'], sync_type = sync_type).get()
            integration.config = configuration['config']
            integration.meta_config = configuration['meta_config']
        except DoesNotExist:
            integration = POSIntegrations(**configuration)
            integration.bot = bot
            integration.user = user
            integration.sync_type = sync_type
            integration.timestamp = datetime.utcnow()

        if 'meta_config' in configuration:
            integration.meta_config = configuration['meta_config']

        integration.save()
        integration_endpoint = DataUtility.get_integration_endpoint(integration)
        return integration_endpoint

⚠️ Potential issue

sync_type is optional but validated as mandatory – leads to false failures

save_pos_integration_config declares sync_type: Text = None, yet immediately calls _validate_sync_type(sync_type) (line 688) which raises when sync_type is None. Either make the argument mandatory or guard the validation.

-        self._validate_sync_type(sync_type)
+        if sync_type:
+            self._validate_sync_type(sync_type)
+        else:
+            raise AppException("sync_type must be provided for POS integrations")
Suggested change
        if sync_type:
            self._validate_sync_type(sync_type)
        else:
            raise AppException("sync_type must be provided for POS integrations")
        try:
            integration = POSIntegrations.objects(bot= bot, provider = configuration['provider'], sync_type = sync_type).get()
            integration.config = configuration['config']
            integration.meta_config = configuration['meta_config']
        except DoesNotExist:
            integration = POSIntegrations(**configuration)
            integration.bot = bot
            integration.user = user
            integration.sync_type = sync_type
            integration.timestamp = datetime.utcnow()
        if 'meta_config' in configuration:
            integration.meta_config = configuration['meta_config']
        integration.save()
        integration_endpoint = DataUtility.get_integration_endpoint(integration)
        return integration_endpoint

Comment on lines +708 to +757
    @staticmethod
    def preprocess_push_menu_data(bot, json_data, provider):
        """
        Preprocess the JSON data received from Petpooja to extract relevant fields for knowledge base or meta synchronization.
        Handles different event types ("push_menu" vs others) and uses metadata to drive the field extraction and defaulting.
        """
        doc = CatalogProviderMapping.objects(provider=provider).first()
        if not doc:
            raise Exception(f"Metadata mappings not found for provider={provider}")

        category_map = {
            cat["categoryid"]: cat["categoryname"]
            for cat in json_data.get("categories", [])
        }

        provider_mappings = {
            "meta": doc.meta_mappings,
            "kv": doc.kv_mappings
        }

        data = {sync_target: [] for sync_target in provider_mappings}
        for item in json_data.get("items", []):
            for sync_target, fields in provider_mappings.items():
                transformed_item = {"id": item["itemid"]}

                for target_field, field_config in fields.items():
                    source_key = field_config.get("source")
                    default_value = field_config.get("default")
                    value = item.get(source_key) if source_key else None

                    if target_field == "availability":
                        value = "in stock" if int(value or 0) > 0 else default_value
                    elif target_field == "facebook_product_category":
                        category_id = value or ""
                        value = f"Food and drink > {category_map.get(category_id, 'General')}"
                    elif target_field == "image_url":
                        value = CognitionDataProcessor.resolve_image_link(bot, item["itemid"])
                    elif target_field == "price":
                        value = float(value)
                    if not value:
                        value = default_value

                    transformed_item[target_field] = value

                data[sync_target].append(transformed_item)

        return data

    @staticmethod
    def preprocess_item_toggle_data(bot, json_data, provider):
🛠️ Refactor suggestion

Harden JSON preprocessing – unguarded casts may crash on real-world data

  • float(value) (line 746) will raise TypeError/ValueError when value is None, empty or non-numeric.
  • int(value or 0) (line 739) still raises if value is ''.
  • Generic Exception on missing mappings (line 716) breaks uniform error-handling.

Consider:

-                    if target_field == "availability":
-                        value = "in stock" if int(value or 0) > 0 else default_value
+                    if target_field == "availability":
+                        try:
+                            value = "in stock" if int(value or 0) > 0 else default_value
+                        except (TypeError, ValueError):
+                            value = default_value
@@
-                    elif target_field == "price":
-                        value = float(value)
+                    elif target_field == "price":
+                        try:
+                            value = float(value)
+                        except (TypeError, ValueError):
+                            value = default_value

Also replace bare Exception (line 716) with AppException for consistency.

Suggested change
    @staticmethod
    def preprocess_push_menu_data(bot, json_data, provider):
        """
        Preprocess the JSON data received from Petpooja to extract relevant fields for knowledge base or meta synchronization.
        Handles different event types ("push_menu" vs others) and uses metadata to drive the field extraction and defaulting.
        """
        doc = CatalogProviderMapping.objects(provider=provider).first()
        if not doc:
            raise Exception(f"Metadata mappings not found for provider={provider}")
        category_map = {
            cat["categoryid"]: cat["categoryname"]
            for cat in json_data.get("categories", [])
        }
        provider_mappings = {
            "meta": doc.meta_mappings,
            "kv": doc.kv_mappings
        }
        data = {sync_target: [] for sync_target in provider_mappings}
        for item in json_data.get("items", []):
            for sync_target, fields in provider_mappings.items():
                transformed_item = {"id": item["itemid"]}
                for target_field, field_config in fields.items():
                    source_key = field_config.get("source")
                    default_value = field_config.get("default")
                    value = item.get(source_key) if source_key else None
                    if target_field == "availability":
                        try:
                            value = "in stock" if int(value or 0) > 0 else default_value
                        except (TypeError, ValueError):
                            value = default_value
                    elif target_field == "facebook_product_category":
                        category_id = value or ""
                        value = f"Food and drink > {category_map.get(category_id, 'General')}"
                    elif target_field == "image_url":
                        value = CognitionDataProcessor.resolve_image_link(bot, item["itemid"])
                    elif target_field == "price":
                        try:
                            value = float(value)
                        except (TypeError, ValueError):
                            value = default_value
                    if not value:
                        value = default_value
                    transformed_item[target_field] = value
                data[sync_target].append(transformed_item)
        return data

    @staticmethod
    def preprocess_item_toggle_data(bot, json_data, provider):
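The repeated try/except pattern can be factored into a small helper (name hypothetical, not part of the PR) so every numeric field is guarded the same way:

```python
def safe_cast(value, cast, default):
    """Apply `cast` to `value`, falling back to `default` when the value is
    None, empty, or otherwise not parseable as a number."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return default

# Well-formed values pass through the cast
assert safe_cast("12.50", float, 0.0) == 12.5
# Empty string: int('') raises ValueError, so the default is used
assert safe_cast("", int, 0) == 0
# None: float(None) raises TypeError, so the default is used
assert safe_cast(None, float, 1.0) == 1.0
# Non-numeric garbage from the payload also falls back cleanly
assert safe_cast("abc", float, 2.0) == 2.0
```

With such a helper, the availability and price branches reduce to single calls, e.g. `safe_cast(value, float, default_value)`.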

Removed unused imports
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

♻️ Duplicate comments (16)
kairon/api/app/routers/bot/data.py (1)

384-385: 🛠️ Refactor suggestion

⚠️ Potential issue

Add defensive error-handling around provider-mapping load

load_catalog_provider_mappings() performs I/O and DB writes; any failure will bubble up as an unhandled 500.
Wrap the call so the client receives a clear 4xx/5xx with context and the task can be retried gracefully.

-    CognitionDataProcessor.load_catalog_provider_mappings()
+    try:
+        CognitionDataProcessor.load_catalog_provider_mappings()
+    except Exception as e:
+        logger.exception("Failed to load catalog provider mappings")
+        raise AppException(f"Unable to load catalog provider mappings: {e}") from e
kairon/api/app/routers/bot/integrations.py (2)

22-27: examples → example and B008 (Path/Security in defaults) persist
The same issue was raised earlier but remains unresolved. FastAPI ignores examples, and evaluating Path/Security at import time is discouraged (Ruff B008).

🧰 Tools
🪛 Ruff (0.8.2)

22-23: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


24-24: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


25-25: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


26-26: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


33-34: Harden JSON parsing

await request.json() will raise JSONDecodeError and return a 500 on malformed
payloads. Catch the error and return a 400 with a helpful message.

kairon/meta/processor.py (3)

18-45: Two long-standing issues still present

  1. self.processed_data keeps growing – calling preprocess_data twice on the same instance duplicates previous items and inflates memory usage.
    Reset it at the start of each call.

  2. Prices are coerced to int, silently discarding paise/cent values and violating Meta’s catalog spec.

-        for item in data:
+        # Start fresh each invocation
+        self.processed_data = []
+        for item in data:
@@
-                        value = int(item[field]) if field == "price" else item[field]
+                        if field == "price":
+                            value = str(item[field])          # keep decimals
+                        else:
+                            value = item[field]

105-108: Hard-coded catalog ID breaks multi-tenant support

delete_meta_catalog ignores self.catalog_id and points to a fixed ID,
causing deletions in the wrong account.

-base_url = "https://graph.facebook.com/v21.0/1880697869060042/batch"
+base_url = f"https://graph.facebook.com/v21.0/{self.catalog_id}/batch"

63-75: 🛠️ Refactor suggestion

Batch payload sent via URL – risk of URI-too-long & logging PII

The entire JSON batch is URL-encoded and appended to the query string.
Large menus quickly exceed the 2 KB/8 KB limits on many CDNs and leak
potentially sensitive catalog data into access logs.

POST the payload in the body instead:

-req = quote(json.dumps(self.processed_data))
-base_url = f"https://graph.facebook.com/v21.0/{self.catalog_id}/batch"
-url = f"{base_url}?item_type=PRODUCT_ITEM&requests={req}"
-
-data = {"access_token": self.access_token}
-response = await asyncio.to_thread(requests.post, url, headers={}, data=data)
+payload = {
+    "item_type": "PRODUCT_ITEM",
+    "requests": json.dumps(self.processed_data),
+    "access_token": self.access_token,
+}
+base_url = f"https://graph.facebook.com/v21.0/{self.catalog_id}/batch"
+response = await asyncio.to_thread(
+    requests.post, base_url, data=payload, timeout=30
+)

Also add a sensible timeout and reuse a common headers dict if needed.
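To see why the query-string approach is fragile, a rough sketch with a made-up 200-item batch shows the encoded URL blowing well past the limits many proxies and CDNs enforce:

```python
import json
from urllib.parse import quote

# Hypothetical 200-item menu; real Petpooja payloads carry more fields per item
requests_batch = [
    {"method": "UPDATE",
     "data": {"id": str(i), "name": f"Item {i}",
              "price": "199.00", "availability": "in stock"}}
    for i in range(200)
]

encoded = quote(json.dumps(requests_batch))
url = f"https://graph.facebook.com/v21.0/<catalog_id>/batch?item_type=PRODUCT_ITEM&requests={encoded}"

# Even this modest batch far exceeds the ~2 KB URL limit common on proxies,
# and every quote/brace inflates further under percent-encoding.
assert len(url) > 2048
```

Moving the batch into the POST body removes the length ceiling and keeps catalog contents out of access logs.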

kairon/events/definitions/petpooja_sync.py (2)

81-82: CognitionDataProcessor.save_ai_data return value can be None; guard before iteration.

You later pass stale_primary_keys to Meta deletion; a None will raise TypeError.

stale_primary_keys = CognitionDataProcessor.save_ai_data(self.data, self.bot, self.user, self.sync_type)
+stale_primary_keys = stale_primary_keys or []

159-162: ⚠️ Potential issue

Ensure stale_primary_keys is not None before checking

The code doesn't check if stale_primary_keys is None before using it in a conditional.

This could lead to a TypeError if stale_primary_keys is None. Apply this fix:

-                    if stale_primary_keys:
+                    if stale_primary_keys and len(stale_primary_keys) > 0:
                         delete_payload = meta_processor.preprocess_delete_data(stale_primary_keys)
                         await meta_processor.delete_meta_catalog(delete_payload)
kairon/shared/catalog_sync/catalog_sync_log_processor.py (4)

37-41: add_log may raise MultipleObjectsReturned and ignores ABORTED status

get() expects exactly one match; if two "in-progress" logs exist, it will raise MultipleObjectsReturned and crash.
Also, a sync in ABORTED state is considered "open" here but not in is_sync_in_progress, creating inconsistent behaviour.

-        doc = CatalogSyncLogs.objects(bot=bot).filter(
-            Q(sync_status__ne=SYNC_STATUS.COMPLETED.value) &
-            Q(sync_status__ne=SYNC_STATUS.FAILED.value)).get()
+        doc = (
+            CatalogSyncLogs.objects(bot=bot)
+            .filter(
+                Q(sync_status__ne=SYNC_STATUS.COMPLETED.value)
+                & Q(sync_status__ne=SYNC_STATUS.FAILED.value)
+                & Q(sync_status__ne=SYNC_STATUS.ABORTED.value)
+            )
+            .order_by("-start_timestamp")
+            .first()
+        )

94-106: is_limit_exceeded crashes when BotSettings is missing

BotSettings.objects(bot=bot).get() raises DoesNotExist; the except block is absent, so the caller gets an unhandled error unrelated to "limit exceeded".

Suggestion:

-        if doc_count >= BotSettings.objects(bot=bot).get().catalog_sync_limit_per_day:
+        try:
+            limit = BotSettings.objects(bot=bot).get().catalog_sync_limit_per_day
+        except DoesNotExist:
+            limit = 0   # or a sensible default
+
+        if doc_count >= limit:

186-196: validate_item_ids promises a return value but returns nothing

Docstring: "Returns a set of valid category IDs."
Implementation: performs validation only.

Either return the set or fix the docstring. Returning is usually more useful:

-        for item in json_data.get("items", []):
+        category_ids = set()
+        for item in json_data.get("items", []):
             if "itemid" not in item:
                 raise Exception(f"Missing 'itemid' in item: {item}")
+            if "item_categoryid" in item:
+                category_ids.add(item["item_categoryid"])
+        return category_ids
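The returning variant can be checked standalone; the payload below is a made-up minimal Petpooja-style body:

```python
def validate_item_ids(json_data):
    """Validate items and collect their category IDs (sketch of the suggested fix)."""
    category_ids = set()
    for item in json_data.get("items", []):
        if "itemid" not in item:
            raise Exception(f"Missing 'itemid' in item: {item}")
        if "item_categoryid" in item:
            category_ids.add(item["item_categoryid"])
    return category_ids

payload = {"items": [
    {"itemid": "1", "item_categoryid": "c1"},
    {"itemid": "2", "item_categoryid": "c2"},
    {"itemid": "3"},  # no category: validated but contributes no ID
]}
assert validate_item_ids(payload) == {"c1", "c2"}
```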

276-293: ⚠️ Potential issue

Newly-created fallback image document immediately fails validation

The method inserts a document with an empty image_url, then validates and inevitably raises "Global fallback image URL not found".

Apply this fix to prevent validation failure on newly created documents:

     if not CollectionData.objects(bot=bot, collection_name=catalog_images_collection).first():
         global_fallback_data = {
             "image_type": "global",
-            "image_url":"",
+            "image_url": "https://example.com/placeholder.jpg",  # Use a placeholder or default URL
             "image_base64":""
         }
         CollectionData(
             collection_name=catalog_images_collection,
             data=global_fallback_data,
             user=user,
             bot=bot,
             status=True,
             timestamp=datetime.utcnow()
         ).save()
+        return  # First-time setup complete; let the caller fill the actual URL later
kairon/shared/cognition/processor.py (4)

822-833: vector_id can still be None, breaking Qdrant upserts

Previous reviews pointed this out; the logic here hasn’t changed.
existing_document.get("vector_id") returns None for legacy docs, producing invalid points ({'id': None, …}) that Qdrant rejects.

See earlier suggestion for generating a UUID and back-filling MongoDB when a stored vector_id is absent.


920-940: ⚠️ Potential issue

String-vs-enum comparison prevents save_ai_data branching

sync_type is annotated as str, but comparisons are made against SyncType enum members.
These checks always evaluate to False, so item-toggle merging and push-menu clean-up never execute.

-                if sync_type == SyncType.item_toggle:
+                if sync_type == SyncType.item_toggle.name:

 ...

-        if sync_type == SyncType.push_menu:
+        if sync_type == SyncType.push_menu.name:

This bug silently skips critical logic and causes stale data to linger.


661-672: ⚠️ Potential issue

Optional sync_type parameter crashes on None

save_pos_integration_config treats sync_type as optional (None default) yet immediately validates it.
Calling _validate_sync_type(None) raises AppException, so every call that omits the argument fails.

-        self._validate_sync_type(sync_type)
+        if not sync_type:
+            raise AppException("`sync_type` must be provided for POS integration configuration")
+        self._validate_sync_type(sync_type)

Alternatively, drop the default and make the argument mandatory in the signature.
Failure to address this will break the ingestion flow at runtime.


711-720: 🛠️ Refactor suggestion

Unprotected numeric casts may raise on real-world data

float(value) and int(value or 0) assume valid numeric strings. Empty strings, None, or malformed numbers from Petpooja will raise TypeError/ValueError, aborting the entire batch.

-                    elif target_field == "availability":
-                        value = "in stock" if int(value or 0) > 0 else default_value
+                    elif target_field == "availability":
+                        try:
+                            value = "in stock" if int(value or 0) > 0 else default_value
+                        except (TypeError, ValueError):
+                            value = default_value
@@
-                    elif target_field == "price":
-                        value = float(value)
+                    elif target_field == "price":
+                        try:
+                            value = float(value)
+                        except (TypeError, ValueError):
+                            value = default_value

Guarding these casts prevents unexpected 500s during ingestion.
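One way to keep the call sites tidy is a pair of small coercion helpers (illustrative only; these names are not from the PR):

```python
def safe_float(value, default=0.0):
    """Coerce a POS payload value to float, falling back on bad input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default

def safe_availability(value, default="out of stock"):
    """Map a stock count to an availability string, tolerating junk input."""
    try:
        return "in stock" if int(value or 0) > 0 else default
    except (TypeError, ValueError):
        return default

print(safe_float("12.50"))        # 12.5
print(safe_float(None))           # 0.0
print(safe_availability("3"))     # in stock
print(safe_availability(""))      # out of stock
print(safe_availability("junk"))  # out of stock
```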

🧹 Nitpick comments (6)
kairon/api/app/routers/bot/integrations.py (1)

17-18: Remove unused variable

cognition_processor is instantiated but never used in this module – delete it to avoid dead code and reader confusion.

kairon/meta/processor.py (1)

15-17: Unused headers attribute

self.headers is never referenced; either populate and reuse it for the
requests.post calls or remove it to keep the class lean.

tests/unit_test/data_processor/data_processor_test.py (2)

12-12: Remove unused import.

The urljoin function from urllib.parse is imported but not used anywhere in the file.

-from urllib.parse import urljoin


1961-1961: Remove extraneous f-prefix from string without placeholders.

The f-string doesn't contain any placeholders, so the 'f' prefix is unnecessary.

-with pytest.raises(AppException, match=f"Sync type does not exist"):
+with pytest.raises(AppException, match="Sync type does not exist"):

kairon/shared/cognition/processor.py (2)

644-646: Remove redundant .keys() when checking enum membership

dict.__members__ already behaves like a set-like view. Using .keys() creates an unnecessary intermediate list and triggers the Ruff SIM118 warning.

-        if sync_type not in VaultSyncType.__members__.keys():
+        if sync_type not in VaultSyncType.__members__:
             raise AppException("Sync type does not exist")
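For reference, `__members__` is a name→member mapping, so `in` already tests names without `.keys()` (sketch with a stand-in enum; actual members may differ):

```python
from enum import Enum

# Stand-in for VaultSyncType
class VaultSyncType(Enum):
    push_menu = "push_menu"
    item_toggle = "item_toggle"

# __members__ is a mappingproxy of name -> member; `in` checks names directly
print("push_menu" in VaultSyncType.__members__)  # True
print("unknown" in VaultSyncType.__members__)    # False
```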


892-893: Typo in error message

Minor but worth fixing for clarity.

-                await llm_processor.__delete_collection_points__(qdrant_collection, vector_ids, "Cannot delete stale points fro Qdrant!")
+                await llm_processor.__delete_collection_points__(qdrant_collection, vector_ids, "Cannot delete stale points from Qdrant!")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebe1dd1 and e43c096.

📒 Files selected for processing (9)
  • kairon/api/app/routers/bot/data.py (4 hunks)
  • kairon/api/app/routers/bot/integrations.py (1 hunks)
  • kairon/events/definitions/petpooja_sync.py (1 hunks)
  • kairon/meta/processor.py (1 hunks)
  • kairon/shared/catalog_sync/catalog_sync_log_processor.py (1 hunks)
  • kairon/shared/catalog_sync/data_objects.py (1 hunks)
  • kairon/shared/cognition/processor.py (5 hunks)
  • tests/unit_test/action/action_test.py (2 hunks)
  • tests/unit_test/data_processor/data_processor_test.py (25 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/unit_test/action/action_test.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • kairon/shared/catalog_sync/data_objects.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/unit_test/data_processor/data_processor_test.py (4)
kairon/shared/catalog_sync/data_objects.py (1)
  • CatalogProviderMapping (29-35)
kairon/shared/data/data_objects.py (1)
  • BotSyncConfig (1108-1119)
kairon/shared/models.py (1)
  • VaultSyncType (120-122)
kairon/shared/cognition/processor.py (8)
  • CognitionDataProcessor (26-1014)
  • get_restaurant_and_branch_name (1008-1014)
  • preprocess_push_menu_data (682-727)
  • preprocess_item_toggle_data (730-747)
  • resolve_image_link (751-775)
  • add_bot_sync_config (988-1005)
  • validate_data (487-576)
  • _validate_sync_type (643-645)
kairon/api/app/routers/bot/integrations.py (6)
kairon/events/definitions/catalog_sync.py (3)
  • CatalogSync (13-73)
  • validate (33-42)
  • enqueue (44-61)
kairon/shared/auth.py (2)
  • Authentication (30-465)
  • get_current_user_and_bot (83-109)
kairon/shared/catalog_sync/data_objects.py (1)
  • CatalogSyncLogs (9-24)
kairon/shared/cognition/processor.py (1)
  • CognitionDataProcessor (26-1014)
kairon/shared/constants.py (1)
  • CatalogProvider (127-128)
kairon/shared/models.py (2)
  • User (68-92)
  • get_user (84-87)
🪛 Ruff (0.8.2)
tests/unit_test/data_processor/data_processor_test.py

12-12: urllib.parse.urljoin imported but unused

Remove unused import: urllib.parse.urljoin

(F401)


1961-1961: f-string without any placeholders

Remove extraneous f prefix

(F541)

kairon/shared/cognition/processor.py

644-644: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

kairon/api/app/routers/bot/data.py

379-379: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

kairon/api/app/routers/bot/integrations.py

22-23: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


24-24: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


25-25: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


26-26: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


54-55: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


56-56: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


57-57: Do not perform function call Path in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


58-58: Do not perform function call Security in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Python CI
  • GitHub Check: Analyze (python)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (6)
tests/unit_test/data_processor/data_processor_test.py (6)

1096-1223: Test implementation looks robust with proper data setup and cleanup.

The test_preprocess_push_menu_data_success test successfully:

  1. Sets up the required provider mappings and configuration
  2. Loads test data from a JSON file
  3. Creates fallback image data
  4. Verifies the preprocessing transforms data correctly
  5. Cleans up all test data afterward

This pattern is followed throughout the tests in this file, showing good testing practices with proper setup, verification, and cleanup.


1234-1286: Test cases properly handle error scenarios.

Both test_preprocess_item_toggle_data_success and test_preprocess_item_toggle_data_no_provider_mapping appropriately test both the happy path and the error case when provider mappings are missing, ensuring the code handles exceptions correctly.


1287-1412: Complete image resolution test coverage.

The image resolution tests cover all three scenarios:

  1. Resolving from a global fallback image
  2. Preferentially using a local image when available
  3. Handling the error case when no image is found

This ensures robust handling of image resolution for catalog items.


1414-1516: Good configuration management testing.

The bot sync configuration tests verify:

  1. Adding a new configuration works correctly
  2. Existing configurations aren't overwritten
  3. Restaurant and branch name retrieval functions properly
  4. Errors are raised when configurations don't exist

This comprehensive testing ensures the configuration management functionality is reliable.


2682-2691: Good testing of validation functions.

The tests for _validate_sync_type verify both success and failure cases:

  1. Valid sync types are accepted
  2. Invalid sync types raise the expected exception

These tests ensure the validation logic works correctly before it's used in more complex operations.


1800-1800: Parameter renaming consistently applied throughout the tests.

The parameter renaming from event_type to sync_type has been consistently applied across all test methods, aligning with the model changes mentioned in the PR summary.

Also applies to: 1849-1849, 1862-1862, 1943-1943, 1957-1957, 1975-1975, 1993-1993, 2023-2023, 2033-2033, 2060-2060, 2075-2075, 2102-2102, 2117-2117, 2160-2160, 2182-2182, 2244-2244, 2283-2283, 2361-2361, 2399-2399, 2457-2457

Comment on lines +386 to +388
raise AppException("Invalid Provider")


⚠️ Potential issue

Incorrect Enum membership check – always evaluates to False

CatalogSyncClass.__members__.values() returns the Enum members, not their
.value strings, while request_data.provider is a plain string.
Use the _value2member_map_ lookup or compare against a set of the members' values.

-if request_data.provider not in CatalogSyncClass.__members__.values():
+if request_data.provider not in CatalogSyncClass._value2member_map_:
     raise AppException("Invalid Provider")

Without this fix every valid provider string will be rejected.
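The difference is easy to demonstrate (assuming `CatalogSyncClass` is a plain `Enum` whose member values are the provider strings):

```python
from enum import Enum

# Hypothetical stand-in for CatalogSyncClass
class CatalogSyncClass(Enum):
    petpooja = "petpooja"

provider = "petpooja"  # plain string from the request

# values() yields Enum members, so a bare string is never found:
print(provider in CatalogSyncClass.__members__.values())  # False
# _value2member_map_ maps raw values to members:
print(provider in CatalogSyncClass._value2member_map_)    # True
```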


Comment on lines +175 to +177
except Exception as e:
print(str(e))
execution_id = CatalogSyncLogProcessor.get_execution_id_for_bot(self.bot)

🛠️ Refactor suggestion

Replace print with proper logging

Using print() for error reporting doesn't integrate with the application's logging system. Use a logger instead.

-            print(str(e))
+            logger.error(f"Error during catalog sync execution: {str(e)}")

Comment on lines +18 to +27
def __init__(self, bot: Text, user: Text, provider: Text, **kwargs):
"""
Initialise event.
"""
self.bot = bot
self.user = user
self.provider = provider
self.token = kwargs.get("token", "")
self.sync_type = kwargs.get("sync_type", SyncType.item_toggle)
self.data = []

🛠️ Refactor suggestion

Initialize self.data as a dictionary instead of a list

The self.data attribute is initialized as an empty list, but it's accessed as a dictionary in the rest of the code (e.g., self.data.get("kv", []) on line 76).

-        self.data = []
+        self.data = {}
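The mismatch surfaces as soon as the attribute is read (illustrative):

```python
data = []          # as currently initialised
try:
    data.get("kv", [])
except AttributeError as err:
    print(err)     # 'list' object has no attribute 'get'

data = {}          # dict initialisation matches the later access pattern
print(data.get("kv", []))  # []
```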

Comment on lines +116 to +117
stale_primary_keys = kwargs.get("stale_primary_keys")
status = "Failure"

🛠️ Refactor suggestion

Add null check for stale_primary_keys

stale_primary_keys could be None if coming from kwargs, which would raise a TypeError when it is later iterated or passed to len().

        stale_primary_keys = kwargs.get("stale_primary_keys")
+        stale_primary_keys = stale_primary_keys or []
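Normalising with `or []` keeps later iteration safe (sketch; the handler name is hypothetical):

```python
def handle(**kwargs):
    # .get() yields None when the key is absent or explicitly passed as None
    stale_primary_keys = kwargs.get("stale_primary_keys") or []
    return len(stale_primary_keys)  # never len(None)

print(handle())                               # 0
print(handle(stale_primary_keys=None))        # 0
print(handle(stale_primary_keys=["a", "b"]))  # 2
```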

Comment on lines +56 to +58
mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot = self.bot, executionID = execution_id,
sync_status=SYNC_STATUS.VALIDATING_FAILED, message = str(e), first_name = "HG"
)

🛠️ Refactor suggestion

Replace hardcoded email address

The email address is hardcoded. This should be configurable or retrieved from user/bot settings.

-                    mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot = self.bot, executionID = execution_id,
+                    mail_type="catalog_sync_status", email=self.user, bot=self.bot, executionID=execution_id,
                     sync_status=SYNC_STATUS.VALIDATING_FAILED, message = str(e), first_name = "HG"

Comment on lines +99 to +102
mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot=self.bot,
executionID=execution_id,
sync_status=sync_status, message=str(e), first_name="HG"
)

🛠️ Refactor suggestion

Replace hardcoded email address

The email address is hardcoded. This should be configurable or retrieved from user/bot settings.

-                mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot=self.bot,
+                mail_type="catalog_sync_status", email=self.user, bot=self.bot,
                 executionID=execution_id,
                 sync_status=sync_status, message=str(e), first_name="HG"

Comment on lines +178 to +182
await MailUtility.format_and_send_mail(
mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot=self.bot,
executionID=execution_id,
sync_status=sync_status, message=str(e), first_name="HG"
)

🛠️ Refactor suggestion

Replace hardcoded email address

The email address is hardcoded. This should be configurable or retrieved from user/bot settings.

-                mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot=self.bot,
+                mail_type="catalog_sync_status", email=self.user, bot=self.bot,
                 executionID=execution_id,
                 sync_status=sync_status, message=str(e), first_name="HG"

Comment on lines +170 to +173
mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot=self.bot,
executionID=execution_id,
sync_status=sync_status, message="Catalog has been synced successfully", first_name="HG"
)

🛠️ Refactor suggestion

Replace hardcoded email address

The email address is hardcoded. This should be configurable or retrieved from user/bot settings.

-                mail_type="catalog_sync_status", email="himanshu.gupta@nimblework.com", bot=self.bot,
+                mail_type="catalog_sync_status", email=self.user, bot=self.bot,
                 executionID=execution_id,
                 sync_status=sync_status, message="Catalog has been synced successfully", first_name="HG"

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unit_test/data_processor/data_processor_test.py (2)

12-12: Remove unused import.

The urllib.parse.urljoin import is declared but never used in the code.

-from urllib.parse import urljoin


1961-1961: Remove unnecessary f-string.

This string doesn't contain any placeholders, so the f prefix is unnecessary.

-        with pytest.raises(AppException, match=f"Sync type does not exist"):
+        with pytest.raises(AppException, match="Sync type does not exist"):

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e43c096 and 780a404.

📒 Files selected for processing (1)
  • tests/unit_test/data_processor/data_processor_test.py (28 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/unit_test/data_processor/data_processor_test.py (4)
kairon/shared/catalog_sync/data_objects.py (1)
  • CatalogProviderMapping (29-35)
kairon/shared/data/data_objects.py (1)
  • BotSyncConfig (1108-1119)
kairon/shared/models.py (1)
  • VaultSyncType (120-122)
kairon/shared/cognition/processor.py (8)
  • CognitionDataProcessor (26-1014)
  • get_restaurant_and_branch_name (1008-1014)
  • preprocess_push_menu_data (682-727)
  • preprocess_item_toggle_data (730-747)
  • resolve_image_link (751-775)
  • add_bot_sync_config (988-1005)
  • validate_data (487-576)
  • _validate_sync_type (643-645)
🪛 Ruff (0.8.2)
tests/unit_test/data_processor/data_processor_test.py

12-12: urllib.parse.urljoin imported but unused

Remove unused import: urllib.parse.urljoin

(F401)


1961-1961: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Python CI
🔇 Additional comments (6)
tests/unit_test/data_processor/data_processor_test.py (6)

1096-1219: Good test coverage for the push menu preprocessing functionality.

The test thoroughly validates the Petpooja data preprocessing logic, including provider mapping lookups, field transformations, and image resolution. The setup and cleanup are properly handled.


1224-1232: Good error case handling.

The test appropriately verifies that attempting to preprocess data with a non-existent provider mapping raises an exception.


1287-1327: Comprehensive testing of image link resolution.

The test suite covers all scenarios for image link resolution:

  1. Global fallback image
  2. Local item-specific image (takes precedence over global)
  3. Error handling when no images are found

This ensures the catalog sync feature will correctly handle various image availability scenarios.

Also applies to: 1329-1384, 1386-1412


1414-1444: Well-structured bot sync configuration tests.

These tests correctly validate both the initial configuration creation and the handling of existing configurations, ensuring data integrity.

Also applies to: 1446-1486


2267-2273: Good mock assertion for embedding calls.

The test properly verifies that the embedding function is called with the correct parameters, including the input data and metadata.


2503-2512: Good unit tests for sync type validation.

These tests properly check both valid and invalid sync type scenarios, ensuring the validation logic works correctly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
tests/unit_test/data_processor/catalog_sync_log_processor_test.py (4)

116-118: Use specific exception type instead of generic Exception.

Using generic Exception with pytest.raises() can mask real issues by capturing unexpected exceptions. This was flagged in a previous review.

-        with pytest.raises(Exception):
+        with pytest.raises(AppException):  # Use the appropriate exception type
             CatalogSyncLogProcessor.is_sync_in_progress(bot)


124-134: Use specific exception types and avoid bare except blocks.

Both the bare except block and the generic Exception in pytest.raises() should be replaced with specific exception types. This was flagged in a previous review.

         try:
             bot_settings = BotSettings.objects(bot=bot).get()
             bot_settings.catalog_sync_limit_per_day = 0
-        except:
+        except BotSettings.DoesNotExist:  # Be specific about what exception you're catching
             bot_settings = BotSettings(bot=bot, catalog_sync_limit_per_day=0, user="test")
         bot_settings.save()
-        with pytest.raises(Exception):
+        with pytest.raises(AppException):  # Use the appropriate exception type
             assert CatalogSyncLogProcessor.is_limit_exceeded(bot)


246-247: Use specific exception type instead of generic Exception.

Using generic Exception with pytest.raises() can mask unexpected errors. This was flagged in a previous review.

-        with pytest.raises(Exception):
+        with pytest.raises(AppException):  # Or the specific exception type
             CatalogSyncLogProcessor.validate_item_ids(push_menu_payload)


330-331: Use specific exception type instead of generic Exception.

Using generic Exception with pytest.raises() can mask unexpected errors. This was flagged in a previous review.

-        with pytest.raises(Exception, match="Push menu processing is disabled for this bot"):
+        with pytest.raises(AppException, match="Push menu processing is disabled for this bot"):  # Use the appropriate exception type
             CatalogSyncLogProcessor.is_sync_type_allowed(bot, SyncType.push_menu)
🧹 Nitpick comments (7)
tests/unit_test/data_processor/catalog_sync_log_processor_test.py (6)

7-7: Remove unused import.

The uni_escape_re from babel.messages.jslexer is imported but not used anywhere in the code.

-from babel.messages.jslexer import uni_escape_re


11-11: Remove unused import.

AppException is imported but never used in this file.

-from kairon.exceptions import AppException


15-15: Remove unused import.

CognitionDataProcessor is imported but never used in this file.

-from kairon.shared.cognition.processor import CognitionDataProcessor


24-24: Use uppercase for environment variable names.

Environment variables should follow the convention of using uppercase names.

-        os.environ["system_file"] = "./tests/testing_data/system.yaml"
+        os.environ["SYSTEM_FILE"] = "./tests/testing_data/system.yaml"

Make sure to update any code that references this environment variable elsewhere.



232-237: Handle exceptions properly and avoid unused variables.

The variable e is assigned but never used. Consider either logging the exception or using the exception message in your assertion.

         try:
             CatalogSyncLogProcessor.validate_item_ids(push_menu_payload)
             itemid_missing = False
-        except Exception as e:
+        except Exception:  # If you're not using the exception, don't capture it
             itemid_missing = True

Better yet, use pytest's assertion mechanism for more explicit testing:

-        try:
-            CatalogSyncLogProcessor.validate_item_ids(push_menu_payload)
-            itemid_missing = False
-        except Exception as e:
-            itemid_missing = True
-
-        assert itemid_missing is False
+        # This doesn't raise an exception when valid
+        CatalogSyncLogProcessor.validate_item_ids(push_menu_payload)


241-247: Add a test for the exact exception message when validating item IDs.

The test verifies that an exception is raised, but doesn't check the exception message. Adding a match parameter would make the test more precise.

         with push_menu_payload_path.open("r", encoding="utf-8") as f:
             push_menu_payload = json.load(f)

-        with pytest.raises(Exception):
+        with pytest.raises(AppException, match="itemID is missing for one or more items"):  # Match the expected error message
             CatalogSyncLogProcessor.validate_item_ids(push_menu_payload)

tests/unit_test/data_processor/data_processor_test.py (1)

1155-1220: Make equality assertion order-agnostic

assert result == expected_result will fail when the processor returns the same items in a different order (e.g., when a DB query or dict iteration order changes).
Compare the lists after sorting by id instead, keeping the test robust to non-deterministic ordering.

-        assert result == expected_result
+        # Order-independent comparison
+        assert sorted(result["meta"], key=lambda x: x["id"]) == \
+               sorted(expected_result["meta"], key=lambda x: x["id"])
+        assert sorted(result["kv"], key=lambda x: x["id"]) == \
+               sorted(expected_result["kv"], key=lambda x: x["id"])
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 780a404 and a72c552.

📒 Files selected for processing (2)
  • tests/unit_test/data_processor/catalog_sync_log_processor_test.py (1 hunks)
  • tests/unit_test/data_processor/data_processor_test.py (29 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/unit_test/data_processor/catalog_sync_log_processor_test.py

7-7: babel.messages.jslexer.uni_escape_re imported but unused

Remove unused import: babel.messages.jslexer.uni_escape_re

(F401)


11-11: kairon.exceptions.AppException imported but unused

Remove unused import: kairon.exceptions.AppException

(F401)


15-15: kairon.shared.cognition.processor.CognitionDataProcessor imported but unused

Remove unused import: kairon.shared.cognition.processor.CognitionDataProcessor

(F401)


24-24: Use capitalized environment variable SYSTEM_FILE instead of system_file

Replace system_file with SYSTEM_FILE

(SIM112)


116-116: pytest.raises(Exception) should be considered evil

(B017)


129-129: Do not use bare except

(E722)


132-132: pytest.raises(Exception) should be considered evil

(B017)


235-235: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


246-246: pytest.raises(Exception) should be considered evil

(B017)

tests/unit_test/data_processor/data_processor_test.py

12-12: urllib.parse.urljoin imported but unused

Remove unused import: urllib.parse.urljoin

(F401)


1961-1961: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Analyze (python)
  • GitHub Check: Python CI
🔇 Additional comments (1)
tests/unit_test/data_processor/catalog_sync_log_processor_test.py (1)

1-572: Comprehensive test suite for catalog sync functionality.

This is a well-structured and thorough test suite that covers all the critical functionality of the CatalogSyncLogProcessor. The tests verify log creation, status tracking, validation logic, configuration checking, and error handling, all of which are essential for a robust catalog synchronization system.

I especially appreciate:

  • Setting up and tearing down test data for each test case
  • Testing both successful and error conditions
  • Validating specific error messages
  • Testing integration with dependent components like BotSettings and CognitionSchema

Good test coverage lays a solid foundation for the catalog integration between Petpooja (POS) and Meta as outlined in the PR objectives.
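The pattern being praised here, exercising both the success and error paths and asserting the exact error message, can be sketched in isolation. The `AppException` subclass and validator below are illustrative stand-ins, not the PR's actual classes, and the set of valid sync types is assumed:

```python
class AppException(Exception):
    """Stand-in for kairon.exceptions.AppException (illustrative only)."""

# Assumed sync types, for illustration only
VALID_SYNC_TYPES = {"push_menu", "item_toggle"}

def validate_sync_type(sync_type: str) -> None:
    """Raise AppException for unknown sync types, as the PR's validators do."""
    if sync_type not in VALID_SYNC_TYPES:
        raise AppException("Sync type does not exist")

# Success path: a known sync type passes silently
validate_sync_type("push_menu")

# Error path: assert both the specific exception type and its exact message,
# mirroring pytest.raises(AppException, match="Sync type does not exist")
try:
    validate_sync_type("bogus")
except AppException as exc:
    assert str(exc) == "Sync type does not exist"
else:
    raise AssertionError("expected AppException for unknown sync type")
```

Catching the specific `AppException` rather than a bare `Exception` is also exactly what Ruff's B017 warning above is asking for.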


Comment on lines +10 to 12
from pathlib import Path
from typing import List
from urllib.parse import urljoin
⚠️ Potential issue

Remove unused import to satisfy Ruff F401

urljoin is imported but never used anywhere in the test module. Ruff flags this as F401, which will fail lint-only CI stages.

-from urllib.parse import urljoin
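The check behind F401 is mechanical: a name is imported but never referenced. A rough stdlib-only sketch of the idea on a hypothetical snippet mirroring the test module's import block (Ruff's real implementation is far more thorough, handling `__all__`, re-exports, `import` statements, and more):

```python
import ast

# Hypothetical source mirroring the test module's import block
source = """\
from pathlib import Path
from typing import List
from urllib.parse import urljoin

p = Path(".")
xs: List[int] = []
"""

tree = ast.parse(source)

# Names bound by `from ... import ...` statements (honouring `as` aliases)
imported = {
    alias.asname or alias.name
    for node in ast.walk(tree)
    if isinstance(node, ast.ImportFrom)
    for alias in node.names
}

# Every bare name referenced anywhere in the module
used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

print(imported - used)  # → {'urljoin'}
```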
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-from pathlib import Path
-from typing import List
-from urllib.parse import urljoin
+from pathlib import Path
+from typing import List


processor = CognitionDataProcessor()

-with pytest.raises(AppException, match=f"Event type does not exist"):
+with pytest.raises(AppException, match=f"Sync type does not exist"):
⚠️ Potential issue

Remove extraneous f-string prefix

There are no interpolations in the string, so the f prefix is unnecessary and triggers Ruff F541.

-with pytest.raises(AppException, match=f"Sync type does not exist"):
+with pytest.raises(AppException, match="Sync type does not exist"):
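Dropping the `f` prefix is purely cosmetic here: an f-string with no placeholders is identical to the plain literal, and pytest's `match=` argument applies `re.search` to the exception message either way. A small stdlib-only check of both facts (`AppException` below is a stand-in, not the kairon class):

```python
import re

# An f-string without placeholders compiles to the same constant string
assert f"Sync type does not exist" == "Sync type does not exist"

class AppException(Exception):
    """Stand-in for kairon.exceptions.AppException (illustrative only)."""

# pytest.raises(..., match=pattern) performs re.search(pattern, str(exc));
# emulate that here without pytest
try:
    raise AppException("Sync type does not exist")
except AppException as exc:
    assert re.search("Sync type does not exist", str(exc))
```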
📝 Committable suggestion


Suggested change
-with pytest.raises(AppException, match=f"Sync type does not exist"):
+with pytest.raises(AppException, match="Sync type does not exist"):
