
Conversation

@Shubham-Khichi
Contributor

No description provided.

- Fixed MCP Docker build failure: resolved the build error for the `mcp` service by removing the invalid readme reference in `fast-markdown-mcp/pyproject.toml`.
- Refactored file handling (removed in-memory storage):
  - Investigated the complex in-memory file handling mechanism and its inconsistencies.
  - Removed the in-memory storage logic from `backend/app/crawler.py`.
  - Removed the associated API endpoints (`/api/memory-files`, `/api/memory-files/{file_id}`) from `backend/app/main.py`.
  - Added a new backend API endpoint (`/api/storage/file-content`) that reads files directly from the `storage/markdown` directory.
  - Deleted the old frontend API proxy route (`app/api/memory-file/route.ts`).
  - Created a new frontend API proxy route (`app/api/storage/file-content/route.ts`); a sketch follows below.
  - Updated frontend components (`StoredFiles.tsx`, `DiscoveredFiles.tsx`) to use the new API route for downloading file content.
- Documentation: created markdown plans for the MCP build fix and the in-memory feature removal.

This simplifies the architecture by relying solely on disk-based consolidated files in `storage/markdown`. Please remember to test the file download functionality after restarting the services.
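For context, a minimal sketch of what the new proxy route might look like, assuming a Next.js App Router handler, a FastAPI backend reachable at the `backend` service name on port 8000 inside the Docker network, and a `filename` query parameter (all three are assumptions, not confirmed by this PR):

```typescript
// app/api/storage/file-content/route.ts (sketch; names and port are assumptions)
import { NextRequest, NextResponse } from "next/server";

// Inside Docker Compose the backend must be addressed by its service name,
// not localhost (see the 500 error fixed further down).
const BACKEND_URL = process.env.BACKEND_URL ?? "http://backend:8000";

export async function GET(request: NextRequest) {
  const filename = request.nextUrl.searchParams.get("filename");
  if (!filename) {
    return NextResponse.json(
      { error: "Missing 'filename' query parameter" },
      { status: 400 }
    );
  }

  // Forward the request to the backend endpoint that reads from storage/markdown.
  const res = await fetch(
    `${BACKEND_URL}/api/storage/file-content?filename=${encodeURIComponent(filename)}`
  );
  if (!res.ok) {
    return NextResponse.json({ error: "Backend request failed" }, { status: res.status });
  }

  const content = await res.text();
  return new NextResponse(content, {
    headers: { "Content-Type": res.headers.get("Content-Type") ?? "text/plain" },
  });
}
```

Keeping the backend host in an environment variable avoids hard-coding the Docker service name, which matters for the localhost-vs-service-name fix described below.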
This commit addresses several issues and implements enhancements across the crawling workflow:

Fixes:
- Resolved a 400 Bad Request error caused by an incorrect query parameter (`file_path`) in the file content API route.
- Fixed a backend `NameError` (`set_task_context`) in `crawler.py` that prevented result files from being saved.
- Corrected a 500 Internal Server Error caused by a Docker networking issue (localhost vs. service name) in the file content API route proxy; both proxy fixes are condensed in the sketch after this list.
- Ensured the "Data Extracted" statistic is correctly saved in the backend status and displayed in the UI.
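Condensed into one place, the two proxy-route fixes amount to roughly the following; the corrected parameter name, environment variable, and port are assumptions:

```typescript
// Before/after sketch of the two proxy-route fixes; names are assumptions.

// Previously the proxy targeted "http://localhost:8000". Inside Docker Compose,
// localhost resolves to the frontend container itself, which surfaced as a
// 500 Internal Server Error; the Compose service name fixes that.
const BACKEND_URL = process.env.BACKEND_URL ?? "http://backend:8000";

function fileContentUrl(filename: string): string {
  // Previously the parameter was sent as `file_path`, which the backend
  // rejected with 400 Bad Request; `filename` here is the assumed fix.
  return `${BACKEND_URL}/api/storage/file-content?filename=${encodeURIComponent(filename)}`;
}
```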

UI Enhancements:
- Made "Consolidated Files" section persistent, rendering as soon as a job ID is available.
- Relocated "Crawl Selected" button inline with status details.
- Updated "Crawl Selected" button to show dynamic count and disable appropriately.
- Renamed "Job Status" section title to "Discovered Pages".
- Renamed "Processing Summary" section title to "Statistics".
- Removed the unused "Extracted Content" display section.

Backend Enhancements:
- Implemented file-appending logic in `crawler.py` for consolidated `.md` and `.json` files. Subsequent crawls for the same job now append data and update timestamps instead of overwriting; the pattern is sketched below.
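The appending itself lives in `crawler.py` (Python); as a rough sketch of the same append-or-create pattern, shown here in TypeScript for consistency with the other sketches in this PR, with file paths and the timestamp marker as assumptions:

```typescript
import { promises as fs } from "fs";

// Append-or-create for a consolidated .md file (sketch; paths are assumptions).
async function appendMarkdown(jobId: string, newSection: string): Promise<void> {
  const path = `storage/markdown/${jobId}.md`;
  const stamp = `\n\n<!-- appended ${new Date().toISOString()} -->\n\n`;
  try {
    await fs.access(path);                          // file exists: append with a timestamp
    await fs.appendFile(path, stamp + newSection, "utf-8");
  } catch {
    await fs.writeFile(path, newSection, "utf-8");  // first crawl for this job: create
  }
}

// For the consolidated .json file, "appending" means merge-then-rewrite,
// since JSON documents cannot simply be concatenated.
async function appendJson(jobId: string, newResults: unknown[]): Promise<void> {
  const path = `storage/markdown/${jobId}.json`;
  let existing: unknown[] = [];
  try {
    existing = JSON.parse(await fs.readFile(path, "utf-8"));
  } catch {
    // missing or unreadable file: start fresh
  }
  await fs.writeFile(path, JSON.stringify([...existing, ...newResults], null, 2), "utf-8");
}
```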

Changelog:

### Added
- Backend logic to append new crawl results to existing consolidated `.md` and `.json` files for the same job ID.
- Dynamic count display to "Crawl Selected" button.

### Changed
- "Consolidated Files" section now appears persistently once a job is initiated.
- "Crawl Selected" button relocated inline with status details and disables after initiating crawl.
- Renamed "Job Status" section title to "Discovered Pages".
- Renamed "Processing Summary" section title to "Statistics".
- Updated backend status management to correctly store and transmit the 'Data Extracted' statistic.

### Fixed
- Resolved 400 Bad Request error when fetching file content due to incorrect query parameter name.
- Fixed backend `NameError` in crawler that prevented saving crawl results.
- Resolved 500 Internal Server Error when fetching `.json` file content due to Docker networking issue in API proxy route.
- Corrected display issue where 'Data Extracted' statistic showed "N/A" instead of the actual value.

### Removed
- Removed the unused "Extracted Content" display section from the UI.
feat(frontend): Update Consolidated Files component for polling and downloads

- Implements polling every 10 seconds in `ConsolidatedFiles.tsx` to automatically refresh the file list from the `/api/storage` endpoint, so newly added files appear in the UI without a manual refresh.
- Points the MD and JSON icon links at the `/api/storage/download` endpoint and adds the `download` attribute, so clicking triggers a file download instead of opening the content in the browser. Both changes are sketched below.
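A minimal sketch of the polling-plus-download arrangement, assuming a React client component and that `/api/storage` returns a JSON array of file entries (the response shape and component internals are assumptions):

```tsx
"use client";
import { useEffect, useState } from "react";

type StoredFile = { name: string }; // response shape is an assumption

export function ConsolidatedFilesList() {
  const [files, setFiles] = useState<StoredFile[]>([]);

  useEffect(() => {
    const fetchFiles = async () => {
      const res = await fetch("/api/storage");
      if (res.ok) setFiles(await res.json());
    };
    fetchFiles();                               // initial load
    const id = setInterval(fetchFiles, 10_000); // refresh every 10 seconds
    return () => clearInterval(id);             // stop polling on unmount
  }, []);

  return (
    <ul>
      {files.map((f) => (
        <li key={f.name}>
          {/* 'download' forces a file download instead of opening in the browser */}
          <a href={`/api/storage/download?filename=${encodeURIComponent(f.name)}`} download>
            {f.name}
          </a>
        </li>
      ))}
    </ul>
  );
}
```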
Introduces a new `CrawlUrls` component to display and manage discovered URLs during a crawl job. This component utilizes Shadcn UI elements (Table, Checkbox, Badge, Tooltip) to provide a detailed view of individual URL statuses, handle URL selection for targeted actions, and display status updates driven by polling managed in `app/page.tsx`.

Key changes include:
- Creation of the `CrawlUrls` component for URL list display and interaction.
- Refactoring of `CrawlStatusMonitor` to focus solely on displaying the overall job status within a Dialog component.
- Updates to `app/page.tsx` to manage essential state (job ID, job status, selected URLs) and orchestrate the polling mechanism for fetching URL-specific status updates.
- Fixed UI bugs where status icons were not updating correctly and checkbox selection state was inconsistent.
- Adjusted the styling of the info icon button for better contrast as per user feedback.

These frontend enhancements align with the ongoing backend redesign, supporting the new job-based status management and polling architecture for more granular progress tracking. A sketch of the page-level polling arrangement follows.
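A rough sketch of that page-level orchestration, assuming a client component and a status endpoint keyed by job ID (endpoint path, response shape, and interval are assumptions):

```tsx
"use client";
import { useEffect, useState } from "react";

type UrlStatus = { url: string; status: string }; // response shape is an assumption

// app/page.tsx owns the job state and polls the backend, so CrawlUrls stays
// a pure display/selection component.
export default function Page() {
  const [jobId, setJobId] = useState<string | null>(null);
  const [urls, setUrls] = useState<UrlStatus[]>([]);
  const [selected, setSelected] = useState<string[]>([]);

  useEffect(() => {
    if (!jobId) return; // nothing to poll until a job exists
    const poll = async () => {
      const res = await fetch(`/api/crawl-status/${jobId}`); // path is an assumption
      if (res.ok) {
        const data = await res.json();
        setUrls(data.urls ?? []);
      }
    };
    poll();
    const id = setInterval(poll, 5_000); // interval is an assumption
    return () => clearInterval(id);
  }, [jobId]);

  // In the real page, jobId/urls/selected (and setSelected) are passed to
  // CrawlUrls and CrawlStatusMonitor; a placeholder render keeps this
  // sketch self-contained.
  return (
    <main>
      <button onClick={() => setJobId("demo-job")}>Start demo job</button>
      <pre>{JSON.stringify({ jobId, urlCount: urls.length, selected }, null, 2)}</pre>
    </main>
  );
}
```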

Updated documentation in `docs/features/` (adjust_info_button_style_plan.md, fix_discovered_pages_ui_bugs.md, create_crawl_urls_component_plan.md, crawl_status_monitoring_plan.md) to reflect the completion of related tasks.
@coderabbitai

coderabbitai bot commented Apr 11, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

