Commit dfec6f7

Modified crawling and graceful error handling with streamlined UI. (#48)

Patch fixes:
- Fixed MCP Docker build failure: resolved the build error for the mcp service by removing the invalid readme reference in fast-markdown-mcp/pyproject.toml.
- Refactored file handling (removed in-memory storage):
  - Investigated the complex in-memory file handling mechanism and its inconsistencies.
  - Removed the in-memory storage logic from backend/app/crawler.py.
  - Removed the associated API endpoints (/api/memory-files, /api/memory-files/{file_id}) from backend/app/main.py.
  - Added a new backend API endpoint (/api/storage/file-content) to read files directly from the storage/markdown directory.
  - Deleted the old frontend API proxy route (app/api/memory-file/route.ts).
  - Created a new frontend API proxy route (app/api/storage/file-content/route.ts).
  - Updated frontend components (StoredFiles.tsx, DiscoveredFiles.tsx) to use the new API route for downloading file content.
- Documentation: created markdown plans for the MCP build fix and the in-memory feature removal.

This simplifies the architecture by relying solely on disk-based consolidated files in storage/markdown. Please remember to test the file download functionality after restarting the services.

feat: Enhance crawl workflow, UI, and fix backend issues

This commit addresses several issues and implements enhancements across the crawling workflow.

Fixes:
- Resolved a 400 Bad Request error caused by an incorrect query parameter (`file_path`) in the file content API route.
- Fixed a backend `NameError` (`set_task_context`) in crawler.py that prevented result files from being saved.
- Corrected a 500 Internal Server Error caused by a Docker networking issue (localhost vs. service name) in the file content API route proxy.
- Ensured the "Data Extracted" statistic is correctly saved in the backend status and displayed in the UI.

UI enhancements:
- Made the "Consolidated Files" section persistent, rendering as soon as a job ID is available.
- Relocated the "Crawl Selected" button inline with the status details.
- Updated the "Crawl Selected" button to show a dynamic count and disable appropriately.
- Renamed the "Job Status" section title to "Discovered Pages".
- Renamed the "Processing Summary" section title to "Statistics".
- Removed the unused "Extracted Content" display section.

Backend enhancements:
- Implemented file-appending logic in crawler.py for consolidated `.md` and `.json` files. Subsequent crawls for the same job now append data and update timestamps instead of overwriting.

feat(frontend): Update Consolidated Files component for polling and downloads
- Implements polling every 10 seconds in ConsolidatedFiles.tsx to automatically refresh the list of files from the /api/storage endpoint, ensuring newly added files appear in the UI.
- Modifies the MD and JSON icon links to point to the /api/storage/download endpoint and adds the `download` attribute, triggering file downloads instead of opening content in the browser.
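The polling and download behavior described for ConsolidatedFiles.tsx can be sketched roughly as below. This is an illustrative sketch, not the component's actual code: `buildDownloadUrl` and `startPolling` are hypothetical helpers, and the `filename` query parameter on /api/storage/download is an assumed shape.

```typescript
// Hypothetical sketch of the behavior the commit message describes.

// Build the href used by the MD/JSON icon links. The commit points them at
// /api/storage/download; the `filename` query parameter is an assumption.
function buildDownloadUrl(filename: string): string {
  return `/api/storage/download?filename=${encodeURIComponent(filename)}`;
}

// Refresh immediately, then poll on the interval the commit message
// describes (10 seconds). Returns a cleanup function suitable for a
// useEffect teardown, so the interval is cleared on unmount.
function startPolling(refresh: () => void, intervalMs = 10000): () => void {
  refresh();
  const id = setInterval(refresh, intervalMs);
  return () => clearInterval(id);
}
```

In the component, `refresh` would fetch /api/storage and store the returned file list in state, and the anchor elements would also carry the `download` attribute so the browser saves the file instead of rendering it.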
1 parent 88b0517 commit dfec6f7


56 files changed: +4920 additions, -1271 deletions

.roo/rules-boomerang/rules.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Special Rules for Critiquing plans and strategies:
+
+Once you have created a subtask markdown file or an in-memory subtask, get a second opinion from the mode called "Expert Opinion". This mode is designed to accept only your subtask plans and strategies before you present them to the user for approval. Do this before you delegate to either Code or Debug modes and after you have created a subtask. Consider this mode your personal brainstorm partner. Argue with it from ground truth about the codebase; both you and the Expert Opinion mode have complete knowledge of the codebase. Counter its points, come to a common understanding of the most accurate and correct path forward, and then present those findings to the user.

.roomodes

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+{
+  "customModes": [
+    {
+      "slug": "boomerang",
+      "name": "boomerang",
+      "roleDefinition": "You are Roo, a strategic workflow orchestrator who coordinates complex tasks by delegating them to appropriate specialized modes. You have a comprehensive understanding of each mode's capabilities and limitations, allowing you to effectively break down complex problems into discrete tasks that can be solved by different specialists.\n\nThe procedure to follow is to ask the coder for an implementation plan without writing any code yet, forward the implementation plan to the Expert Opinion mode for review, and based on the feedback from the Expert Opinion mode give the coder the go-ahead to create an updated task list; then ask the user to approve the task list, which incorporates the feedback of the Expert Opinion mode and the previous plans with pros and cons for each. Your prompts for those roles will have to ensure they know the procedure too, so they're all singing from the same hymn-sheet.",
+      "customInstructions": "Your role is to coordinate complex workflows by delegating tasks to specialized modes. As an orchestrator, you should:\n\n1. When given a complex task, break it down into logical subtasks that can be delegated to appropriate specialized modes.\n\n2. For each subtask, use the `new_task` tool to delegate. Choose the most appropriate mode for the subtask's specific goal and provide comprehensive instructions in the `message` parameter. These instructions must include:\n * All necessary context from the parent task or previous subtasks required to complete the work.\n * A clearly defined scope, specifying exactly what the subtask should accomplish.\n * An explicit statement that the subtask should *only* perform the work outlined in these instructions and not deviate.\n * An instruction for the subtask to signal completion by using the `attempt_completion` tool, providing a concise yet thorough summary of the outcome in the `result` parameter, keeping in mind that this summary will be the source of truth used to keep track of what was completed on this project.\n * A statement that these specific instructions supersede any conflicting general instructions the subtask's mode might have.\n * Once you have the plan created by the coder, forward the implementation plan to the Expert Opinion mode for review, and based on the result ask for improvements or give the coder the go-ahead. Your prompts for those roles will have to ensure they know the procedure too, so they're all singing from the same hymn-sheet.\n\n3. Track and manage the progress of all subtasks in a markdown file in the codebase. If it's a bug, start the heading with BUG:; if it's a feature, write FEATURE:. When a subtask is completed, analyze its results from the user, determine the next steps, and then go back and update the markdown subtask file.\n\n4. Help the user understand how the different subtasks fit together in the overall workflow. Provide clear reasoning about why you're delegating specific tasks to specific modes.\n\n5. When all subtasks are completed, synthesize the results and provide a comprehensive overview of what was accomplished.\n\n6. Always ask clarifying questions when necessary to better understand how to break down complex tasks effectively in as much detail as possible.\n\n7. Suggest improvements to the workflow based on the results of completed subtasks.\n\nUse subtasks to maintain clarity. If a request significantly shifts focus or requires a different expertise (mode), consider creating a subtask rather than overloading the current one.",
+      "groups": [
+        "read",
+        "edit",
+        "command",
+        "mcp"
+      ],
+      "source": "project"
+    }
+  ]
+}
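The `groups` array in this mode definition acts as a capability list (read, edit, command, mcp). A minimal sketch of how such a list could be checked follows; `modeAllows` is a hypothetical helper for illustration, not part of Roo's actual implementation.

```typescript
// Hypothetical capability check against a mode definition like the one above.
interface CustomMode {
  slug: string;
  groups: string[]; // e.g. ["read", "edit", "command", "mcp"]
}

// Returns true if the mode's groups list grants the requested capability.
function modeAllows(mode: CustomMode, capability: string): boolean {
  return mode.groups.includes(capability);
}

const boomerang: CustomMode = {
  slug: "boomerang",
  groups: ["read", "edit", "command", "mcp"],
};
```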

README.md

Lines changed: 28 additions & 3 deletions
@@ -1,8 +1,8 @@
 # DevDocs by CyberAGI 🚀
 
 <div align="center">
-  <img src="https://github.com/user-attachments/assets/6d4cc4df-fe5d-4483-9218-3d621f572e49" alt="DevDocs Interface" width="800">
-  <img src="https://github.com/user-attachments/assets/00350dc6-2ff3-40cf-b0b3-8b3e387d983d" alt="DevDocs Interface" width="800">
+  <img src="assets/image.png" alt="DevDocs Interface" width="800">
+
 
   <p align="center">
     <strong>Turn Weeks of Documentation Research into Hours of Productive Development</strong>
@@ -108,18 +108,43 @@ git clone https://github.com/cyberagiinc/DevDocs.git
 # Navigate to the project directory
 cd DevDocs
 
+# Configure environment variables
+# Copy the template file to .env
+cp .env.template .env
+
+# Ensure NEXT_PUBLIC_BACKEND_URL in .env is set correctly (e.g., http://localhost:24125)
+# This allows the frontend (running in your browser) to communicate with the backend service.
+
 # Start all services using Docker
 ./docker-start.sh
 ```
 
-For Windows users:
+For Windows users: Experimental Only (Not Tested Yet)
 ```cmd
 # Clone the repository
 git clone https://github.com/cyberagiinc/DevDocs.git
 
 # Navigate to the project directory
+
 cd DevDocs
 
+# Configure environment variables
+# Copy the template file to .env
+
+copy .env.template .env
+
+# Ensure NEXT_PUBLIC_BACKEND_URL in .env is set correctly (e.g., http://localhost:24125)
+
+# This allows the frontend (running in your browser) to communicate with the backend service.
+
+# Prerequisites: Install WSL 2 and Docker Desktop
+# Docker Desktop for Windows requires WSL 2. Please ensure you have WSL 2 installed and running first.
+# 1. Install WSL 2: Follow the official Microsoft guide: https://learn.microsoft.com/en-us/windows/wsl/install
+# 2. Install Docker Desktop for Windows: Download and install from the official Docker website. Docker Desktop includes Docker Compose.
+
+
 # Start all services using Docker
 docker-start.bat
 ```
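Frontend code might consume the NEXT_PUBLIC_BACKEND_URL variable configured above roughly as follows. `resolveBackendUrl` is an illustrative helper, not code from the repository; the fallback mirrors the example port in the README.

```typescript
// Illustrative only: resolve the backend base URL from the environment,
// falling back to the localhost example given in the README.
function resolveBackendUrl(env: Record<string, string | undefined>): string {
  return env.NEXT_PUBLIC_BACKEND_URL ?? "http://localhost:24125";
}
```

In Next.js, the actual lookup would read `process.env.NEXT_PUBLIC_BACKEND_URL`; variables with the `NEXT_PUBLIC_` prefix are inlined into browser bundles at build time, which is why the value must be correct before building.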

app/api/all-files/route.ts

Lines changed: 1 addition & 2 deletions
@@ -90,8 +90,7 @@ export async function GET(request: Request) {
   // Get in-memory files from the backend
   let memoryFiles = []
   try {
-    const backendUrl = process.env.NEXT_PUBLIC_BACKEND_URL || process.env.BACKEND_URL || 'http://localhost:24125'
-    const memoryResponse = await fetch(`${backendUrl}/api/memory-files`)
+    const memoryResponse = await fetch('http://backend:24125/api/memory-files')
     if (memoryResponse.ok) {
       const memoryData = await memoryResponse.json()
       if (memoryData.success && Array.isArray(memoryData.files)) {

app/api/memory-file/route.ts

Lines changed: 0 additions & 48 deletions
This file was deleted.

app/api/storage/file-content/route.ts

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+import { NextResponse } from 'next/server'
+import { type NextRequest } from 'next/server'
+
+// Determine backend URL (consider containerized vs. local development)
+// In Docker, use the service name 'backend'. Locally, use localhost.
+// Force using the service name for Docker context, as env var might not be reliable here.
+const backendHost = 'http://backend:24125';
+
+export async function GET(request: NextRequest) {
+  try {
+    const searchParams = request.nextUrl.searchParams;
+    const filePath = searchParams.get('file_path'); // Expecting file_path relative to storage/markdown
+
+    if (!filePath) {
+      return NextResponse.json(
+        { success: false, error: 'Missing file path parameter' },
+        { status: 400 }
+      );
+    }
+
+    // Basic validation/sanitization (more robust checks might be needed depending on usage)
+    if (filePath.includes('..')) {
+      return NextResponse.json(
+        { success: false, error: 'Invalid file path' },
+        { status: 400 }
+      );
+    }
+
+    // Construct the backend URL
+    // The backend endpoint expects 'file_path' query parameter
+    const backendUrl = new URL(`${backendHost}/api/storage/file-content`);
+    backendUrl.searchParams.append('file_path', filePath);
+
+    console.log(`Fetching from backend: ${backendUrl.toString()}`); // Log the backend URL being called
+
+    // Fetch the file content from the backend
+    const response = await fetch(backendUrl.toString());
+
+    if (!response.ok) {
+      let errorData;
+      try {
+        // Try to parse JSON error from backend first
+        errorData = await response.json();
+      } catch (parseError) {
+        // If backend didn't send JSON, use text
+        errorData = { error: await response.text() };
+      }
+      console.error(`Backend fetch failed (${response.status}):`, errorData.error);
+      return NextResponse.json(
+        { success: false, error: errorData.error || `Failed to fetch file content (Status: ${response.status})` },
+        { status: response.status }
+      );
+    }
+
+    // Get content as text (backend returns PlainTextResponse)
+    const content = await response.text();
+
+    // Return content directly (assuming frontend handles rendering)
+    // Set appropriate content type if needed, e.g., 'text/markdown'
+    return new NextResponse(content, {
+      status: 200,
+      headers: {
+        'Content-Type': filePath.endsWith('.json') ? 'application/json' : 'text/plain; charset=utf-8', // Adjust content type based on file extension
+      },
+    });
+
+  } catch (error) {
+    console.error('Error fetching storage file content:', error);
+    return NextResponse.json(
+      { success: false, error: error instanceof Error ? error.message : 'Failed to fetch file content' },
+      { status: 500 }
+    );
+  }
+}
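The content-type decision at the end of this route can be viewed as a pure function. The extraction below is an illustrative sketch for clarity, not code from the commit.

```typescript
// Mirrors the ternary in the route above: JSON files are served as
// application/json, everything else as plain text.
function contentTypeFor(filePath: string): string {
  return filePath.endsWith(".json")
    ? "application/json"
    : "text/plain; charset=utf-8";
}
```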

app/api/storage/route.ts

Lines changed: 21 additions & 35 deletions
@@ -41,8 +41,23 @@ export async function GET(request: Request) {
   const mdFiles = files.filter(f => f.endsWith('.md'))
   const jsonFiles = files.filter(f => f.endsWith('.json'))
 
+  // Define interface for disk file details
+  interface DiskFileDetail {
+    name: string;
+    jsonPath: string;
+    markdownPath: string;
+    timestamp: Date;
+    size: number;
+    wordCount: number;
+    charCount: number;
+    isConsolidated: boolean;
+    pagesCount: number;
+    rootUrl: string;
+    isInMemory: boolean;
+  }
+
   // Get disk files
-  const diskFileDetails = await Promise.all(
+  const diskFileDetails: DiskFileDetail[] = await Promise.all(
     mdFiles.map(async (filename) => {
       const mdPath = path.join(STORAGE_DIR, filename)
       const jsonPath = path.join(STORAGE_DIR, filename.replace('.md', '.json'))
@@ -162,42 +177,13 @@
     metadata?: any;
   }
 
-  // Get in-memory files from the backend
-  let memoryFiles = []
-  try {
-    const backendUrl = process.env.NEXT_PUBLIC_BACKEND_URL || process.env.BACKEND_URL || 'http://localhost:24125'
-    const memoryResponse = await fetch(`${backendUrl}/api/memory-files`)
-    if (memoryResponse.ok) {
-      const memoryData = await memoryResponse.json()
-      if (memoryData.success && Array.isArray(memoryData.files)) {
-        // Convert in-memory files to the same format as disk files
-        memoryFiles = memoryData.files
-          .filter((file: MemoryFile) => !file.isJson) // Only include markdown files
-          .map((file: MemoryFile) => ({
-            name: file.name,
-            jsonPath: file.path.replace('.md', '.json'),
-            markdownPath: file.path,
-            timestamp: new Date(file.timestamp),
-            size: file.size,
-            wordCount: file.wordCount,
-            charCount: file.charCount,
-            isConsolidated: false,
-            pagesCount: 1,
-            rootUrl: '',
-            isInMemory: true
-          }))
-      }
-    }
-  } catch (e) {
-    console.error('Error fetching in-memory files:', e)
-  }
-
-  // Combine disk and memory files
-  const allFiles = [...diskFileDetails, ...memoryFiles]
-
+  // Removed fetching and combining of in-memory files as that feature was removed.
+  // We now only work with files read from disk.
+  const allFiles = diskFileDetails // Keep variable name for minimal diff, even though it's just disk files now
+
   // Filter out individual files (non-consolidated files)
   // Only show consolidated files in the Stored Files section
-  const consolidatedFiles = allFiles.filter(file => file.isConsolidated)
+  const consolidatedFiles = allFiles.filter((file: DiskFileDetail) => file.isConsolidated)
 
   // Additional filter to exclude files with UUID-like names
   // UUID pattern: 8-4-4-4-12 hex digits (e.g., 095104d8-8e90-48f0-8670-9e45c914f115)
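The UUID-name filter that this comment introduces might look like the sketch below. The exact regex used in app/api/storage/route.ts is not shown in this diff, so treat this as an assumption.

```typescript
// A filename is "UUID-like" if its stem matches the 8-4-4-4-12 hex pattern
// mentioned in the comment above (e.g. 095104d8-8e90-48f0-8670-9e45c914f115).
const UUID_NAME_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.(md|json)$/i;

function isUuidNamed(filename: string): boolean {
  return UUID_NAME_RE.test(filename);
}
```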
