Fix: Resolve Wiki Structure Timeout Issues for Complex Repositories #273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open: wants to merge 8 commits into base: main
190 changes: 190 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,190 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

DeepWiki-Open is a full-stack AI-powered documentation generator that automatically creates interactive wikis for GitHub, GitLab, and Bitbucket repositories. It pairs a Next.js 15 frontend with a FastAPI Python backend to analyze code structure and generate comprehensive documentation with visual diagrams.

## Development Commands

### Frontend (Next.js)
```bash
npm install # Install dependencies
npm run dev # Start dev server with Turbopack on port 3000
npm run build # Build production application
npm run start # Start production server
npm run lint # Run ESLint code quality checks
```

### Backend (Python FastAPI)
```bash
pip install -r api/requirements.txt # Install Python dependencies
python -m api.main # Start FastAPI server on port 8001
pytest # Run test suite
pytest -m unit # Run unit tests only
pytest -m integration # Run integration tests only
```

### Full Stack Development
```bash
# Terminal 1: Start backend
python -m api.main

# Terminal 2: Start frontend
npm run dev

# Or use Docker Compose for full environment
docker-compose up
```

## Architecture Overview

### Frontend Structure (`src/`)
- **App Router**: Modern Next.js routing with server/client components in `src/app/`
- **Dynamic Routes**: Repository pages at `[owner]/[repo]/` with server-side rendering
- **Component Library**: Reusable UI components in `src/components/`
- **Context System**: Global state management for language, theme, and processed projects
- **Internationalization**: Support for 8 languages via next-intl, with message files in `src/messages/`
- **TypeScript**: Full type safety with definitions in `src/types/`

### Backend Structure (`api/`)
- **FastAPI Application**: Main server in `api/main.py` and routes in `api/api.py`
- **Multi-Provider AI**: Supports Google Gemini, OpenAI, OpenRouter, Azure OpenAI, and Ollama
- **RAG System**: Vector embeddings with FAISS in `api/rag.py`
- **WebSocket Streaming**: Real-time AI responses via `api/websocket_wiki.py`
- **Configuration-Driven**: JSON configs in `api/config/` for models, embeddings, and repo processing

### Key Architectural Patterns
- **Provider Pattern**: Multiple AI model providers with unified interface
- **RAG Implementation**: Retrieval Augmented Generation for repository Q&A
- **Streaming Responses**: WebSocket-based real-time AI output
- **Configuration-Driven**: JSON-based model and provider configuration
- **Component-Based UI**: Modular React components with TypeScript
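
The provider pattern above can be sketched as follows. This is an illustrative sketch only, not the repository's actual API: the interface, class, and registry names (`ModelProvider`, `EchoProvider`, `ask`) are hypothetical stand-ins for the real Python client classes in `api/`.

```typescript
// Hypothetical sketch of the provider pattern: each AI backend implements
// one shared interface, so callers never depend on a concrete provider.
interface ModelProvider {
  readonly name: string;
  generate(prompt: string): Promise<string>;
}

class EchoProvider implements ModelProvider {
  readonly name = "echo";
  async generate(prompt: string): Promise<string> {
    // Stand-in for a real API call (Gemini, OpenAI, OpenRouter, ...)
    return `echo: ${prompt}`;
  }
}

// A registry keyed by provider name gives the unified lookup point.
const providers = new Map<string, ModelProvider>([["echo", new EchoProvider()]]);

async function ask(providerName: string, prompt: string): Promise<string> {
  const provider = providers.get(providerName);
  if (!provider) throw new Error(`unknown provider: ${providerName}`);
  return provider.generate(prompt);
}
```

Adding a new backend then means adding one class and one registry entry, with no changes to call sites.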

## Environment Configuration

### Required Environment Variables
```bash
# At minimum, one AI provider API key is required
GOOGLE_API_KEY=your_google_api_key # For Google Gemini models
OPENAI_API_KEY=your_openai_api_key # For OpenAI models (also used for embeddings)
OPENROUTER_API_KEY=your_openrouter_api_key # For OpenRouter models

# Azure OpenAI (optional)
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint
AZURE_OPENAI_VERSION=your_azure_openai_version

# Server configuration
PORT=8001 # Backend port (default: 8001)
SERVER_BASE_URL=http://localhost:8001 # Backend URL for frontend API calls

# Optional features
OLLAMA_HOST=http://localhost:11434 # For local Ollama models
DEEPWIKI_AUTH_MODE=true # Enable authorization mode
DEEPWIKI_AUTH_CODE=your_secret_code # Required when auth mode enabled
LOG_LEVEL=INFO # Logging level (DEBUG, INFO, WARNING, ERROR)
LOG_FILE_PATH=api/logs/application.log # Log file location
```

## Development Workflows

### Adding New AI Providers
1. Create client in `api/{provider}_client.py` following existing patterns
2. Update `api/config/generator.json` with provider configuration
3. Add provider selection in frontend components
4. Update environment variable documentation

### Frontend Component Development
- Follow existing component patterns in `src/components/`
- Use TypeScript interfaces from `src/types/`
- Implement internationalization with next-intl
- Support both light/dark themes via next-themes
- Use Tailwind CSS for styling consistency

### Backend API Development
- Follow FastAPI patterns in `api/api.py`
- Use Pydantic models for request/response validation
- Implement proper error handling and logging
- Add WebSocket support for streaming responses when needed

### Configuration Management
- Model configurations: `api/config/generator.json`
- Embedding settings: `api/config/embedder.json`
- Repository processing: `api/config/repo.json`
- Custom config directory via `DEEPWIKI_CONFIG_DIR` environment variable

## Key Features Implementation

### Repository Processing Pipeline
1. Repository validation and cloning
2. Code structure analysis and file filtering
3. Embedding generation using FAISS vector storage
4. AI-powered documentation generation with provider selection
5. Mermaid diagram creation for visualization
6. Wiki structure organization and caching
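
The six stages above can be modeled as a composable sequence; this is an illustrative sketch only (the real pipeline lives in the Python backend, and the names here are hypothetical):

```typescript
// Each stage takes the repository state and returns an updated state.
type RepoState = { files: string[] };
type Stage = (repo: RepoState) => RepoState;

const validate: Stage = (repo) => repo; // stage 1: validation/cloning (no-op here)
const filterFiles: Stage = (repo) => ({
  // stage 2: structure analysis and file filtering
  files: repo.files.filter((f) => !f.startsWith("node_modules/")),
});

// Stages 3-6 (embedding, generation, diagrams, caching) would follow the
// same shape; composing the list yields the full pipeline.
function runPipeline(stages: Stage[], repo: RepoState): RepoState {
  return stages.reduce((state, stage) => stage(state), repo);
}
```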

### Multi-Language Support
- Language detection and switching via `src/contexts/LanguageContext.tsx`
- Translation files in `src/messages/{locale}.json`
- URL-based locale routing in Next.js App Router
- RTL language support preparation
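
Locale resolution with fallback might look like the sketch below. This is an assumption, not the actual logic in `src/contexts/LanguageContext.tsx`, and the exact locale list is illustrative:

```typescript
// Hypothetical supported-locale list; the real set is defined by the
// files present in src/messages/.
const SUPPORTED_LOCALES = ["en", "ja", "zh", "es", "kr", "vi", "pt-br", "fr"];

function resolveLocale(requested: string, fallback = "en"): string {
  // Exact match first, then a language-only prefix match ("pt" -> "pt-br"),
  // otherwise the fallback locale.
  if (SUPPORTED_LOCALES.includes(requested)) return requested;
  const prefix = requested.split("-")[0];
  const byPrefix = SUPPORTED_LOCALES.find((l) => l.split("-")[0] === prefix);
  return byPrefix ?? fallback;
}
```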

### Real-Time Chat System
- WebSocket connections for streaming AI responses
- RAG-powered repository Q&A with context retrieval
- Conversation history management
- "DeepResearch" mode for multi-turn investigations

## Testing Strategy

### Frontend Testing
- ESLint configuration for code quality in `eslint.config.mjs`
- TypeScript strict mode enabled for type safety
- Component testing patterns (add tests in `__tests__/` directories)

### Backend Testing
- pytest configuration in `pytest.ini`
- Test markers: `unit`, `integration`, `slow`, `network`
- Test files in `test/` directory following `test_*.py` pattern
- Run specific test categories: `pytest -m unit`

## Common Development Patterns

### API Route Proxying
- Next.js rewrites in `next.config.ts` proxy API calls to FastAPI backend
- Frontend makes requests to `/api/*` which are forwarded to backend
- Handles CORS and development/production URL differences
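
The rewrite rule described above might be sketched as follows; this is an assumption about the shape of `next.config.ts`, not a copy of the actual file:

```typescript
// Backend base URL, falling back to the local FastAPI default.
const serverBase = process.env.SERVER_BASE_URL ?? "http://localhost:8001";

const nextConfig = {
  async rewrites() {
    return [
      {
        source: "/api/:path*",                    // frontend-facing path
        destination: `${serverBase}/api/:path*`,  // proxied to FastAPI
      },
    ];
  },
};
// In the real config this object is the file's default export.
```

Because the proxy runs server-side, the browser only ever talks to the Next.js origin, which sidesteps CORS in both development and production.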

### State Management
- React Context for global state (language, theme, processed projects)
- Local state for component-specific data
- WebSocket state management for real-time features

### Error Handling
- Frontend: Error boundaries and user-friendly error messages
- Backend: FastAPI exception handlers and structured error responses
- Logging: Centralized logging with configurable levels and file output

## Docker Development

### Development Environment
```bash
docker-compose up # Full stack with hot reloading
```

### Production Deployment
```bash
docker build -t deepwiki-open .
docker run -p 8001:8001 -p 3000:3000 -v ~/.adalflow:/root/.adalflow deepwiki-open
```

## Important Notes

- **API Key Security**: Never commit API keys to version control
- **Data Persistence**: Repository clones, embeddings, and caches stored in `~/.adalflow/`
- **Memory Management**: Large repositories may require increased Node.js memory limits
- **Provider Fallbacks**: Implement graceful degradation when AI providers are unavailable
- **Rate Limiting**: Be aware of AI provider rate limits during development
- **WebSocket Connections**: Properly handle connection lifecycle and error states
130 changes: 130 additions & 0 deletions TIMEOUT_FIX_DOCUMENTATION.md
@@ -0,0 +1,130 @@
# Timeout Fix Documentation

## Problem Resolved

Fixed the issue where wiki structure determination timed out after 5 minutes even though complex repositories can need 20+ minutes of processing.

## Root Cause

The frontend had hardcoded timeout caps that overrode the backend's dynamic timeout calculations:

1. **Per-page timeout cap**: Limited to 300,000ms (5 minutes) regardless of complexity
2. **Default global timeout**: Defaulted to 300,000ms (5 minutes) when dynamic calculation wasn't available
3. **Maximum threshold**: Extra-large repository threshold was only 900,000ms (15 minutes)

## Changes Made

### 1. Removed Hardcoded Timeout Caps

**File**: `src/app/[owner]/[repo]/page.tsx:1944`

**Before**:
```typescript
const pageTimeout = Math.min(recommendedTimeout / Math.max(complexity.estimated_files / 10, 1), 300000); // Max 5 minutes per page
```

**After**:
```typescript
const maxPageTimeout = parseInt(process.env.NEXT_PUBLIC_MAX_PAGE_TIMEOUT || '900000'); // Default 15 minutes
const pageTimeout = Math.min(safeRecommendedTimeout / Math.max(complexity.estimated_files / 10, 1), maxPageTimeout);
```

### 2. Increased Timeout Thresholds

**File**: `src/app/[owner]/[repo]/page.tsx:1960-1964`

**Before**:
```typescript
thresholds: {
xlarge: 900000 // 15 minutes for extra large repos
}
```

**After**:
```typescript
thresholds: {
xlarge: parseInt(process.env.NEXT_PUBLIC_TIMEOUT_XLARGE || '1800000') // 30 minutes for extra large repos
}
```

### 3. Added Environment Variable Support

**File**: `src/app/[owner]/[repo]/page.tsx:839-840`

**Before**:
```typescript
const globalTimeout = (window as unknown as { deepwikiTimeouts?: { global?: number } }).deepwikiTimeouts?.global || 300000; // Use dynamic timeout or default 5 minutes
```

**After**:
```typescript
const defaultTimeout = parseInt(process.env.NEXT_PUBLIC_DEFAULT_TIMEOUT || '600000'); // Default 10 minutes
const globalTimeout = (window as unknown as { deepwikiTimeouts?: { global?: number } }).deepwikiTimeouts?.global || defaultTimeout;
```

### 4. Added Safety Bounds and Validation

**File**: `src/app/[owner]/[repo]/page.tsx:1945-1951`

```typescript
const maxProcessingTimeout = parseInt(process.env.NEXT_PUBLIC_MAX_PROCESSING_TIMEOUT || '7200000'); // Default 2 hours max
const maxPageTimeout = parseInt(process.env.NEXT_PUBLIC_MAX_PAGE_TIMEOUT || '900000'); // Default 15 minutes
const minTimeout = 300000; // Minimum 5 minutes for safety

// Apply safety bounds to recommended timeout
const safeRecommendedTimeout = Math.max(minTimeout, Math.min(recommendedTimeout, maxProcessingTimeout));
```

## Environment Variables Added

Created `.env.example` with the following configurable timeout options:

```bash
# Maximum global processing timeout (default: 2 hours)
NEXT_PUBLIC_MAX_PROCESSING_TIMEOUT=7200000

# Maximum per-page generation timeout (default: 15 minutes)
NEXT_PUBLIC_MAX_PAGE_TIMEOUT=900000

# Default timeout when complexity analysis fails (default: 10 minutes)
NEXT_PUBLIC_DEFAULT_TIMEOUT=600000

# Repository size-based timeout thresholds
NEXT_PUBLIC_TIMEOUT_SMALL=120000 # 2 minutes
NEXT_PUBLIC_TIMEOUT_MEDIUM=300000 # 5 minutes
NEXT_PUBLIC_TIMEOUT_LARGE=600000 # 10 minutes
NEXT_PUBLIC_TIMEOUT_XLARGE=1800000 # 30 minutes
```
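
These variables combine as sketched below: parse each with a default, then clamp the backend's recommendation between the 5-minute floor and the configured ceiling. This mirrors the safety-bounds snippet shown earlier; the helper names are illustrative:

```typescript
// Parse an environment variable as an integer, with a fallback for
// missing or malformed values.
function envInt(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw ? parseInt(raw, 10) : NaN;
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Clamp the backend's recommended timeout into [floor, ceiling].
function clampTimeout(recommendedMs: number): number {
  const maxMs = envInt("NEXT_PUBLIC_MAX_PROCESSING_TIMEOUT", 7_200_000); // 2 h
  const minMs = 300_000; // 5 min safety floor
  return Math.max(minMs, Math.min(recommendedMs, maxMs));
}
```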

## How It Works Now

1. **Backend Analysis**: The Python backend (`api/data_pipeline.py`) analyzes repository complexity and recommends appropriate timeouts (e.g., 20+ minutes for complex repos)

2. **Frontend Respect**: The frontend now respects these recommendations instead of capping them at 5 minutes

3. **Safety Bounds**: Timeouts are still bounded by configurable maximums to prevent infinite waits:
- Minimum: 5 minutes (safety)
- Maximum: 2 hours (configurable via environment)

4. **Logging**: Added console logging to track timeout adjustments for debugging

## Expected Behavior

- **Complex repositories**: Will now receive timeouts of 20+ minutes as calculated by the backend
- **Simple repositories**: Will continue using shorter timeouts (2-5 minutes)
- **Failed complexity analysis**: Will fall back to 10 minutes instead of 5 minutes
- **Safety**: All timeouts are bounded between 5 minutes and 2 hours

## Testing

- βœ… ESLint passes with only pre-existing warnings
- βœ… Next.js build completes successfully
- βœ… No TypeScript compilation errors
- βœ… Environment variables are properly typed and validated

## Backward Compatibility

All changes are backward compatible:
- Default values maintain reasonable behavior without environment variables
- Existing timeout logic continues to work
- No breaking changes to the API or user interface
12 changes: 4 additions & 8 deletions api/config/generator.json
```diff
@@ -35,15 +35,13 @@
         "top_p": 0.8
       },
       "o1": {
-        "temperature": 0.7,
-        "top_p": 0.8
+        "temperature": 1.0
       },
       "o3": {
         "temperature": 1.0
       },
       "o4-mini": {
-        "temperature": 0.7,
-        "top_p": 0.8
+        "temperature": 1.0
       }
     }
   },
@@ -64,15 +62,13 @@
         "top_p": 0.8
       },
       "openai/o1": {
-        "temperature": 0.7,
-        "top_p": 0.8
+        "temperature": 1.0
       },
       "openai/o3": {
         "temperature": 1.0
       },
       "openai/o4-mini": {
-        "temperature": 0.7,
-        "top_p": 0.8
+        "temperature": 1.0
       },
       "anthropic/claude-3.7-sonnet": {
         "temperature": 0.7,
```