A modern, responsive single-page application that helps users select the optimal GPU configuration for running large language model inference workloads. The application analyzes user requirements including model specifications, performance targets, and optimization preferences to recommend the top 3 GPU options with detailed cost, latency, and throughput metrics.
- Intelligent GPU Recommendations: Get top 3 GPU recommendations based on your specific requirements using scientific memory calculations and weighted scoring algorithms
- Scientific Memory Calculations: Uses precise formulas to calculate inference memory requirements and recommend optimal serving methodologies
- Hugging Face Integration: Autocomplete model search and automatic model specification retrieval from Hugging Face Model Hub
- AWS Service Recommendations: Suggests relevant cloud services (EC2, SageMaker, Bedrock, Inferentia) for your workload
- Dark Mode Support: Built-in theme switching with localStorage persistence
- Responsive Design: Mobile-first design that works on all devices with touch-friendly interactions
- Accessibility: Full keyboard navigation, screen reader support, and proper ARIA labels
- Form Persistence: Automatically saves and restores form inputs using localStorage (see the hook sketch after this list)
- Comprehensive Error Handling: Graceful error handling with user-friendly messages and fallback options
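As a sketch of how the form-persistence feature can work, the snippet below shows a minimal localStorage-backed React hook. The project's real hook lives in `src/hooks/useFormPersistence.ts`; the hook name, generic signature, and key handling here are illustrative assumptions, not the actual implementation.

```typescript
import { useEffect, useState } from "react";

// Minimal sketch of a localStorage-backed persistence hook. The project's
// real hook lives in src/hooks/useFormPersistence.ts; the name and API
// here are illustrative assumptions.
export function usePersistedState<T>(key: string, initial: T) {
  const [value, setValue] = useState<T>(() => {
    if (typeof window === "undefined") return initial; // SSR guard for Next.js
    try {
      const stored = window.localStorage.getItem(key);
      return stored !== null ? (JSON.parse(stored) as T) : initial;
    } catch {
      return initial; // blocked or corrupted storage falls back gracefully
    }
  });

  // Write-through on every change; persistence is best-effort.
  useEffect(() => {
    try {
      window.localStorage.setItem(key, JSON.stringify(value));
    } catch {
      // Ignore quota or privacy-mode errors.
    }
  }, [key, value]);

  return [value, setValue] as const;
}
```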
- Frontend: React 19 with TypeScript and functional components
- Framework: Next.js 15.4.6 with App Router
- Styling: Tailwind CSS v4 with dark mode support
- Testing: Jest with React Testing Library
- API Integration: Hugging Face Model Hub API
- State Management: React hooks (useState, useEffect, useContext)
- Build Tool: Next.js with Turbopack for fast development
- Node.js 18+ (recommended: Node.js 20+)
- npm, yarn, or pnpm package manager
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd llm-gpu-recommender
   ```

2. Install dependencies:

   ```bash
   npm install
   # or
   yarn install
   # or
   pnpm install
   ```

3. Run the development server:

   ```bash
   npm run dev
   # or
   yarn dev
   # or
   pnpm dev
   ```

4. Open http://localhost:3000 in your browser
- `npm run dev` - Start the development server with Turbopack for fast hot reloading
- `npm run build` - Build the application for production
- `npm run start` - Start the production server (requires `npm run build` first)
- `npm run lint` - Run ESLint to check code quality and style
- `npm run test` - Run all tests once
- `npm run test:watch` - Run tests in watch mode for development
```bash
# Start development with hot reloading
npm run dev

# Run tests during development
npm run test:watch

# Check code quality
npm run lint

# Build and test production build locally
npm run build && npm run start
```

Main recommendation endpoint that analyzes user requirements and returns GPU recommendations.
Request Body:

```json
{
  "modelId": "meta-llama/Llama-2-7b-hf",
  "paramCount": 7000000000,
  "seqLen": 2048,
  "batchSize": 1,
  "latencyMs": 100,
  "throughputTps": 50,
  "techniques": ["quantization", "vllm"]
}
```

Response:
```json
{
  "recommendations": [
    {
      "id": "nvidia-a100-80gb",
      "name": "NVIDIA A100 80GB",
      "vendor": "NVIDIA",
      "estimatedCost": 3.06,
      "latency": 45,
      "throughput": 120,
      "memory": 80,
      "memoryBandwidth": 2039,
      "fp16Tflops": 312,
      "int8Tflops": 624,
      "rationale": "Best performance for large models, excellent memory headroom",
      "compositeScore": 0.85,
      "awsServices": [
        {
          "service": "EC2 (p4d.24xlarge)",
          "reasoning": "Direct GPU access with NVIDIA A100, ideal for multi-GPU setups",
          "costEffectiveness": 0.8
        }
      ]
    }
  ],
  "memoryCalculation": {
    "modelMemory": 14000,
    "activationMemory": 512,
    "totalMemory": 15912,
    "recommendedMethod": "standard-single-gpu",
    "bufferMemory": 1400
  },
  "metadata": {
    "totalGPUsEvaluated": 15,
    "inferenceMethod": "standard-single-gpu",
    "timestamp": "2024-01-15T10:30:00.000Z"
  }
}
```
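For reference, a hedged TypeScript sketch of calling this endpoint from the browser. The route path comes from the project structure (`src/app/api/recommend-inference/`); the POST method and the helper below are assumptions, not the app's actual client code.

```typescript
// Request shape taken from the documented body above.
interface RecommendRequest {
  modelId: string;
  paramCount: number;
  seqLen: number;
  batchSize: number;
  latencyMs: number;
  throughputTps: number;
  techniques: string[];
}

// Hypothetical helper; assumes the route accepts a JSON POST.
async function fetchRecommendations(body: RecommendRequest) {
  const res = await fetch("/api/recommend-inference", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Recommendation request failed: ${res.status}`);
  return res.json(); // { recommendations, memoryCalculation, metadata }
}

// Usage mirroring the documented request body:
fetchRecommendations({
  modelId: "meta-llama/Llama-2-7b-hf",
  paramCount: 7_000_000_000,
  seqLen: 2048,
  batchSize: 1,
  latencyMs: 100,
  throughputTps: 50,
  techniques: ["quantization", "vllm"],
}).then((data) => console.log(data.recommendations));
```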
Search for Hugging Face models with autocomplete functionality.

Parameters:

- `q` (required): Search query (minimum 2 characters)
Response:

```json
{
  "models": [
    {
      "id": "meta-llama/Llama-2-7b-hf",
      "name": "Llama 2 7B",
      "downloads": 1000000,
      "likes": 5000,
      "tags": ["llama", "7b"],
      "pipeline_tag": "text-generation"
    }
  ],
  "cached": false,
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```
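A sketch of how a client might debounce autocomplete queries against this endpoint; the 300 ms delay and callback shape are illustrative choices, not the app's actual implementation.

```typescript
// Debounced lookup against /api/models/search. Waits for the user to stop
// typing before firing a request; the endpoint requires at least 2 characters.
let debounceTimer: ReturnType<typeof setTimeout> | undefined;

function searchModels(query: string, onResults: (models: unknown[]) => void) {
  clearTimeout(debounceTimer);
  if (query.trim().length < 2) return;
  debounceTimer = setTimeout(async () => {
    const res = await fetch(`/api/models/search?q=${encodeURIComponent(query)}`);
    if (res.ok) {
      const data = await res.json();
      onResults(data.models); // shape shown in the response above
    }
  }, 300); // illustrative delay
}
```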
Get detailed metadata for a specific Hugging Face model.

Response:

```json
{
  "metadata": {
    "id": "meta-llama/Llama-2-7b-hf",
    "parameterCount": 7000000000,
    "hiddenSize": 4096,
    "vocabularySize": 32000,
    "maxSequenceLength": 4096,
    "architecture": "LlamaForCausalLM",
    "quantizationSupport": true
  },
  "cached": false,
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```

```
src/
├── app/ # Next.js App Router
│ ├── api/ # API routes
│ │ ├── models/ # Model-related endpoints
│ │ │ ├── search/ # Model search endpoint
│ │ │ └── [modelId]/ # Model metadata endpoint
│ │ └── recommend-inference/ # Main recommendation endpoint
│ ├── globals.css # Global styles and Tailwind imports
│ ├── layout.tsx # Root layout with theme provider
│ ├── page.tsx # Home page component
│ └── favicon.ico # Application favicon
├── components/ # React components
│ ├── forms/ # Form-related components
│ │ ├── InputForm.tsx # Main input form
│ │ ├── ModelSelector.tsx # Model selection with autocomplete
│ │ ├── NumericInput.tsx # Reusable numeric input
│ │ ├── OptimizationCheckboxes.tsx # Technique selection
│ │ └── PerformanceInputs.tsx # Latency/throughput inputs
│ ├── layout/ # Layout components
│ │ ├── HeroSection.tsx # Title and subtitle
│ │ └── ResultsSection.tsx # GPU recommendations display
│ └── ui/ # UI components
│ ├── ErrorBoundary.tsx # Error boundary wrapper
│ ├── GPUCard.tsx # Individual GPU recommendation card
│ ├── LoadingSpinner.tsx # Loading state component
│ ├── NetworkStatus.tsx # Network connectivity indicator
│ └── ThemeToggle.tsx # Dark/light mode toggle
├── contexts/ # React contexts
│ └── ThemeContext.tsx # Theme management context
├── data/ # Static data
│ └── gpuDatabase.ts # GPU specifications database
├── hooks/ # Custom React hooks
│ └── useFormPersistence.ts # Form state persistence hook
├── lib/ # Library utilities
│ └── utils.ts # Utility functions (clsx, etc.)
├── types/ # TypeScript type definitions
│ └── index.ts # All application types
├── utils/ # Utility functions
│ ├── errorHandling.ts # Error handling utilities
│ ├── gpuScoring.ts # GPU scoring algorithms
│ ├── localStorage.ts # localStorage utilities
│ └── memoryCalculator.ts # Memory calculation functions
└── test-setup.ts # Jest test configuration
__tests__/ # Test files (mirrors src structure)
├── api/ # API endpoint tests
├── components/ # Component tests
├── data/ # Data layer tests
├── hooks/ # Custom hook tests
├── integration/ # End-to-end integration tests
├── utils/ # Utility function tests
└── setup.test.ts           # Test environment setup
```
The application uses scientific formulas to calculate inference memory requirements (a code sketch follows the list):
1. Model Weights Memory:

   `M_model = P × b`

   - P: Parameter count
   - b: Bytes per parameter (2 for FP16, 1 for INT8)

2. Activation Memory:

   `M_act = α × B × L × H × b`

   - α: Activation multiplier (≈1 for inference)
   - B: Batch size
   - L: Sequence length
   - H: Hidden size

3. Total Memory:

   `M_total = M_model + M_act + M_buffer`

   - M_buffer: 10% overhead for system operations
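A minimal TypeScript sketch of these formulas, using decimal megabytes to match the sample API response above. The function and field names are illustrative assumptions, not the exact exports of `src/utils/memoryCalculator.ts`, and the app's activation figure may include terms beyond the basic formula.

```typescript
interface MemoryInputs {
  paramCount: number;            // P: parameter count
  bytesPerParam: number;         // b: 2 for FP16, 1 for INT8
  batchSize: number;             // B
  seqLen: number;                // L
  hiddenSize: number;            // H
  activationMultiplier?: number; // α, ≈1 for inference
}

function calculateInferenceMemory(i: MemoryInputs) {
  const alpha = i.activationMultiplier ?? 1;
  const toMB = (bytes: number) => bytes / 1e6; // decimal MB, as in the sample response

  const modelMemory = toMB(i.paramCount * i.bytesPerParam); // M_model = P × b
  const activationMemory = toMB(
    alpha * i.batchSize * i.seqLen * i.hiddenSize * i.bytesPerParam, // M_act = α × B × L × H × b
  );
  const bufferMemory = modelMemory * 0.1; // 10% overhead, as in the sample response
  return {
    modelMemory,
    activationMemory,
    bufferMemory,
    totalMemory: modelMemory + activationMemory + bufferMemory, // M_total
  };
}
```

With P = 7 × 10⁹ and b = 2, this gives `modelMemory` = 14,000 MB and `bufferMemory` = 1,400 MB, matching the sample response.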
Uses a weighted composite scoring algorithm (sketched in code after the list):
- Cost (40%): Lower cost per hour = higher score
- Latency (30%): Meeting latency requirements = higher score
- Throughput (20%): Meeting throughput targets = higher score
- Memory Fit (5%): Better memory utilization = higher score
- Technique Support (5%): Supporting user's optimization techniques = higher score
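A TypeScript sketch of the composite score, assuming each sub-score has already been normalized to [0, 1]; the real normalization and weighting live in `src/utils/gpuScoring.ts`, and the field names here are illustrative.

```typescript
// Sub-scores normalized to [0, 1]; higher is better on every axis.
interface SubScores {
  cost: number;             // cheaper per hour → higher
  latency: number;          // closer to the latency target → higher
  throughput: number;       // closer to the throughput target → higher
  memoryFit: number;        // better memory utilization → higher
  techniqueSupport: number; // more requested techniques supported → higher
}

// Weights from the list above; they sum to 1.0.
const WEIGHTS: Record<keyof SubScores, number> = {
  cost: 0.4,
  latency: 0.3,
  throughput: 0.2,
  memoryFit: 0.05,
  techniqueSupport: 0.05,
};

function compositeScore(s: SubScores): number {
  return (Object.keys(WEIGHTS) as (keyof SubScores)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * s[key],
    0,
  );
}
```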
- Select a Model: Choose from common sizes (7B, 13B, 70B) or search for specific Hugging Face models
- Set Requirements: Configure sequence length, batch size, latency, and throughput requirements
- Choose Optimizations: Select techniques like quantization, vLLM, or tensor parallelism
- Get Recommendations: Click "Get Top 3 GPUs" to receive personalized recommendations
- Custom Models: Enter any Hugging Face model ID for automatic parameter detection
- Performance Tuning: Adjust latency and throughput requirements based on your use case
- Cost Optimization: Compare recommendations to find the most cost-effective option
- AWS Integration: Review AWS service recommendations for cloud deployment
This project follows a spec-driven development approach. See the `.kiro/specs/llm-gpu-recommender/` directory for:

- `requirements.md` - Detailed feature requirements in EARS format
- `design.md` - Technical design document with architecture and algorithms
- `tasks.md` - Implementation task list with progress tracking
- Unit Tests: Test individual functions and components (see the example after this list)
- Integration Tests: Test API endpoints and data flow
- Component Tests: Test React component behavior and interactions
- End-to-End Tests: Test complete user workflows
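As an illustration of the unit-test layer, here is a hypothetical Jest test against the memory-calculator sketch shown earlier; the import path, function name, and expected values are assumptions, not the project's actual tests.

```typescript
// Hypothetical unit test; assumes the illustrative calculateInferenceMemory
// sketch above rather than the exact exports of src/utils/memoryCalculator.ts.
import { calculateInferenceMemory } from "@/utils/memoryCalculator";

describe("calculateInferenceMemory", () => {
  it("computes FP16 model weights as P × 2 bytes", () => {
    const result = calculateInferenceMemory({
      paramCount: 7_000_000_000,
      bytesPerParam: 2,
      batchSize: 1,
      seqLen: 2048,
      hiddenSize: 4096,
    });
    expect(result.modelMemory).toBeCloseTo(14_000); // decimal MB
    expect(result.totalMemory).toBeGreaterThan(result.modelMemory);
  });
});
```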
Run tests with:
```bash
# Run all tests
npm run test

# Run tests in watch mode during development
npm run test:watch

# Run specific test files
npm run test -- --testPathPattern=memoryCalculator
```

The project uses ESLint for code quality and consistency:
```bash
# Check code quality
npm run lint

# Auto-fix issues where possible
npm run lint -- --fix
```

**1. Development server won't start**

- Ensure Node.js 18+ is installed: `node --version`
- Clear `node_modules` and reinstall: `rm -rf node_modules package-lock.json && npm install`
- Check if port 3000 is available, or use a different port: `npm run dev -- -p 3001`

**2. Tests failing**

- Ensure all dependencies are installed: `npm install`
- Clear the Jest cache: `npm run test -- --clearCache`
- Check the test setup file: `src/test-setup.ts`

**3. Build errors**

- Check for TypeScript errors: `npx tsc --noEmit`
- Ensure all imports are correct and the referenced files exist
- Clear the Next.js cache: `rm -rf .next`

**4. API endpoints not working**

- Check network connectivity for Hugging Face API calls
- Verify the API route files are in the correct locations under `src/app/api/`
- Check the browser developer tools for detailed error messages

**5. Styling issues**

- Ensure Tailwind CSS is properly configured in `tailwind.config.ts`
- Check that PostCSS is configured correctly in `postcss.config.mjs`
- Verify global styles are imported in `src/app/globals.css`
**Slow autocomplete search:**
- Results are cached for 5 minutes to improve performance (see the cache sketch after this list)
- Rate limiting prevents excessive API calls
- Fallback models are provided when Hugging Face API is unavailable
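A minimal sketch of a 5-minute TTL cache of the kind described above; the in-memory `Map` and helper names are illustrative assumptions about how the API routes can structure their caching.

```typescript
// Entries expire 5 minutes after they are written.
const TTL_MS = 5 * 60 * 1000;
const cache = new Map<string, { value: unknown; expires: number }>();

function getCached<T>(key: string): T | undefined {
  const entry = cache.get(key);
  if (!entry || entry.expires < Date.now()) {
    cache.delete(key); // drop stale entries lazily on read
    return undefined;
  }
  return entry.value as T;
}

function setCached(key: string, value: unknown): void {
  cache.set(key, { value, expires: Date.now() + TTL_MS });
}
```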
**Memory calculation taking too long:**
- Calculations are optimized for common model sizes
- Results are memoized to avoid recalculation
- Consider reducing batch size or sequence length for very large models
- Supported Browsers: Chrome 90+, Firefox 88+, Safari 14+, Edge 90+
- Mobile Support: iOS Safari 14+, Chrome Mobile 90+
- Accessibility: Tested with screen readers and keyboard navigation
No environment variables are required for basic functionality. The application works entirely with client-side code and public APIs.
For production deployment, consider setting:
- `NODE_ENV=production` for optimized builds
- Custom API endpoints if using proxied Hugging Face access
- Check the Issues: Look for similar problems in the project issues
- Review Logs: Check browser developer tools console for detailed error messages
- Test API Endpoints: Use tools like curl or Postman to test API endpoints directly
- Verify Dependencies: Ensure all package versions match `package.json`
- Use `npm run test:watch` during development for immediate feedback
- Enable the React Developer Tools browser extension for component debugging
- Use the Network tab in browser dev tools to monitor API calls
- Check the Application tab in dev tools to verify localStorage persistence
- Code Style: Follow the existing TypeScript and React conventions
- Testing: Write tests for new functionality using Jest and React Testing Library
- Documentation: Update README and inline comments for new features
- Type Safety: Ensure all TypeScript types are properly defined
- Accessibility: Test with keyboard navigation and screen readers
- Performance: Consider performance implications of new features
- Fork the repository and create a feature branch
- Make your changes with appropriate tests
- Ensure all tests pass: `npm run test`
- Check code quality: `npm run lint`
- Build successfully: `npm run build`
- Update documentation as needed
- Submit a pull request with a clear description
This project is licensed under the MIT License. See the LICENSE file for details.