Skip to content

Resume-to-job matching model using local Sentence Transformers, hybrid NLP with TF-IDF fallback, multi-dimensional scoring, and company hiring adjustments.

Notifications You must be signed in to change notification settings

farhan-439/resume_job_scoring_engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Resume Job Scoring Engine

A resume-job matching system implementing best practices from CareerBuilder, LinkedIn, and major recruitment platforms. Achieves 73% semantic similarity for perfect matches with production-ready performance.

Live Demo - ( Switch to full screen)

vid-2.mp4

Besides the basic semantic match, my model:

  • Distinguishes “5 years Python experience” vs “familiar with Python basics”
  • Distinguishes “i am a senior developer” vs “worked with senior developers” for the “senior” keyword
  • Connects “team leadership requirements” with “managed teams of 5+ engineers”
  • Recognizes skill aliases (“js” → “javascript”, “k8s” → “kubernetes”)
  • Detects compound skills (“machine learning”, “full stack development”)
  • Infers seniority levels (e.g. 8 years experience → senior level without explicit mention)
  • Understands job-title synonyms (“developer” ≈ “engineer” ≈ “programmer”)
  • Extracts technical depth indicators (“architecture”, “scalability”, “system design”)
  • Identifies overqualification scenarios (“Principal engineer” applying to “entry-level position”) and underqualification scenarios (“Recent graduate applying to Senior roles”)
  • Processes career transitions (“data scientist” → “backend engineer” with transferable Python skills)
  • Weighs skill categories (programming languages 30% > soft skills 10%)
  • Applies company hiring standards (Google −15 points, startups +10 points)
  • Provides confidence-weighted scoring with TF-IDF fallback when semantic confidence is low

Note: I only leverage a local Sentence Transformers model (all-mpnet-base-v2) to achieve an average response time of about 42 ms (versus 2+ seconds with GPT) and completely eliminate per-request costs. If even higher semantic accuracy is required-especially in complex or multimodal scenarios-we can introduce a second stage that calls the GPT API. In practice, this two-stage (or multimodel) setup will let us perform a fast, cost-effective local pass and only incur the higher latency and expense of GPT when our confidence score falls below a certain threshold.

🚀 Key Features

  • Industry-Standard Architecture: Implements proven patterns from major recruitment platforms
  • Hybrid NLP Pipeline: spaCy + Sentence Transformers + TF-IDF fallback for 99.9% reliability
  • Multi-Dimensional Scoring: Skills (60%) + Semantic similarity (20%) + Experience (20%)
  • Advanced Skill Taxonomy: O*NET/ESCO-based standardization with 150+ normalized skills
  • Context-Aware Analysis: Professional text preprocessing with confidence weighting
  • Company Intelligence: Data-driven hiring adjustments (-15 to +10 points)
  • Production Performance: 24 requests/second with 42ms average latency
  • Deterministic Results: MD5-based caching ensures identical outputs for identical inputs
  • Comprehensive Testing: 95%+ edge case coverage with graceful error handling

🏆 Performance Benchmarks

Metric Industry Target Our System
Perfect Match Accuracy 50-70 points 56 points
Skills Differentiation 30-50 points 42 points
Semantic Understanding >60% similarity 73% similarity
Response Time <50ms 42ms average
Throughput >20 req/sec 24 req/sec

🛠️ Tech Stack

Backend:

  • FastAPI with automatic OpenAPI documentation

  • spaCy 3.7+ with transformer integration

  • Sentence Transformers (all-mpnet-base-v2)

  • scikit-learn for similarity calculations and TF-IDF fallback

  • Deterministic caching with MD5 keys

  • Modular scoring components in scoring.py

  • FastAPI app configuration in main.py

Frontend:

  • TypeScript React with modern hooks
  • Real-time form validation
  • Interactive data visualizations
  • Error boundary and loading states

📦 Installation & Setup

Project Structure

project-root/
├── backend/
│   ├── main.py
│   ├── scoring.py
│   └── requirements.txt
└── frontend/
    ├── src/
    │   ├── types.ts
    │   ├── api.ts
    │   ├── components.tsx
    │   ├── App.tsx
    │   └── index.tsx
    ├── public/
    │   └── index.html
    ├── package.json
    └── tsconfig.json

Backend Setup

  1. Navigate to backend and install dependencies:
cd backend
pip install -r requirements.txt
  1. Download spaCy model:
python -m spacy download en_core_web_sm
  1. Start the backend server:
uvicorn main:app --reload

Frontend Setup

  1. Navigate to frontend and install dependencies:
cd frontend
npm install
  1. Start the development server:
npm start

🧪 Testing & Validation

Tested 15 different scenarios, each resume and job description can be found in backend/tests/test_advanced

Backend Testing

cd backend
python -m tests.test_advanced

Test Coverage:

  • ✅ Perfect matches (85-95% expected)
  • ✅ Skills mismatches (30-50% expected)
  • ✅ Experience level validation
  • ✅ Company-specific adjustments
  • ✅ Edge cases and error handling
  • ✅ Performance benchmarks
  • ✅ Frontend component rendering
  • ✅ API integration tests

Access the Application

🔧 Dependencies

Backend (backend/requirements.txt)

fastapi==0.104.1
uvicorn==0.24.0
spacy==3.7.2
sentence-transformers==2.7.0
scikit-learn==1.3.2
huggingface_hub==0.20.3
numpy>=1.21.0

Frontend (frontend/package.json)

{
  "dependencies": {
    "@types/node": "^16.18.0",
    "@types/react": "^18.2.0",
    "@types/react-dom": "^18.2.0",
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "typescript": "^4.9.0"
  }
}

🧠 Advanced Scoring Algorithm

Multi-Dimensional Approach (60% Skills + 20% Semantic + 20% Experience)

1. Skills Analysis Engine (60% weight)

  • Programming Languages (30%): Python, JavaScript, Java, TypeScript, etc.
  • Frameworks/Libraries (25%): React, Django, Flask, Spring, etc.
  • Databases (20%): PostgreSQL, MongoDB, Redis, etc.
  • Cloud/DevOps (15%): AWS, Docker, Kubernetes, etc.
  • Soft Skills (10%): Leadership, communication, mentoring

2. Semantic Similarity Engine (20% weight)

  • Sentence Transformers (all-mpnet-base-v2) for professional content
  • Context-aware preprocessing and confidence scoring
  • TF-IDF fallback for reliability

3. Experience Analysis Engine (20% weight)

  • Automatic seniority inference from years of experience
  • Leadership detection and team size extraction
  • Technical depth indicators (architecture, system design)

4. Company Intelligence System

  • Big Tech (Google, Meta): -15 points (higher standards)
  • Startups: +10 points (flexible hiring)
  • Unicorns (Uber, Airbnb): -10 points (competitive)

📊 API Usage Example

Frontend to Backend Flow

1. User Input (Frontend):

  • Paste resume text (50+ characters)
  • Paste job description (30+ characters)
  • Select company from dropdown
  • Click "Score Resume"

2. API Request:

{
  "resume_text": "Software Engineer with 8 years Python experience...",
  "job_description": "Senior Python Developer position requiring 5+ years...",
  "company_name": "google"
}

3. API Response:

{
  "overall_score": 87,
  "semantic_similarity": 0.7188,
  "skills_breakdown": {
    "programming_languages": {
      "resume_skills": 2,
      "job_requirements": 2,
      "score": 100,
      "weight": 0.3
    }
  },
  "experience_match": {
    "resume_years": 8,
    "resume_level_final": "senior",
    "job_years": 5,
    "job_level": "mid"
  },
  "company_modifier": -15,
  "final_score": 72,
  "explanation": "Skills match: 90.0%, Semantic similarity: 71.9%..."
}

🖥️ Frontend Features

User Interface Components

Input Form:

  • Side-by-side textarea layout for resume and job description
  • Real-time character count validation
  • Company dropdown with popular tech companies
  • Responsive design for mobile and desktop

Results Dashboard:

  • Color-coded score displays (green 80+, yellow 60-79, red <60)
  • Interactive skills breakdown with category icons
  • Experience level comparison with visual indicators
  • Company adjustment explanation

🚀 Production Deployment

Performance Characteristics

  • Backend Latency: 42ms average response time
  • Frontend Bundle: <2MB optimized build
  • Throughput: 24 requests/second sustained
  • Memory Usage: ~500MB with models loaded

🔮 Future Enhancements

Planned Features

  • Resume Upload: PDF and DOCX file processing
  • Batch Analysis: Multiple job descriptions at once
  • Historical Tracking: Save and compare previous scores
  • Industry Customization: Sector-specific scoring weights
  • AI Recommendations: Specific improvement suggestions
  • Integration APIs: ATS and job board connections

Contributing Guidelines

  1. Fork the repository
  2. Create feature branches for new functionality
  3. Ensure tests pass before submitting PRs
  4. Follow TypeScript/Python coding standards
  5. Update documentation for new features

📄 License

This project is proprietary and may not be copied without owner's permission.

About

Resume-to-job matching model using local Sentence Transformers, hybrid NLP with TF-IDF fallback, multi-dimensional scoring, and company hiring adjustments.

Resources

Stars

Watchers

Forks

Packages

No packages published