Anveshak

A platform designed to help job seekers generate personalized cold emails to potential employers based on their resume and skills.

Features

Resume parsing and analysis
AI-powered company matching based on skills and experience
Personalized cold email generation with company research
Email sending and tracking
Email history and metrics visualization
Multiple domain support

Anveshak - Comprehensive Documentation

Introduction

Anveshak is a comprehensive platform designed to help job seekers generate personalized cold emails to potential employers and academic faculty. The system combines AI-powered resume parsing, company/faculty research, and personalized email generation to create highly tailored outreach communications.

The platform serves two primary use cases:

Job application emails - Matching candidates with companies based on skills and generating personalized job inquiry emails
Academic collaboration emails - Connecting students with faculty members for research opportunities based on matching interests

Architecture Overview

Anveshak is built using a modern MERN stack architecture:

Frontend: React-based SPA with context-based state management
Backend: Node.js + Express RESTful API server
Database: MongoDB document database
AI Integration: Google Generative AI (Gemini) for natural language processing tasks
Web Scraping: Cheerio and Axios for company/faculty data enrichment

The application follows a microservice-inspired architecture with clear separation of concerns:

Server
├── routes/           # API routes and controllers
├── services/         # Core business logic and external service integration
├── models/           # Data models and schema definitions
├── middleware/       # Request processing middleware
├── config/           # Configuration settings
└── scripts/          # Database initialization and utility scripts

Core Technologies

Node.js: Server-side JavaScript runtime
Express: Web application framework
MongoDB: NoSQL database
Mongoose: MongoDB object modeling
React: Frontend UI library
Vite: Frontend build tool
Google Generative AI (Gemini): LLM for natural language processing
PDF.js: PDF text extraction
Cheerio: HTML parsing for web scraping
Axios: HTTP client for API requests
Cloudinary: Media storage
Nodemailer: Email sending service

Natural Language Processing and AI Algorithms

Resume Parsing Algorithms

Anveshak implements three distinct resume parsing algorithms, each with its own strengths and use cases:

1. Simplified Parser (simplifiedResumeParser.js)

A lightweight, rule-based parser optimized for extracting core information:

Skills
Experience
Projects

Algorithm Overview:

Extract text from PDF using pdf-parse
Apply regex pattern matching to identify section headers
Extract content between known section headers
Clean and normalize extracted text
Apply heuristics to identify skills, experiences, and projects

Key Features:

Low computational overhead
No external API dependencies
Fast processing time
Domain-specific pattern matching for technical resumes

Code Implementation:

async function parseResumeText(pdfBuffer) {
  // Extract text from PDF
  const rawText = await extractTextFromPdf(pdfBuffer);
  
  // Identify sections using headers
  const sections = identifySections(rawText);
  
  // Extract data from each section
  const skills = extractSkills(sections.skills);
  const experience = extractExperience(sections.experience);
  const projects = extractProjects(sections.projects);
  
  return {
    skills,
    experience,
    projects
  };
}
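
The helpers called above are not shown in this document. As a rough illustration of the header-matching step, hypothetical versions of identifySections and extractSkills might look like the sketch below. The header patterns and the delimiter set are assumptions, not the project's actual rules.

```javascript
// Hypothetical header patterns; the real parser's patterns are not shown here.
const SECTION_PATTERNS = {
  skills: /^(technical\s+)?skills$/i,
  experience: /^(work\s+|professional\s+)?experience$/i,
  projects: /^(personal\s+|academic\s+)?projects$/i,
};
// Headers that end the current section without starting a tracked one.
const OTHER_HEADERS = /^(education|certifications|awards|summary|objective)$/i;

function identifySections(rawText) {
  const sections = { skills: "", experience: "", projects: "" };
  let current = null;

  for (const rawLine of rawText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line) continue;

    // Compare the line, minus a trailing colon, against the known headers.
    const candidate = line.replace(/:$/, "").trim();
    const header = Object.keys(SECTION_PATTERNS).find((name) =>
      SECTION_PATTERNS[name].test(candidate)
    );

    if (header) {
      current = header;
    } else if (OTHER_HEADERS.test(candidate)) {
      current = null;
    } else if (current) {
      sections[current] += (sections[current] ? "\n" : "") + line;
    }
  }
  return sections;
}

function extractSkills(skillsText) {
  // Split on common list delimiters, trim, and drop empties and duplicates.
  const parts = skillsText
    .split(/[,;|\n\u2022]/)
    .map((part) => part.trim())
    .filter(Boolean);
  return [...new Set(parts)];
}
```

Text that appears before the first recognized header (typically the name and contact block) is ignored in this sketch.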

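2. Layout-aware Parser

A position-based parser that reads text coordinates through pdf.js to reconstruct the resume's layout:

Algorithm Overview: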

Extract text items from PDF with position data using pdf.js
Group text items into lines based on Y-coordinates
Group lines into sections using font sizes and formatting patterns
Apply specialized parsing rules to each identified section

Key Features:

Preserves formatting and structure
Handles complex resume layouts
Extracts rich metadata including dates, positions, achievements
Better at identifying hierarchical information

Advantages:

More accurate section detection
Better at handling multi-column layouts
Preserves chronological ordering
Extracts detailed metadata
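
The line-grouping step can be sketched as follows. The example assumes text items shaped like those returned by pdf.js getTextContent(), where each item has a str field and a transform array whose fifth and sixth entries are the x and y coordinates. The function name and the 2-unit tolerance are hypothetical choices for illustration.

```javascript
// Group positioned text items into lines by their Y coordinate.
// yTolerance (an assumed value) absorbs small baseline differences.
function groupItemsIntoLines(items, yTolerance = 2) {
  // PDF coordinates start at the bottom-left, so a larger Y is higher on the page.
  const sorted = [...items].sort(
    (a, b) => b.transform[5] - a.transform[5] || a.transform[4] - b.transform[4]
  );

  const lines = [];
  for (const item of sorted) {
    const y = item.transform[5];
    const last = lines[lines.length - 1];
    if (last && Math.abs(last.y - y) <= yTolerance) {
      last.items.push(item);
    } else {
      lines.push({ y, items: [item] });
    }
  }

  // Within each line, order items left to right and join their text.
  return lines.map((line) => ({
    y: line.y,
    text: line.items
      .sort((a, b) => a.transform[4] - b.transform[4])
      .map((item) => item.str)
      .join(" ")
      .replace(/\s+/g, " ")
      .trim(),
  }));
}
```

A single pass like this does not handle multi-column layouts; those would need the items split by X ranges before grouping.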

3. AI-powered Parser (aiService.js)

Uses Google's Gemini models to extract and structure resume data through natural language understanding:

Algorithm Overview:

Send PDF text to Gemini API with structured extraction prompts
Process model responses to extract specific resume components
Structure and validate the extracted information

Key Features:

Superior understanding of context and semantics
Handles non-standard formats and language variations
Categorizes skills by type automatically
Identifies achievements and quantitative results

Benefits:

Most accurate for diverse resume formats
Best at understanding implied skills and qualifications
Provides richer semantic categorization
Handles international and varied resume styles

Sample Prompt Structure:

const prompt = `
  Extract all technical skills from the following resume text.
  - Look for sections labeled "Skills", "Technical Skills", "Technologies", etc.
  - For sections with bullet points or comma-separated lists, extract those directly
  - For skills written in sentences, extract individual technical terms
  - Return ONLY an array of strings with no other text or explanation
  ${text}
`;
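
Because the prompt asks for an array only, the response still has to be validated before use. The sketch below shows one way to do that; parseSkillsResponse is a hypothetical helper, and the assumption that the model may wrap its answer in Markdown code fences or extra prose is ours.

```javascript
// Hypothetical helper: turn the model's raw text into a clean array of skill strings.
function parseSkillsResponse(responseText) {
  // Remove Markdown code fence markers the model may add around the array.
  const cleaned = responseText.replace(/`{3}(?:json)?/gi, "").trim();

  // Keep only the outermost [...] span in case extra prose surrounds it.
  const start = cleaned.indexOf("[");
  const end = cleaned.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let parsed;
  try {
    parsed = JSON.parse(cleaned.slice(start, end + 1));
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];

  // Keep non-empty strings only and drop case-insensitive duplicates.
  const seen = new Set();
  const skills = [];
  for (const entry of parsed) {
    if (typeof entry !== "string") continue;
    const skill = entry.trim();
    const key = skill.toLowerCase();
    if (skill && !seen.has(key)) {
      seen.add(key);
      skills.push(skill);
    }
  }
  return skills;
}
```

Returning an empty array on any failure lets the caller fall back to one of the rule-based parsers.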

Email Generation Algorithms

Anveshak uses generative AI to produce personalized email content through two specialized flows, one for companies and one for academic faculty:

1. Company Research Email Generation

Algorithm Overview:

Process company information and candidate resume data
Create a structured context object with key matching points
Generate a prompt that emphasizes personalization and authentic connection
Apply generative model with specific parameters for formal correspondence
Parse and validate the generated response for structure and personalization

Key Features:

Company-specific research integration
Role-appropriate technical language
Quantified achievements matching
Multiple fallback methods for resilience

Implementation Highlights:

export const generateEmailContent = async ({
  userName,
  userEmail,
  company,
  role,
  skills,
  experience,
  projects,
  companyResearch
}) => {
  // Structure context for generation
  const context = {
    candidate: { name: userName, email: userEmail, skills, experience, projects },
    company: { name: company, role, research: companyResearch }
  };
  
  // Generate focused prompt
  const prompt = buildPersonalizedPrompt(context);
  
  // Generate content with specific parameters
  const result = await model.generateContent({
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: {
      temperature: 0.7,
      topP: 0.8,
      topK: 40,
      maxOutputTokens: 1024,
    },
    safetySettings: [
      {
        category: "HARM_CATEGORY_DANGEROUS_CONTENT",
        threshold: "BLOCK_MEDIUM_AND_ABOVE",
      },
    ],
  });
  
  // Parse and structure the response
  return parseEmailResponse(result);
}
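
buildPersonalizedPrompt is called above but not shown. A minimal sketch of what such a helper could look like follows. The wording of the prompt is hypothetical, and it assumes skills, experience, and projects are arrays of strings. The requested subject/body JSON shape matches the fields extracted in the JSON parsing section of this document.

```javascript
// Hypothetical implementation; the project's actual prompt text is not shown here.
function buildPersonalizedPrompt(context) {
  const { candidate, company } = context;
  // Assumes each list is an array of strings; empty lists are flagged explicitly.
  const list = (items) => (items && items.length ? items.join(", ") : "not provided");

  return [
    `Write a concise, personalized cold email from ${candidate.name} (${candidate.email})`,
    `to ${company.name} about the ${company.role} role.`,
    "",
    `Candidate skills: ${list(candidate.skills)}`,
    `Relevant experience: ${list(candidate.experience)}`,
    `Projects: ${list(candidate.projects)}`,
    `Company research: ${company.research || "not provided"}`,
    "",
    "Reference at least one specific detail from the company research.",
    'Return ONLY a JSON object of the form {"subject": "...", "body": "..."}.',
  ].join("\n");
}
```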

2. Academic Research Email Generation

Specialized algorithm for generating personalized academic research collaboration emails:

Algorithm Overview:

Extract faculty research interests, publications, and academic background
Match candidate skills and experience to faculty research areas
Generate academically appropriate, research-focused email content
Apply multiple validation checks for scholarly tone and specificity
Format according to academic correspondence conventions

Key Features:

Research-specific terminology
Publication reference integration
Academic institutional knowledge
Formal scholarly communication style
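
The validation step could include simple specificity checks such as the ones sketched below. validateAcademicEmail, the faculty object shape (name, researchInterests), and the 300-word limit are all hypothetical; the project's actual checks are not documented here.

```javascript
// Hypothetical specificity checks for a generated academic email.
function validateAcademicEmail(email, faculty, maxWords = 300) {
  const issues = [];
  const body = (email.body || "").toLowerCase();
  const lastName = faculty.name.trim().split(/\s+/).pop().toLowerCase();

  if (!email.subject || !email.subject.trim()) {
    issues.push("missing subject");
  }
  if (!body.includes(lastName)) {
    issues.push("does not address the faculty member by name");
  }
  const mentionsInterest = faculty.researchInterests.some((interest) =>
    body.includes(interest.toLowerCase())
  );
  if (!mentionsInterest) {
    issues.push("does not mention any listed research interest");
  }
  const wordCount = body.split(/\s+/).filter(Boolean).length;
  if (wordCount > maxWords) {
    issues.push(`body exceeds ${maxWords} words`);
  }

  return { valid: issues.length === 0, issues };
}
```

String matching like this only catches generic emails; judging scholarly tone would need a second model pass or human review.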

Company Research and Matching

Anveshak implements a robust company matching system to connect candidates with relevant employers:

Algorithm Overview:

Extract key skills and experience from candidate resume
Execute multi-source company search:
- Database search for exact and fuzzy skill matches
- LLM-powered company suggestions based on skills and role
- Web scraping for company technology stack verification
Score and rank companies based on matching criteria
Enrich company profiles with additional research

Key Features:

Multiple data sources for company matching
Weighted skill relevance scoring
Technology stack compatibility analysis
Industry-specific targeting
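
As an illustration of weighted skill relevance scoring, the sketch below scores each company by the weighted share of candidate skills found in its technology stack. The function names, the techStack field, and the default weight of 1 are assumptions for this example, not the project's scoring formula.

```javascript
// Hypothetical scoring: weighted fraction of candidate skills present in the company's stack.
function scoreCompany(candidateSkills, company, weights = {}) {
  const normalize = (value) => value.trim().toLowerCase();
  const stack = new Set(company.techStack.map(normalize));

  let matchedWeight = 0;
  let totalWeight = 0;
  const matchedSkills = [];

  for (const skill of candidateSkills) {
    const key = normalize(skill);
    const weight = weights[key] ?? 1; // skills without an explicit weight count as 1
    totalWeight += weight;
    if (stack.has(key)) {
      matchedWeight += weight;
      matchedSkills.push(skill);
    }
  }

  return {
    name: company.name,
    score: totalWeight === 0 ? 0 : matchedWeight / totalWeight,
    matchedSkills,
  };
}

function rankCompanies(candidateSkills, companies, weights) {
  return companies
    .map((company) => scoreCompany(candidateSkills, company, weights))
    .sort((a, b) => b.score - a.score);
}
```

Scores fall between 0 and 1, so results from the database search, LLM suggestions, and scraping can be ranked on one scale.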

Academic Faculty Matching

Specialized algorithm for matching candidates with academic faculty:

Algorithm Overview:

Extract academic interests and research experience from resume
Search faculty database using domain-specific matching criteria
Enrich results with web scraping from university websites
Score and rank faculty based on research interest overlap
Generate potential collaboration opportunities

Key Features:

Research interest semantic matching
Publication relevance analysis
Institution type filtering
Department and specialization targeting
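
The overlap scoring step can be approximated with a token-level Jaccard similarity, as sketched below. This is a simple lexical stand-in for the semantic matching described above, not the project's method; the function names and the researchInterests field are hypothetical.

```javascript
// Break interest phrases into a set of lowercase word tokens (words of 3+ characters).
function tokenize(phrases) {
  const tokens = new Set();
  for (const phrase of phrases) {
    for (const word of phrase.toLowerCase().split(/[^a-z0-9]+/)) {
      if (word.length > 2) tokens.add(word);
    }
  }
  return tokens;
}

// Jaccard similarity: shared tokens divided by the size of the union.
function interestOverlap(candidateInterests, facultyInterests) {
  const a = tokenize(candidateInterests);
  const b = tokenize(facultyInterests);
  if (a.size === 0 || b.size === 0) return 0;

  let shared = 0;
  for (const token of a) {
    if (b.has(token)) shared += 1;
  }
  return shared / (a.size + b.size - shared);
}

function rankFaculty(candidateInterests, facultyList) {
  return facultyList
    .map((faculty) => ({
      ...faculty,
      score: interestOverlap(candidateInterests, faculty.researchInterests),
    }))
    .sort((x, y) => y.score - x.score);
}
```

Lexical overlap misses synonyms ("NLP" versus "natural language processing"), which is where an embedding or LLM-based comparison would do better.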

System Features

Resume Analysis

Anveshak provides comprehensive resume analysis features:

Automated Skill Extraction: Identifies technical skills, technologies, and methodologies
Experience Analysis: Extracts and categorizes work history, roles, and achievements
Project Portfolio Extraction: Identifies personal and professional projects with technologies used
Education Background Analysis: Extracts educational qualifications and relevant coursework
Achievement Quantification: Identifies and highlights quantified achievements
Technology Categorization: Classifies skills by type (languages, frameworks, tools)

Usage:

Upload PDF resume through the web interface or API
System automatically processes and analyzes the document
Review extracted information with confidence scores
Edit or enhance extracted data if needed
Proceed to company matching or email generation

Company Research and Matching

Anveshak identifies relevant companies based on candidate skills:

Skill-Based Matching: Finds companies using technologies in candidate's skill set
Role-Based Targeting: Focuses on companies with positions matching desired roles
Technology Stack Analysis: Identifies companies using specific technologies
Company Research: Gathers information about company products, culture, and achievements
Email Contact Discovery: Finds appropriate contact emails for companies
Relevance Scoring: Ranks companies by match quality for targeting

Usage:

System analyzes candidate's skills and desired roles
Matches are presented with relevance scores and research information
User can select target companies for email generation
System maintains company data for future matches

Email Generation

Anveshak generates personalized cold emails for job applications:

Company-Specific Personalization: References company products, technologies, and culture
Skill Matching: Highlights candidate skills relevant to the company
Achievement Emphasis: Incorporates quantified achievements and relevant experience
Dynamic Templates: Generates unique emails without repetitive patterns
Multiple Tone Options: Professional, enthusiastic, or formal communication styles
Customizable Content: Generated content can be edited before sending
Merge Fields: Automatic insertion of personalized information

Usage:

Select target companies from match results
Generate personalized email drafts for each company
Review and edit generated content if desired
Send emails directly or copy to clipboard for external sending
Track email status and responses

Academic Email Generation

Anveshak generates personalized emails for academic research collaboration:

Research Interest Alignment: Matches candidate interests with faculty research areas
Publication References: Cites relevant faculty publications and research projects
Academic Tone: Maintains appropriate scholarly communication style
Institution-Appropriate Content: Adapts to university type and department culture
Research Collaboration Proposals: Suggests specific collaboration opportunities
Academic Background Integration: Highlights relevant coursework and research experience

Usage:

Search for faculty members by research interest or institution
Select target faculty members for outreach
Generate personalized academic emails highlighting research alignment
Send or export emails for external sending
Track academic outreach campaigns

Email Management

Anveshak provides comprehensive email campaign management:

Email History: Tracks all generated and sent emails
Status Tracking: Monitors email status (draft, sent, replied)
Email Analytics: Provides open rates, response rates, and effectiveness metrics
Campaign Grouping: Organizes emails by campaign, company type, or time period
Follow-Up Suggestions: Recommends appropriate follow-up timing and content
Template Management: Saves successful emails as templates for future use
Response Handling: Assists with response management and follow-up

Usage:

Access email history from dashboard
Filter and sort by various criteria (date, status, company)
View detailed metrics and performance analytics
Set up follow-up reminders and templates
Archive or categorize email threads

API Reference

Public API Endpoints

Authentication API

POST /api/auth/register

Register a new user
Parameters:
- name (string): User's full name
- email (string): User's email address
- password (string): User's password
Response: User object with authentication token

POST /api/auth/login

Log in an existing user
Parameters:
- email (string): User's email address
- password (string): User's password
Response: User object with authentication token
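
A client call to the login endpoint might look like the sketch below. It assumes the parameters are sent as a JSON body, which this reference does not state explicitly, and the login helper and its injectable fetchImpl argument are hypothetical conveniences for testing.

```javascript
// Hypothetical client helper for POST /api/auth/login.
// fetchImpl defaults to the global fetch (available in browsers and Node 18+).
async function login(baseUrl, email, password, fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/api/auth/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed body encoding
    body: JSON.stringify({ email, password }),
  });

  if (!response.ok) {
    throw new Error(`Login failed with status ${response.status}`);
  }
  // Per the reference above: a user object with an authentication token.
  return response.json();
}
```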

POST /api/auth/verify

Verify user email
Parameters:
- token (string): Email verification token
Response: Verification success status

POST /api/auth/reset-password-request

Request a password reset
Parameters:
- email (string): User's email address
Response: Request status

POST /api/auth/reset-password

Reset user password
Parameters:
- token (string): Password reset token
- password (string): New password
Response: Reset status

Resume API

POST /api/resumes/upload

Upload and parse a resume
Authentication: Required
Parameters:
- file (file): PDF resume file
Response: Parsed resume data
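
An upload request could be built as in the sketch below. The file field name comes from the parameter list above; the Bearer scheme in the Authorization header is an assumption (the reference only says authentication is required), and uploadResume is a hypothetical helper.

```javascript
// Hypothetical client helper for POST /api/resumes/upload.
async function uploadResume(baseUrl, token, pdfBlob, fetchImpl = fetch) {
  const form = new FormData();
  form.append("file", pdfBlob, "resume.pdf");

  const response = await fetchImpl(`${baseUrl}/api/resumes/upload`, {
    method: "POST",
    // Content-Type is left unset so fetch can add the multipart boundary itself.
    headers: { Authorization: `Bearer ${token}` }, // assumed auth scheme
    body: form,
  });

  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.json();
}
```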

GET /api/resumes/:id

Get resume data by ID
Authentication: Required
Response: Complete resume data

PUT /api/resumes/:id

Update resume data
Authentication: Required
Parameters:
- Resume data fields to update
Response: Updated resume data

Email Generation API

POST /api/emails/generate

Generate personalized emails
Authentication: Required
Parameters:
- resumeId (string): Resume ID
- action (string): Action type (find-companies, generate-emails)
- companies (array, optional): Selected companies
Response: Generated email content or company matches
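
The action parameter suggests a two-step flow: first find companies, then generate emails for the selected ones. The sketch below builds the request body for each step. buildGenerateRequest is hypothetical, and requiring a non-empty companies array for generate-emails is an assumption, since the reference marks companies as optional.

```javascript
const ACTIONS = ["find-companies", "generate-emails"];

// Hypothetical helper that validates and builds the body for POST /api/emails/generate.
function buildGenerateRequest({ resumeId, action, companies }) {
  if (!resumeId) {
    throw new Error("resumeId is required");
  }
  if (!ACTIONS.includes(action)) {
    throw new Error(`Unknown action: ${action}`);
  }

  const body = { resumeId, action };
  // Assumption: selected companies are only needed in the generation step.
  if (action === "generate-emails") {
    if (!Array.isArray(companies) || companies.length === 0) {
      throw new Error("companies must be a non-empty array for generate-emails");
    }
    body.companies = companies;
  }
  return body;
}
```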

POST /api/academic/search-and-email

Search for academic faculty and generate emails
Authentication: Required
Parameters:
- domains (array): Research interests
Response: Faculty list

POST /api/academic/generate-preview-emails

Generate preview emails for selected faculty
Authentication: Required
Parameters:
- resumeId (string): Resume ID
- selectedFaculty (array): Selected faculty members
Response: Generated preview emails

Protected API Endpoints

GET /api/emails/user/:userId

Get all emails for a user
Authentication: Required (Admin or Owner)
Response: Email records for user

POST /api/emails/send

Send emails from drafts
Authentication: Required
Parameters:
- emailIds (array): Email IDs to send
Response: Send status

GET /api/users/me

Get current user profile
Authentication: Required
Response: User profile data

Error Handling and Resilience

JSON Parsing Enhancement

Anveshak implements robust error handling for JSON parsing in AI responses:

Key Features:

Enhanced JSON Validation: Comprehensive validation for JSON structure
Balanced Braces Check: Ensures JSON has matching opening and closing braces
Advanced JSON Repair: Fixes common issues in malformed JSON responses
Multi-layer Fallback Strategy: Multiple fallback methods for parsing failures
Pattern Matching Extraction: Uses regex to extract content when JSON parsing fails

Implementation:

// Assumes emailContent is declared (and unset) before this block,
// fixedJson is the repaired response, and text is the raw model output.
// First attempt: standard parsing of the repaired JSON
try {
  emailContent = JSON.parse(fixedJson);
} catch (parseError) {
  // Second attempt: parse only the outermost {...} span
  const jsonStartIndex = fixedJson.indexOf('{');
  const jsonEndIndex = fixedJson.lastIndexOf('}') + 1;

  if (jsonStartIndex !== -1 && jsonEndIndex > jsonStartIndex) {
    try {
      const extractedJson = fixedJson.substring(jsonStartIndex, jsonEndIndex);
      emailContent = JSON.parse(extractedJson);
    } catch (innerParseError) {
      // Fall through to the regex extraction below
    }
  }

  if (!emailContent) {
    // Third attempt: regex extraction for email components
    const subjectMatch = text.match(/"subject"\s*:\s*"([^"]+)"/);
    const bodyMatch = text.match(/"body"\s*:\s*"([\s\S]+?)(?:"\s*}|\s*"\s*$)/);

    if (subjectMatch && bodyMatch) {
      emailContent = {
        subject: subjectMatch[1].trim(),
        body: bodyMatch[1].trim()
      };
    }
  }
}
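
The balanced-braces check and the repair step listed under Key Features are not shown in this document. A minimal sketch of both follows; hasBalancedBraces and repairJson are hypothetical names, and the two repairs shown (stripping Markdown fences and trailing commas) are assumed examples of "common issues".

```javascript
// Check that { and } are balanced, ignoring braces inside string literals.
function hasBalancedBraces(text) {
  let depth = 0;
  let inString = false;
  let escaped = false;

  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth += 1;
    else if (ch === "}") {
      depth -= 1;
      if (depth < 0) return false; // closing brace with no opener
    }
  }
  return depth === 0 && !inString;
}

// Apply two simple repairs before parsing.
function repairJson(text) {
  return text
    .replace(/`{3}(?:json)?/gi, "") // drop Markdown code fence markers
    .replace(/,\s*([}\]])/g, "$1") // drop trailing commas before } or ]
    .trim();
}
```

The trailing-comma regex does not look at string boundaries, so a body containing ", }" inside quotes would also be altered; a stricter repair would tokenize first.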

Safety Settings

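The application passes explicit safety settings with each generation request. The configuration below blocks responses rated medium or higher in the dangerous-content category: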

safetySettings: [
  {
    category: "HARM_CATEGORY_DANGEROUS_CONTENT",
    threshold: "BLOCK_MEDIUM_AND_ABOVE",
  },
],

Implementation Guides

Setting Up the Environment

Clone the repository
Install dependencies:
```
npm install
cd server && npm install
```
Create .env files in the root and server directories with the required values (the repository's .env.example can serve as a starting point)
Initialize the database:
```
npm run init-db
```
Start the development servers:
```
npm run dev
```

Integrating with Google Generative AI

Obtain Gemini API key from Google AI Studio
Add key to environment variables
Configure safety settings and generation parameters
Implement error handling and fallback mechanisms
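
The error handling and fallback step could take the form of a retry wrapper like the sketch below. generateWithFallback is hypothetical: each generator is assumed to be an async function that wraps one model call (for example, a call to model.generateContent with a particular model or prompt variant), and the retry count and backoff delay are illustrative defaults.

```javascript
// Try each generator in order; retry each one with exponential backoff before moving on.
async function generateWithFallback(generators, prompt, { retries = 2, delayMs = 500 } = {}) {
  let lastError;

  for (const generate of generators) {
    for (let attempt = 0; attempt <= retries; attempt += 1) {
      try {
        return await generate(prompt);
      } catch (error) {
        lastError = error;
        if (attempt < retries) {
          // Wait delayMs, 2 * delayMs, 4 * delayMs, ... between attempts.
          await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
        }
      }
    }
  }
  throw lastError ?? new Error("No generators were provided");
}
```

Injecting the generators keeps the wrapper independent of the SDK, so the same logic can be exercised in tests with stub functions.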

Adding New Email Templates

Create template definition with variables
Implement prompt structure for generation
Define validation rules for generated content
Add UI components for template selection
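
A template definition with variables could be as simple as a string with placeholders plus a render function that reports unfilled ones. The {{name}} placeholder syntax and the renderTemplate helper below are assumptions for illustration; the project's template format is not documented here.

```javascript
// Hypothetical renderer: fills {{variable}} placeholders and reports any that are missing.
function renderTemplate(template, variables) {
  const missing = [];

  const text = template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) => {
    const hasValue =
      Object.hasOwn(variables, name) &&
      variables[name] !== undefined &&
      variables[name] !== "";
    if (!hasValue) {
      missing.push(name);
      return match; // leave the placeholder visible for review
    }
    return String(variables[name]);
  });

  return { text, missing };
}
```

A non-empty missing list can double as a validation rule: the draft is rejected or flagged in the UI until every variable is supplied.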

Security Practices

Anveshak implements several security best practices:

JWT Authentication: Secure token-based authentication
Password Hashing: Secure password storage with bcrypt
Input Validation: Comprehensive validation of user inputs
Rate Limiting: Protection against brute force and DoS attacks
CORS Configuration: Controlled cross-origin resource sharing
Environment Variables: Secure storage of sensitive information
Email Verification: Required email verification for new accounts
Content Sanitization: Input and output sanitization to prevent XSS
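
To illustrate the rate limiting item, the sketch below implements a fixed-window limiter kept in memory, with an Express-style middleware wrapper. It is not the project's implementation (which may well use an existing package); createRateLimiter, rateLimitMiddleware, and the option names are hypothetical.

```javascript
// Fixed-window rate limiter: at most `max` requests per key in each `windowMs` window.
function createRateLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> { count, windowStart }

  return function isAllowed(key, now = Date.now()) {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= max;
  };
}

// Express-style middleware: respond with 429 once a client IP exceeds the limit.
function rateLimitMiddleware(options) {
  const isAllowed = createRateLimiter(options);
  return (req, res, next) => {
    if (isAllowed(req.ip)) {
      return next();
    }
    res.status(429).json({ error: "Too many requests" });
  };
}
```

An in-memory map resets on restart and is not shared across server instances, so a shared store would be needed behind a load balancer.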

Future Roadmap

Upcoming features and improvements:

Interview Preparation: AI-powered interview question suggestions
Response Templates: Smart response templates for common email replies
Multi-language Support: Support for resumes and emails in multiple languages
Integration APIs: External system integration through APIs
Advanced Analytics: Enhanced email performance analytics
Mobile Application: Native mobile app for on-the-go management
Networking Features: Contact management and relationship tracking
