A platform designed to help job seekers generate personalized cold emails to potential employers based on their resume and skills.
- Resume parsing and analysis
- AI-powered company matching based on skills and experience
- Personalized cold email generation with company research
- Email sending and tracking
- Email history and metrics visualization
- Multiple domain support
- Introduction
- Architecture Overview
- Core Technologies
- Natural Language Processing and AI Algorithms
- System Features
- API Reference
- Error Handling and Resilience
- Implementation Guides
- Security Practices
- Future Roadmap
Anveshak is a comprehensive platform designed to help job seekers generate personalized cold emails to potential employers and academic faculty. The system combines AI-powered resume parsing, company/faculty research, and personalized email generation to create highly tailored outreach communications.
The platform serves two primary use cases:
- Job application emails: Matching candidates with companies based on skills and generating personalized job inquiry emails
- Academic collaboration emails: Connecting students with faculty members for research opportunities based on matching interests
Anveshak is built using a modern MERN stack architecture:
- Frontend: React-based SPA with context-based state management
- Backend: Node.js + Express RESTful API server
- Database: MongoDB document database
- AI Integration: Google Generative AI (Gemini) for natural language processing tasks
- Web Scraping: Cheerio and Axios for company/faculty data enrichment
The application follows a microservice-inspired architecture with clear separation of concerns:
Server
├── routes/ # API routes and controllers
├── services/ # Core business logic and external service integration
├── models/ # Data models and schema definitions
├── middleware/ # Request processing middleware
├── config/ # Configuration settings
└── scripts/ # Database initialization and utility scripts
- Node.js: Server-side JavaScript runtime
- Express: Web application framework
- MongoDB: NoSQL database
- Mongoose: MongoDB object modeling
- React: Frontend UI library
- Vite: Frontend build tool
- Google Generative AI (Gemini): LLM for natural language processing
- PDF.js: PDF text extraction
- Cheerio: HTML parsing for web scraping
- Axios: HTTP client for API requests
- Cloudinary: Media storage
- Nodemailer: Email sending service
Anveshak implements three distinct resume parsing algorithms, each with its own strengths and use cases:
A lightweight, rule-based parser optimized for extracting core information:
- Skills
- Experience
- Projects
Algorithm Overview:
- Extract text from PDF using pdf-parse
- Apply regex pattern matching to identify section headers
- Extract content between known section headers
- Clean and normalize extracted text
- Apply heuristics to identify skills, experiences, and projects
Key Features:
- Low computational overhead
- No external API dependencies
- Fast processing time
- Domain-specific pattern matching for technical resumes
Code Implementation:
async function parseResumeText(pdfBuffer) {
  // Extract text from PDF
  const rawText = await extractTextFromPdf(pdfBuffer);

  // Identify sections using headers
  const sections = identifySections(rawText);

  // Extract data from each section
  const skills = extractSkills(sections.skills);
  const experience = extractExperience(sections.experience);
  const projects = extractProjects(sections.projects);

  return {
    skills,
    experience,
    projects
  };
}
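The section-identification step can be sketched as follows. This is a minimal illustration, not the project's actual helper: the header patterns in `SECTION_HEADERS` and `OTHER_HEADERS` are assumptions, and a production parser would likely need a longer list.

```javascript
// Hypothetical header patterns (assumptions, not taken from the project).
const SECTION_HEADERS = {
  skills: /^(technical\s+)?skills$/i,
  experience: /^(work\s+|professional\s+)?experience$/i,
  projects: /^(personal\s+|academic\s+)?projects$/i,
};
// Headers that end the current section without starting a tracked one.
const OTHER_HEADERS = /^(education|certifications|awards|summary)$/i;

// Split raw resume text into sections keyed by the header that precedes them.
function identifySections(rawText) {
  const sections = { skills: '', experience: '', projects: '' };
  let current = null;

  for (const line of rawText.split(/\r?\n/)) {
    const trimmed = line.trim().replace(/:$/, '');
    const header = Object.keys(SECTION_HEADERS).find((key) =>
      SECTION_HEADERS[key].test(trimmed)
    );
    if (header) {
      current = header; // a tracked header starts a new section
    } else if (OTHER_HEADERS.test(trimmed)) {
      current = null; // an untracked header closes the current section
    } else if (current && trimmed) {
      sections[current] += (sections[current] ? '\n' : '') + trimmed;
    }
  }
  return sections;
}
```

Lines that appear before the first recognized header (for example, the candidate's name) are ignored in this sketch.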
The second parser is layout-aware and works from positioned text items rather than plain text:
Algorithm Overview:
- Extract text items from PDF with position data using pdf.js
- Group text items into lines based on Y-coordinates
- Group lines into sections using font sizes and formatting patterns
- Apply specialized parsing rules to each identified section
Key Features:
- Preserves formatting and structure
- Handles complex resume layouts
- Extracts rich metadata including dates, positions, achievements
- Better at identifying hierarchical information
Advantages:
- More accurate section detection
- Better at handling multi-column layouts
- Preserves chronological ordering
- Extracts detailed metadata
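Grouping text items into lines by Y-coordinate might look like the sketch below. It assumes each item has already been reduced to `{ str, x, y }`; with pdf.js `getTextContent()`, the coordinates would typically come from `item.transform[4]` and `item.transform[5]`. The function name and the `tolerance` default are illustrative assumptions.

```javascript
// Group positioned text items into lines of text.
// Items whose y values differ by at most `tolerance` share a line.
function groupIntoLines(items, tolerance = 2) {
  // PDF y grows upward, so sort top-to-bottom (descending y), then by x.
  const sorted = [...items].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines = [];

  for (const item of sorted) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(last.y - item.y) <= tolerance) {
      last.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  // Within each line, read left to right.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((item) => item.str)
      .join(' ')
  );
}
```

A fixed tolerance is the simplest choice; scaling it by font size would probably be more robust for resumes that mix heading and body text.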
Uses Google's Gemini models to extract and structure resume data through natural language understanding:
Algorithm Overview:
- Send PDF text to Gemini API with structured extraction prompts
- Process model responses to extract specific resume components
- Structure and validate the extracted information
Key Features:
- Superior understanding of context and semantics
- Handles non-standard formats and language variations
- Categorizes skills by type automatically
- Identifies achievements and quantitative results
Benefits:
- Most accurate for diverse resume formats
- Best at understanding implied skills and qualifications
- Provides richer semantic categorization
- Handles international and varied resume styles
Sample Prompt Structure:
const prompt = `
Extract all technical skills from the following resume text.
- Look for sections labeled "Skills", "Technical Skills", "Technologies", etc.
- For sections with bullet points or comma-separated lists, extract those directly
- For skills written in sentences, extract individual technical terms
- Return ONLY an array of strings with no other text or explanation
${text}
`;
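Even when a prompt asks for "ONLY an array", a model may wrap the array in extra text, so the response usually needs defensive parsing. The sketch below is one way to do that; `parseSkillsResponse` is a hypothetical helper name, and the case-insensitive de-duplication is an added assumption.

```javascript
// Extract a clean list of skill strings from a model response.
function parseSkillsResponse(responseText) {
  // Take the outermost [...] span and ignore any surrounding text.
  const start = responseText.indexOf('[');
  const end = responseText.lastIndexOf(']');
  if (start === -1 || end <= start) return [];

  let parsed;
  try {
    parsed = JSON.parse(responseText.slice(start, end + 1));
  } catch {
    return []; // not valid JSON; the caller can fall back to another parser
  }
  if (!Array.isArray(parsed)) return [];

  // Keep non-empty strings only, de-duplicated case-insensitively.
  const seen = new Set();
  const skills = [];
  for (const entry of parsed) {
    if (typeof entry !== 'string') continue;
    const skill = entry.trim();
    const key = skill.toLowerCase();
    if (skill && !seen.has(key)) {
      seen.add(key);
      skills.push(skill);
    }
  }
  return skills;
}
```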
Anveshak utilizes generative AI to produce personalized email content with several specialized algorithms:
Algorithm Overview:
- Process company information and candidate resume data
- Create a structured context object with key matching points
- Generate a prompt that emphasizes personalization and authentic connection
- Apply generative model with specific parameters for formal correspondence
- Parse and validate the generated response for structure and personalization
Key Features:
- Company-specific research integration
- Role-appropriate technical language
- Quantified achievements matching
- Multiple fallback methods for resilience
Implementation Highlights:
export const generateEmailContent = async ({
  userName,
  userEmail,
  company,
  role,
  skills,
  experience,
  projects,
  companyResearch
}) => {
  // Structure context for generation
  const context = {
    candidate: { name: userName, email: userEmail, skills, experience, projects },
    company: { name: company, role, research: companyResearch }
  };

  // Generate focused prompt
  const prompt = buildPersonalizedPrompt(context);

  // Generate content with specific parameters
  const result = await model.generateContent({
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: {
      temperature: 0.7,
      topP: 0.8,
      topK: 40,
      maxOutputTokens: 1024,
    },
    safetySettings: [
      {
        category: "HARM_CATEGORY_DANGEROUS_CONTENT",
        threshold: "BLOCK_MEDIUM_AND_ABOVE",
      },
    ],
  });

  // Parse and structure the response
  return parseEmailResponse(result);
};
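The `buildPersonalizedPrompt` helper is not shown in this document. A minimal sketch of what it could look like follows; the instruction wording, the eight-skill cap, and the assumption that `skills` is an array of strings and `research` is a string are all illustrative.

```javascript
// Hypothetical body for buildPersonalizedPrompt. Field names follow the
// context object built in generateEmailContent; everything else is assumed.
function buildPersonalizedPrompt({ candidate, company }) {
  // Cap the skill list so the prompt stays focused.
  const skills = (candidate.skills || []).slice(0, 8).join(', ');
  const research = company.research || 'No company research available.';

  return [
    `Write a concise cold email from ${candidate.name} to ${company.name} about the ${company.role} role.`,
    `Candidate skills: ${skills}`,
    `Company research: ${research}`,
    'Reference at least one specific detail from the company research.',
    'Respond with JSON only, in the form {"subject": "...", "body": "..."}.',
  ].join('\n');
}
```

Asking for a JSON object with `subject` and `body` keys keeps the output compatible with a parser that looks for those two fields.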
Specialized algorithm for generating personalized academic research collaboration emails:
Algorithm Overview:
- Extract faculty research interests, publications, and academic background
- Match candidate skills and experience to faculty research areas
- Generate academically appropriate, research-focused email content
- Apply multiple validation checks for scholarly tone and specificity
- Format according to academic correspondence conventions
Key Features:
- Research-specific terminology
- Publication reference integration
- Academic institutional knowledge
- Formal scholarly communication style
Anveshak implements a robust company matching system to connect candidates with relevant employers:
Algorithm Overview:
- Extract key skills and experience from candidate resume
- Execute multi-source company search:
- Database search for exact and fuzzy skill matches
- LLM-powered company suggestions based on skills and role
- Web scraping for company technology stack verification
- Score and rank companies based on matching criteria
- Enrich company profiles with additional research
Key Features:
- Multiple data sources for company matching
- Weighted skill relevance scoring
- Technology stack compatibility analysis
- Industry-specific targeting
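Weighted skill relevance scoring could be sketched as below. The company shape (`techStack` entries with `name` and `weight`) and both function names are assumptions made for illustration; the actual scoring criteria are not specified here.

```javascript
// Score a company as the weighted share of its tech stack the candidate covers.
function scoreCompany(candidateSkills, company) {
  const have = new Set(candidateSkills.map((skill) => skill.toLowerCase()));
  let matched = 0;
  let total = 0;
  for (const { name, weight } of company.techStack) {
    total += weight;
    if (have.has(name.toLowerCase())) matched += weight;
  }
  return total === 0 ? 0 : matched / total;
}

// Rank companies from best to worst match.
function rankCompanies(candidateSkills, companies) {
  return companies
    .map((company) => ({
      name: company.name,
      score: scoreCompany(candidateSkills, company),
    }))
    .sort((a, b) => b.score - a.score);
}
```

Normalizing by total weight keeps scores in the range 0 to 1, so companies with long and short tech stacks can be compared directly.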
Specialized algorithm for matching candidates with academic faculty:
Algorithm Overview:
- Extract academic interests and research experience from resume
- Search faculty database using domain-specific matching criteria
- Enrich results with web scraping from university websites
- Score and rank faculty based on research interest overlap
- Generate potential collaboration opportunities
Key Features:
- Research interest semantic matching
- Publication relevance analysis
- Institution type filtering
- Department and specialization targeting
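As a rough stand-in for research interest matching, the sketch below scores overlap with a Jaccard index over word tokens. This is plain lexical overlap, not the semantic matching described above, and the function names are hypothetical.

```javascript
// Turn a list of interest phrases into a set of lowercase word tokens.
function tokenize(phrases) {
  const tokens = new Set();
  for (const phrase of phrases) {
    for (const word of phrase.toLowerCase().split(/[^a-z0-9]+/)) {
      if (word.length > 2) tokens.add(word); // skip very short words
    }
  }
  return tokens;
}

// Jaccard index: shared tokens divided by the size of the union.
function interestOverlap(candidateInterests, facultyInterests) {
  const a = tokenize(candidateInterests);
  const b = tokenize(facultyInterests);
  if (a.size === 0 || b.size === 0) return 0;

  let shared = 0;
  for (const token of a) {
    if (b.has(token)) shared += 1;
  }
  return shared / (a.size + b.size - shared);
}
```

Faculty can then be sorted by this score in descending order; an embedding-based similarity would be a natural replacement where semantic matching matters.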
Anveshak provides comprehensive resume analysis features:
- Automated Skill Extraction: Identifies technical skills, technologies, and methodologies
- Experience Analysis: Extracts and categorizes work history, roles, and achievements
- Project Portfolio Extraction: Identifies personal and professional projects with technologies used
- Education Background Analysis: Extracts educational qualifications and relevant coursework
- Achievement Quantification: Identifies and highlights quantified achievements
- Technology Categorization: Classifies skills by type (languages, frameworks, tools)
Usage:
- Upload PDF resume through the web interface or API
- System automatically processes and analyzes the document
- Review extracted information with confidence scores
- Edit or enhance extracted data if needed
- Proceed to company matching or email generation
Anveshak identifies relevant companies based on candidate skills:
- Skill-Based Matching: Finds companies using technologies in candidate's skill set
- Role-Based Targeting: Focuses on companies with positions matching desired roles
- Technology Stack Analysis: Identifies companies using specific technologies
- Company Research: Gathers information about company products, culture, and achievements
- Email Contact Discovery: Finds appropriate contact emails for companies
- Relevance Scoring: Ranks companies by match quality for targeting
Usage:
- System analyzes candidate's skills and desired roles
- Matches are presented with relevance scores and research information
- User can select target companies for email generation
- System maintains company data for future matches
Anveshak generates personalized cold emails for job applications:
- Company-Specific Personalization: References company products, technologies, and culture
- Skill Matching: Highlights candidate skills relevant to the company
- Achievement Emphasis: Incorporates quantified achievements and relevant experience
- Dynamic Templates: Generates unique emails without repetitive patterns
- Multiple Tone Options: Professional, enthusiastic, or formal communication styles
- Customizable Content: Generated content can be edited before sending
- Merge Fields: Automatic insertion of personalized information
Usage:
- Select target companies from match results
- Generate personalized email drafts for each company
- Review and edit generated content if desired
- Send emails directly or copy to clipboard for external sending
- Track email status and responses
Anveshak generates personalized emails for academic research collaboration:
- Research Interest Alignment: Matches candidate interests with faculty research areas
- Publication References: Cites relevant faculty publications and research projects
- Academic Tone: Maintains appropriate scholarly communication style
- Institution-Appropriate Content: Adapts to university type and department culture
- Research Collaboration Proposals: Suggests specific collaboration opportunities
- Academic Background Integration: Highlights relevant coursework and research experience
Usage:
- Search for faculty members by research interest or institution
- Select target faculty members for outreach
- Generate personalized academic emails highlighting research alignment
- Send or export emails for external sending
- Track academic outreach campaigns
Anveshak provides comprehensive email campaign management:
- Email History: Tracks all generated and sent emails
- Status Tracking: Monitors email status (draft, sent, replied)
- Email Analytics: Provides open rates, response rates, and effectiveness metrics
- Campaign Grouping: Organizes emails by campaign, company type, or time period
- Follow-Up Suggestions: Recommends appropriate follow-up timing and content
- Template Management: Saves successful emails as templates for future use
- Response Handling: Assists with response management and follow-up
Usage:
- Access email history from dashboard
- Filter and sort by various criteria (date, status, company)
- View detailed metrics and performance analytics
- Set up follow-up reminders and templates
- Archive or categorize email threads
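A response-rate metric can be derived from the three statuses listed above (draft, sent, replied). The sketch below assumes each email record carries a `status` field with one of those values; `summarizeEmails` is a hypothetical name.

```javascript
// Count emails per status and compute the share of sent emails that got a reply.
function summarizeEmails(emails) {
  const counts = { draft: 0, sent: 0, replied: 0 };
  for (const { status } of emails) {
    // Ignore unknown statuses rather than creating new counters.
    if (Object.prototype.hasOwnProperty.call(counts, status)) {
      counts[status] += 1;
    }
  }
  // A replied email was also sent, so both count toward the denominator.
  const delivered = counts.sent + counts.replied;
  return {
    ...counts,
    responseRate: delivered === 0 ? 0 : counts.replied / delivered,
  };
}
```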
POST /api/auth/register
- Register a new user
- Parameters:
  - name (string): User's full name
  - email (string): User's email address
  - password (string): User's password
- Response: User object with authentication token
POST /api/auth/login
- Log in an existing user
- Parameters:
  - email (string): User's email address
  - password (string): User's password
- Response: User object with authentication token
POST /api/auth/verify
- Verify user email
- Parameters:
  - token (string): Email verification token
- Response: Verification success status
POST /api/auth/reset-password-request
- Request a password reset
- Parameters:
  - email (string): User's email address
- Response: Request status
POST /api/auth/reset-password
- Reset user password
- Parameters:
  - token (string): Password reset token
  - password (string): New password
- Response: Reset status
POST /api/resumes/upload
- Upload and parse a resume
- Authentication: Required
- Parameters:
  - file (file): PDF resume file
- Response: Parsed resume data
GET /api/resumes/:id
- Get resume data by ID
- Authentication: Required
- Response: Complete resume data
PUT /api/resumes/:id
- Update resume data
- Authentication: Required
- Parameters:
  - Resume data fields to update
- Response: Updated resume data
POST /api/emails/generate
- Generate personalized emails
- Authentication: Required
- Parameters:
  - resumeId (string): Resume ID
  - action (string): Action type (find-companies, generate-emails)
  - companies (array, optional): Selected companies
- Response: Generated email content or company matches
POST /api/academic/search-and-email
- Search for academic faculty and generate emails
- Authentication: Required
- Parameters:
  - domains (array): Research interests
- Response: Faculty list
POST /api/academic/generate-preview-emails
- Generate preview emails for selected faculty
- Authentication: Required
- Parameters:
  - resumeId (string): Resume ID
  - selectedFaculty (array): Selected faculty members
- Response: Generated preview emails
GET /api/emails/user/:userId
- Get all emails for a user
- Authentication: Required (Admin or Owner)
- Response: Email records for user
POST /api/emails/send
- Send emails from drafts
- Authentication: Required
- Parameters:
  - emailIds (array): Email IDs to send
- Response: Send status
GET /api/users/me
- Get current user profile
- Authentication: Required
- Response: User profile data
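A client for these endpoints might be sketched as below. Several details are assumptions rather than documented behavior: the base URL, JSON request bodies, and sending the token as a `Bearer` value in the `Authorization` header. The global `fetch` used here is available in Node.js 18 and later.

```javascript
const BASE_URL = 'http://localhost:5000'; // assumed development address

// Build the URL and fetch options for one API call.
function buildRequest(method, path, { token, body } = {}) {
  const headers = {};
  if (body !== undefined) headers['Content-Type'] = 'application/json';
  if (token) headers.Authorization = `Bearer ${token}`; // assumed auth scheme
  return {
    url: `${BASE_URL}${path}`,
    options: {
      method,
      headers,
      body: body === undefined ? undefined : JSON.stringify(body),
    },
  };
}

// Send the request and return the parsed JSON response.
async function callApi(method, path, opts) {
  const { url, options } = buildRequest(method, path, opts);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`${method} ${path} failed with status ${response.status}`);
  }
  return response.json();
}
```

For example, `callApi('POST', '/api/auth/login', { body: { email, password } })` would log in, and the token from that response could then be passed as `token` to `callApi('POST', '/api/emails/generate', { token, body: { resumeId, action: 'find-companies' } })`.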
Anveshak implements robust error handling for JSON parsing in AI responses:
Key Features:
- Enhanced JSON Validation: Comprehensive validation for JSON structure
- Balanced Braces Check: Ensures JSON has matching opening and closing braces
- Advanced JSON Repair: Fixes common issues in malformed JSON responses
- Multi-layer Fallback Strategy: Multiple fallback methods for parsing failures
- Pattern Matching Extraction: Uses regex to extract content when JSON parsing fails
Implementation:
// First attempt: standard parsing of the repaired JSON
// (assumes emailContent is still undefined at this point)
try {
  emailContent = JSON.parse(fixedJson);
} catch (innerParseError) {
  // Second attempt: parse only the outermost {...} span
  const jsonStartIndex = fixedJson.indexOf('{');
  const jsonEndIndex = fixedJson.lastIndexOf('}') + 1;
  if (jsonStartIndex !== -1 && jsonEndIndex > jsonStartIndex) {
    const extractedJson = fixedJson.substring(jsonStartIndex, jsonEndIndex);
    try {
      emailContent = JSON.parse(extractedJson);
    } catch (extractParseError) {
      // The extracted span is still invalid; fall through to regex extraction
    }
  }

  // Third attempt: regex extraction of the email components
  if (!emailContent) {
    const subjectMatch = text.match(/"subject"\s*:\s*"([^"]+)"/);
    const bodyMatch = text.match(/"body"\s*:\s*"([\s\S]+?)(?:"\s*}|\s*"\s*$)/);
    if (subjectMatch && bodyMatch) {
      emailContent = {
        subject: subjectMatch[1].trim(),
        body: bodyMatch[1].trim()
      };
    }
  }
}
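The balanced braces check listed among the key features could be implemented along these lines. This is a sketch under the assumption that braces inside JSON string values should be ignored; `hasBalancedBraces` is a hypothetical name.

```javascript
// Return true when every '{' outside a string has a matching '}'.
function hasBalancedBraces(text) {
  let depth = 0;
  let inString = false;
  let escaped = false;

  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false; // this character was escaped; skip it
      else if (ch === '\\') escaped = true; // next character is escaped
      else if (ch === '"') inString = false; // closing quote
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth += 1;
    } else if (ch === '}') {
      depth -= 1;
      if (depth < 0) return false; // closing brace with no opener
    }
  }
  return depth === 0 && !inString;
}
```

Running this check before `JSON.parse` gives a cheap signal that a response was truncated, which is a common failure when a model hits its output token limit.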
The application implements proper safety settings for AI-generated content:
safetySettings: [
  {
    category: "HARM_CATEGORY_DANGEROUS_CONTENT",
    threshold: "BLOCK_MEDIUM_AND_ABOVE",
  },
],
- Clone the repository
- Install dependencies:
npm install
cd server && npm install
- Create .env files in the root and server directories with the required values
- Initialize the database:
npm run init-db
- Start the development servers:
npm run dev
- Obtain Gemini API key from Google AI Studio
- Add key to environment variables
- Configure safety settings and generation parameters
- Implement error handling and fallback mechanisms
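One way to structure the fallback mechanisms is a small helper that tries each generation strategy in order and returns the first success. The sketch below is illustrative only; `withFallbacks` and the strategies passed to it are hypothetical, and `AggregateError` requires Node.js 15 or later.

```javascript
// Try each async strategy in order; return the first result that succeeds.
async function withFallbacks(attempts) {
  const errors = [];
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (error) {
      errors.push(error); // remember the failure and move to the next strategy
    }
  }
  // Every strategy failed; surface all collected errors together.
  throw new AggregateError(errors, 'All generation attempts failed');
}
```

A caller might pass a primary Gemini request first, then a retry with a simpler prompt, and finally a function that returns a static template, so that the user always receives some draft.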
- Create template definition with variables
- Implement prompt structure for generation
- Define validation rules for generated content
- Add UI components for template selection
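A template definition with variables could be rendered with a small merge-field helper like the one below. The `{{name}}` placeholder syntax and the `renderTemplate` function are assumptions for illustration; the project's actual template format is not documented here.

```javascript
// Replace {{field}} placeholders with values and report any that are missing.
function renderTemplate(template, values) {
  const missing = [];
  const text = template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) => {
    if (Object.prototype.hasOwnProperty.call(values, name)) {
      return String(values[name]);
    }
    missing.push(name);
    return match; // leave unknown placeholders visible for review
  });
  return { text, missing };
}
```

Returning the list of missing fields gives a simple validation rule: a draft with a non-empty `missing` array should not be sent.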
Anveshak implements several security best practices:
- JWT Authentication: Secure token-based authentication
- Password Hashing: Secure password storage with bcrypt
- Input Validation: Comprehensive validation of user inputs
- Rate Limiting: Protection against brute force and DoS attacks
- CORS Configuration: Controlled cross-origin resource sharing
- Environment Variables: Secure storage of sensitive information
- Email Verification: Required email verification for new accounts
- Content Sanitization: Input and output sanitization to prevent XSS
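To illustrate the rate-limiting item, here is a minimal fixed-window limiter written as Express-style middleware. It is a sketch, not the project's implementation: the in-memory `Map`, the option names, and keying on `req.ip` are assumptions, and a multi-instance deployment would need a shared store instead.

```javascript
// Create middleware that allows at most `max` requests per `windowMs` per client.
function createRateLimiter({ windowMs, max, now = Date.now }) {
  const hits = new Map(); // client key -> { count, windowStart }

  return function rateLimit(req, res, next) {
    const key = req.ip;
    const currentTime = now();
    const entry = hits.get(key);

    // Start a fresh window for new clients or when the old window has expired.
    if (!entry || currentTime - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: currentTime });
      return next();
    }

    entry.count += 1;
    if (entry.count > max) {
      res.statusCode = 429; // Too Many Requests
      return res.end('Too many requests');
    }
    return next();
  };
}
```

Applied to the login and password-reset routes with a low `max`, a limiter like this slows brute-force attempts; the injectable `now` option exists only to make the behavior easy to test.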
Upcoming features and improvements:
- Interview Preparation: AI-powered interview question suggestions
- Response Templates: Smart response templates for common email replies
- Multi-language Support: Support for resumes and emails in multiple languages
- Integration APIs: External system integration through APIs
- Advanced Analytics: Enhanced email performance analytics
- Mobile Application: Native mobile app for on-the-go management
- Networking Features: Contact management and relationship tracking