This project is a web-based tool that matches research papers to a candidate's resume using Sentence-BERT (SBERT) embeddings and cosine similarity. It extracts the candidate's skills, experience, and projects from the resume, then ranks research papers by how closely their content aligns with that profile. Comparing meaning rather than exact keywords makes the search for relevant papers faster and more precise than a manual search.
Finding relevant research papers based on a resume is a challenging task. This tool automates the process by:
✅ Extracting skills and projects from the resume
✅ Converting both the resume and research papers into vector embeddings
✅ Computing similarity scores using cosine similarity
✅ Returning the most relevant paper based on the highest similarity score
The matching process follows a 4-step pipeline:
The system extracts key details from the candidate's resume, including:
- ✅ Skills (e.g., Machine Learning, NLP)
- ✅ Projects (e.g., Fake News Detection using BERT)
✅ Example:
Skills = ["Machine Learning", "Natural Language Processing", "Deep Learning", "Python"]
Projects = ["Fake News Detection using BERT", "Text Summarization with LSTM"]
This information is concatenated into a single text input:
"Machine Learning Natural Language Processing Deep Learning Python Fake News Detection using BERT Text Summarization with LSTM"
Component | Tool |
---|---|
Frontend | React.js |
Backend | Flask |
Embedding Model | Sentence-BERT (all-MiniLM-L6-v2) |
Paper Retrieval | Semantic Scholar API |
Similarity Calculation | Cosine Similarity (Scikit-learn) |
Email Generation | Gemini API |
Paper Download | Unpaywall API |
✅ Fast and Efficient: Encodes resumes and papers with the compact all-MiniLM-L6-v2 SBERT model.
✅ Accurate Matching: Ranks papers by the cosine similarity between resume and paper embeddings.
✅ Automated Paper Retrieval: Uses Semantic Scholar to find relevant papers.
✅ Secure Data Handling: Ensures data privacy and integrity.
✅ Email Automation: Automatically generates internship request emails based on the matching paper.
- Resume Parsing and Skill Extraction
- Research Paper Retrieval
- Convert to Sentence Embeddings
- Compute Cosine Similarity
- Generate and Send Email
The system extracts skills and projects from the resume using pdfplumber, spaCy, and KeyBERT.
Example skills and projects:
Machine Learning, Natural Language Processing, Deep Learning, Python, Fake News Detection using BERT, Text Summarization with LSTM
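A rough sketch of the parsing step might look like the following. It reads the PDF text with pdfplumber (imported lazily, so the rest of the snippet runs without it) and then uses simple vocabulary matching as a dependency-free stand-in for the spaCy and KeyBERT extraction. `SKILL_VOCABULARY`, `extract_pdf_text`, and `match_skills` are illustrative names and are not taken from the project.

```python
import re


def extract_pdf_text(pdf_path):
    """Read the text of every page in a PDF with pdfplumber."""
    import pdfplumber  # third-party; only needed when a real PDF is parsed

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def match_skills(text, vocabulary):
    """Return the vocabulary terms found in text as whole phrases, ignoring case."""
    found = []
    for term in vocabulary:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Illustrative vocabulary only; the real system derives keywords with spaCy and KeyBERT.
SKILL_VOCABULARY = ["Machine Learning", "Natural Language Processing", "Deep Learning", "Python", "Java"]

sample = "Built deep learning models in Python for natural language processing."
print(match_skills(sample, SKILL_VOCABULARY))
```

Whole-phrase matching keeps "Java" from matching inside "JavaScript", which a plain substring test would get wrong.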
The system retrieves research papers through web scraping with beautifulsoup4 and spaCy.
Example papers:
📜 Paper 1:
Title: "A Deep Learning Approach to Fake News Detection"
Abstract: "We propose a model based on BERT for detecting fake news articles. Our approach achieves state-of-the-art performance in text classification tasks."
📜 Paper 2:
Title: "Efficient Image Classification with CNNs"
Abstract: "We present an optimized CNN model for image classification. The model reduces computational cost while maintaining accuracy."
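The tech stack table also lists the Semantic Scholar API for paper retrieval. The sketch below shows one way such a lookup could work, assuming the public Graph API paper-search endpoint and its `query`, `fields`, and `limit` parameters. The helper names are hypothetical, and combining title and abstract into one string is an assumption about how the paper text is built.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Semantic Scholar Graph API paper-search endpoint.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_url(query, limit=10):
    """Build a paper-search URL that requests only titles and abstracts."""
    params = {"query": query, "fields": "title,abstract", "limit": limit}
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_papers(payload):
    """Convert a search response into title/text records, skipping papers without an abstract."""
    papers = []
    for item in payload.get("data", []):
        if item.get("abstract"):
            text = f"{item['title']}. {item['abstract']}"
            papers.append({"title": item["title"], "text": text})
    return papers


def search_papers(query, limit=10):
    """Fetch and parse search results (performs a network request and may be rate limited)."""
    with urlopen(build_search_url(query, limit), timeout=10) as response:
        return parse_papers(json.load(response))
```

Calling `search_papers("fake news detection", limit=5)` would return a list of records whose `text` field can be embedded in the next step.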
The system converts text into high-dimensional vector embeddings using Sentence-BERT (all-MiniLM-L6-v2):
from sentence_transformers import SentenceTransformer
embed_model = SentenceTransformer('all-MiniLM-L6-v2')
resume_embedding = embed_model.encode(resume_text)
paper_1_embedding = embed_model.encode(paper_1_text)
paper_2_embedding = embed_model.encode(paper_2_text)
Resume Embedding → [0.12, -0.08, ..., 0.32]
Paper 1 Embedding → [0.11, -0.07, ..., 0.30]
Paper 2 Embedding → [0.02, 0.45, ..., -0.12]
Cosine similarity measures how similar two vectors are:
$$
\text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \, \|B\|}
$$
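As a small worked instance of the formula, take $A = (1, 2, 2)$ and $B = (2, 1, 2)$. These two vectors are made up for illustration and are not real embeddings.

$$
A \cdot B = 1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 = 8, \qquad \|A\| = \sqrt{1 + 4 + 4} = 3, \qquad \|B\| = \sqrt{4 + 1 + 4} = 3
$$

$$
\text{Cosine Similarity} = \frac{8}{3 \cdot 3} = \frac{8}{9} \approx 0.89
$$

A score near 1 means the two vectors point in almost the same direction, while a score near 0 means they are close to orthogonal.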
✅ Example calculation:
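from sklearn.metrics.pairwise import cosine_similarity
# cosine_similarity takes 2D inputs and returns a 2D array, so read the single [0][0] entry
similarity_1 = cosine_similarity([resume_embedding], [paper_1_embedding])[0][0]
similarity_2 = cosine_similarity([resume_embedding], [paper_2_embedding])[0][0]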
Pair | Similarity Score | Result |
---|---|---|
Resume & Paper 1 | 0.92 | ✅ High Similarity |
Resume & Paper 2 | 0.34 | ❌ Low Similarity |
The paper with the highest similarity score is selected as the most relevant match.
✅ Most Relevant Paper Found!
Title: "A Deep Learning Approach to Fake News Detection"
Abstract: "We propose a model based on BERT for detecting fake news articles. Our approach achieves state-of-the-art performance in text classification tasks."
Similarity Score: 0.92
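The selection step can be sketched without any third-party libraries. The snippet below computes cosine similarity by hand and picks the highest-scoring paper. The 3-dimensional vectors are invented for illustration (all-MiniLM-L6-v2 embeddings have 384 dimensions), and `cosine` and `best_match` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(resume_vec, papers):
    """Return (title, score) for the paper most similar to the resume vector."""
    scored = [(title, cosine(resume_vec, vec)) for title, vec in papers.items()]
    return max(scored, key=lambda pair: pair[1])


# Toy vectors, invented for illustration only.
resume_vec = [0.9, 0.1, 0.3]
papers = {
    "A Deep Learning Approach to Fake News Detection": [0.8, 0.2, 0.3],
    "Efficient Image Classification with CNNs": [0.1, 0.9, 0.1],
}

title, score = best_match(resume_vec, papers)
print(title, round(score, 2))
```

Note that `best_match` raises `ValueError` when `papers` is empty, so a caller would need to handle the case where retrieval returns nothing.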
Once a matching paper is found, the system generates an internship request email using the Gemini API.
✅ Formal & Professional
✅ Technical & Research-Oriented
✅ Enthusiastic & Passionate
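One way to prepare the request for the Gemini API is to build the prompt text first. The sketch below assumes the three items above are selectable tones. The prompt wording, the `TONES` keys, and `build_email_prompt` are all assumptions for illustration, and the actual API call is left out.

```python
# Tone labels follow the list above; the keys and prompt wording are assumptions.
TONES = {
    "formal": "formal and professional",
    "technical": "technical and research-oriented",
    "enthusiastic": "enthusiastic and passionate",
}


def build_email_prompt(paper_title, skills, tone="formal"):
    """Compose a prompt asking a language model to draft an internship request email."""
    if tone not in TONES:
        raise ValueError(f"unknown tone: {tone!r}")
    return (
        f"Write a {TONES[tone]} email requesting a research internship.\n"
        f"The email is addressed to the authors of the paper \"{paper_title}\".\n"
        f"Mention these skills of the candidate: {', '.join(skills)}.\n"
        "Keep it under 200 words."
    )


prompt = build_email_prompt(
    "A Deep Learning Approach to Fake News Detection",
    ["Machine Learning", "Python"],
    tone="technical",
)
print(prompt)
```

Keeping prompt construction in a pure function makes it easy to test without calling the API.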
We would like to extend our heartfelt gratitude to everyone who contributed to this project. Your hard work and dedication made this possible!
- Srujan Rana: 🏆 Project Lead, Backend Developer
- Rudra Prasad Jena: 💻 Frontend Developer & 🌐 API Integration
- Abhishek Kumar: 💻 Frontend Developer
🌟 Want to contribute?
We welcome contributions from the community! If you'd like to improve the project or report issues, feel free to fork the repo and submit a pull request.