📄 ResearchReach: A Cold-Email Helper

This project is a web-based tool that matches research papers to a candidate’s resume by utilizing Sentence-BERT (SBERT) and cosine similarity. The tool analyzes the candidate's skills, experience, and projects, then identifies the most relevant research papers based on the alignment between the resume and research paper content. By leveraging advanced natural language processing techniques, it enhances the process of finding research papers that best match a candidate’s qualifications, making the search more efficient and accurate.

🚀 Introduction

Finding relevant research papers based on a resume is a challenging task. This tool automates the process by:
✅ Extracting skills and projects from the resume
✅ Converting both the resume and research papers into vector embeddings
✅ Computing similarity scores using cosine similarity
✅ Returning the most relevant paper based on the highest similarity score

🔍 How It Works

The matching process follows a 4-step pipeline:

1️⃣ Resume Data Extraction

The system extracts key details from the candidate's resume, including:
- ✅ Skills (e.g., Machine Learning, NLP)
- ✅ Projects (e.g., Fake News Detection using BERT)

✅ Example:

Skills = ["Machine Learning", "Natural Language Processing", "Deep Learning", "Python"]  
Projects = ["Fake News Detection using BERT", "Text Summarization with LSTM"]

This information is concatenated into a single text input:

"Machine Learning Natural Language Processing Deep Learning Python Fake News Detection using BERT Text Summarization with LSTM"

🔍 Research Paper Matching System

🛠️ Tech Stack

Component	Tool
Frontend	React.js
Backend	Flask
Embedding Model	Sentence-BERT (all-MiniLM-L6-v2)
Paper Retrieval	Semantic Scholar API
Similarity Calculation	Cosine Similarity (Scikit-learn)
Email Generation	Gemini API
Paper Download	Unpaywall API

🏆 Features

✅ Fast and Efficient: Handles large datasets quickly using SBERT.
✅ Accurate Matching: High similarity scoring using cosine similarity.
✅ Automated Paper Retrieval: Uses Semantic Scholar to find relevant papers.
✅ Secure Data Handling: Ensures data privacy and integrity.
✅ Email Automation: Automatically generates internship request emails based on the matching paper.

🚀 Process Overview

Resume Parsing and Skill Extraction
Research Paper Retrieval
Convert to Sentence Embeddings
Compute Cosine Similarity
Generate and Send Email

📝 1️⃣ Resume Parsing and Skill Extraction

The system extracts skills and projects from the resume using pdfplumber, spaCy, and KeyBERT.

Example Skills:
Machine Learning, Natural Language Processing, Deep Learning, Python, Fake News Detection using BERT, Text Summarization with LSTM

📜 2️⃣ Research Paper Retrieval

The system retrieves research papers using Web Scraping with the help of beautifulsoup4 & Spacy

Example papers:

📜 Paper 1:
Title: "A Deep Learning Approach to Fake News Detection"
Abstract: "We propose a model based on BERT for detecting fake news articles. Our approach achieves state-of-the-art performance in text classification tasks."

📜 Paper 2:
Title: "Efficient Image Classification with CNNs"
Abstract: "We present an optimized CNN model for image classification. The model reduces computational cost while maintaining accuracy."

🔎 3️⃣ Convert to Sentence Embeddings

The system converts text into high-dimensional vector embeddings using Sentence-BERT (all-MiniLM-L6-v2):

from sentence_transformers import SentenceTransformer  

embed_model = SentenceTransformer('all-MiniLM-L6-v2')  
resume_embedding = embed_model.encode(resume_text)  
paper_1_embedding = embed_model.encode(paper_1_text)  
paper_2_embedding = embed_model.encode(paper_2_text)

✅ Example vector embeddings:

Resume Embedding → [0.12, -0.08, ..., 0.32]  
Paper 1 Embedding → [0.11, -0.07, ..., 0.30]  
Paper 2 Embedding → [0.02, 0.45, ..., -0.12]

📈 4️⃣ Compute Cosine Similarity

Cosine similarity measures how similar two vectors are:

[ \text{Cosine Similarity} = \frac{A \cdot B}{||A|| \cdot ||B||} ]

✅ Example calculation:

from sklearn.metrics.pairwise import cosine_similarity  

similarity_1 = cosine_similarity([resume_embedding], [paper_1_embedding])  
similarity_2 = cosine_similarity([resume_embedding], [paper_2_embedding])

Pair	Similarity Score	Result
Resume & Paper 1	0.92	✅ High Similarity
Resume & Paper 2	0.34	❌ Low Similarity

💡 5️⃣ Final Output

The paper with the highest similarity score is selected as the most relevant match.

✅ Most Relevant Paper Found!
Title: "A Deep Learning Approach to Fake News Detection"
Abstract: "We propose a model based on BERT for detecting fake news articles. Our approach achieves state-of-the-art performance in text classification tasks."
Similarity Score: 0.92

✉️ 6️⃣ Email Generation

Once a matching paper is found, the system generates an internship request email using the Gemini API.

Template Options:

✅ Formal & Professional
✅ Technical & Research-Oriented
✅ Enthusiastic & Passionate

🤝 Contributors

We would like to extend our heartfelt gratitude to everyone who contributed to this project. Your hard work and dedication made this possible!

Srujan Rana
🏆 Project Lead, Backend Developer

Rudra Prasad Jena
💻 Frontend Developer & 🌐 API Integration

Abhishek Kumar
💻 Frontend Developer

🌟 Want to contribute?
We welcome contributions from the community! If you'd like to improve the project or report issues, feel free to fork the repo and submit a pull request.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
app		app
front		front
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

📄 ResearchReach: A Cold-Email Helper

🚀 Introduction

🔍 How It Works

1️⃣ Resume Data Extraction

🔍 Research Paper Matching System

🛠️ Tech Stack

🏆 Features

🚀 Process Overview

📝 1️⃣ Resume Parsing and Skill Extraction

📜 2️⃣ Research Paper Retrieval

🔎 3️⃣ Convert to Sentence Embeddings

✅ Example vector embeddings:

📈 4️⃣ Compute Cosine Similarity

💡 5️⃣ Final Output

✉️ 6️⃣ Email Generation

Template Options:

🤝 Contributors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

Hackverse-2025/Thunderbolt

Folders and files

Latest commit

History

Repository files navigation

📄 ResearchReach: A Cold-Email Helper

🚀 Introduction

🔍 How It Works

1️⃣ Resume Data Extraction

🔍 Research Paper Matching System

🛠️ Tech Stack

🏆 Features

🚀 Process Overview

📝 1️⃣ Resume Parsing and Skill Extraction

📜 2️⃣ Research Paper Retrieval

🔎 3️⃣ Convert to Sentence Embeddings

✅ Example vector embeddings:

📈 4️⃣ Compute Cosine Similarity

💡 5️⃣ Final Output

✉️ 6️⃣ Email Generation

Template Options:

🤝 Contributors

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages