A project that extracts and summarizes PDF resumes using both code-based logic (regex, dictionaries) and LLM refinement (Groq LLM). It then compares the finalized resume data against a user-provided job description using lemma-based and semantic matching approaches, yielding detailed matching scores.
Project Deployment Link: Resume Parser App
Check out the demo video here: Demo Video
-
PDF Resume Upload:
Upload any PDF resume file. -
Code-Based Parsing:
- Regex for phone and email.
- List-based skill detection from known keywords.
-
LLM Refinement:
- A Groq LLM verifies partial parsed data, removing incorrect items and producing a final summary in subpoints (Education, Experience, Skills, etc.).
- Fallback text appears if the LLM or API key is unavailable.
-
Job Description Matching:
- Lemma-Based (Jaccard / lexical overlap).
- Semantic (using sentence-transformers to measure embedding similarity).
- Combined final score.
-
Detailed Outputs:
- Raw vs. Cleaned Resume Text
- Matched & Unmatched Tokens / Sentences
- Final Summaries / Bullet Points
-
User Interaction:
- Paste a job description.
- Upload a PDF resume.
-
Partial Parsing:
- Regex extracts phone/email.
- Keyword detection finds known skills (e.g., "Python", "Java").
-
LLM Finalization:
- The partial parse plus the raw text is fed into Groq LLM.
- The LLM verifies or removes incorrect fields, then produces subheading-based summaries.
-
Matching with JD:
- Lemma-Based: Jaccard overlap between lemmatized tokens from the JD and resume.
- Semantic: Overall text similarity plus line-by-line JD comparisons.
- Combined: Weighted average, default 50/50.
- Clone or Download the repository.
- Install the dependencies (pinned in
requirements.txt
):pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
- Open the URL provided by Streamlit.
- Interact with the UI to upload a PDF resume and paste a job description.
-
app.py:
- Presents the Streamlit UI.
- Handles user input (resume + JD).
- Shows partial parse, LLM finalization, and matching outputs.
-
requirements.txt:
- Lists pinned versions for stable deployment.
-
Code & LLM synergy:
- The code-based approach ensures partial data extraction without relying solely on the LLM.
- The LLM refines that data, producing a final bullet-point summary.
- Live App: Resume Parser App
- Demo Video: Watch on Google Drive
- If the Groq LLM is unavailable or the key is invalid, you'll see a fallback message.
- The code-based parse is minimal (phone/email regex, skill dictionary). Extend these methods for deeper extraction.
- Torch-based dependencies can occasionally cause environment conflicts. See
requirements.txt
for pinned versions.
- Fork this repository.
- Create a new branch for your features/fixes.
- Open a Pull Request with a clear explanation.
This project is open-source under the MIT License. Feel free to use and adapt it.
- Streamlit for interactive UI.
- PyPDF2 for PDF text extraction.
- Groq for LLM inference.
- NLTK & sentence-transformers for textual processing & semantic matching.