This project is designed to classify resumes into different job categories based on their content. Users can upload resumes in PDF, DOCX, or TXT format, and the application will predict the category using a pre-trained machine learning model.
Resume Screening/
├── .ipynb_checkpoints/ # Jupyter Notebook checkpoints
├── AiModel.ipynb # Jupyter Notebook for model training
├── Data/ # Directory containing datasets
│ ├── Resume.csv # Main dataset of resumes
│ ├── UpdatedResumeDataSet.csv # Updated dataset
│ └── gpt_dataset.csv # Dataset for GPT models
├── img/ # Directory for images (currently empty)
├── NetworkSecurityEng_Resume.pdf # Example resume in PDF format
├── app.py # Main application file
├── clf.pkl # Trained classifier model
├── encoder.pkl # Label encoder for categories
├── health_fitness_resume.pdf # Example resume in PDF format
└── requirement.txt # Required packages for the project
- Clone the repository:
git clone https://github.com/adarshpheonix2810/Resume-Screening.git cd Resume Screening
- Install the required packages:
pip install -r requirement.txt
- Run the application:
streamlit run app.py
- Open your web browser and navigate to
http://localhost:8501
. - Upload a resume in PDF, DOCX, or TXT format.
- The application will display the predicted job category for the uploaded resume.
cleanResume(txt)
: Cleans the input resume text by removing unwanted characters and formatting.extract_text_from_pdf(file)
: Extracts text from a PDF file.extract_text_from_docx(file)
: Extracts text from a DOCX file.extract_text_from_txt(file)
: Extracts text from a TXT file with encoding handling.handle_file_upload(uploaded_file)
: Handles the file upload and text extraction based on file type.pred(input_resume)
: Predicts the category of the resume based on the cleaned text.
This project aims to provide an automated solution for classifying resumes into appropriate job categories, enhancing the job application process for candidates and recruiters alike.
- Python
- Streamlit
- Scikit-learn
- Pandas
- NumPy
- PyPDF2
- Python-docx
To train the model:
- Prepare your dataset in the required format (CSV).
- Use the Jupyter Notebook
AiModel.ipynb
to train the model. Ensure that the dataset is loaded correctly and the model parameters are set as needed. - Save the trained model using
pickle
to generateclf.pkl
,tfidf.pkl
, andencoder.pkl
files.
To test the application:
- Ensure that the application is running by executing
streamlit run app.py
. - Upload various resumes in different formats and verify the predicted categories.
- Check for any errors in the console and ensure the application handles them gracefully.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them with clear messages.
- Push your branch and create a pull request for review.
This project provides a simple yet effective way to classify resumes, helping job seekers understand the appropriate job categories for their applications.