Made with ❤️ by Ayraf
🔍 This project provides an end-to-end pipeline for evaluating handwritten answer sheets against answer keys using OCR, NLP, and similarity scoring. It leverages IBM WatsonX, Granite 3.3 Vision models, and modern data processing tools for scalable, intelligent assessment.
- ✅ Upload handwritten answer sheets and answer keys (PDF)
- 🧠 Extract text using advanced OCR: WatsonX, Tesseract, or Gemini
- 📄 Convert PDFs to Parquet and JSON for structured processing
- 📊 Compute similarity scores between student and teacher answers
- 🌐 Interactive Streamlit web interface for processing & results
- 🔌 Modular codebase — easy to plug in custom models or tools
- ✍️ Handwriting Styles – Cursive, block letters, etc., cause OCR inconsistencies
- 🗂 Different Layouts – Adapting to varied answer sheet formats
- ❌ Scribbles & Missing Answers – Common in real-world sheets, affecting accuracy
- 📉 Diagram Recognition – Only simple diagrams like flowcharts or trees are supported
- 🖼️ Graphics-Heavy Content – Complex images reduce OCR and NLP accuracy
- 📈 Enhanced Diagram Support – Improve complex diagram understanding
- ☁️ Mass Data Storage Integration – Add KFP, S3, etc., for scalable storage and processing
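The "similarity scores between student and teacher answers" feature listed above can be illustrated with a small sketch. This is not the project's actual scoring code: it assumes a plain bag-of-words cosine similarity using only the standard library, and the names `tokenize` and `cosine_similarity` are hypothetical. The real pipeline may well use embeddings or a different metric.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(student_answer, teacher_answer):
    """Return the cosine similarity (0.0 to 1.0) of two bag-of-words vectors.

    Illustrative stand-in only; not the scoring method used by this project.
    """
    student = Counter(tokenize(student_answer))
    teacher = Counter(tokenize(teacher_answer))

    # Dot product over the words that appear in both answers.
    dot = sum(student[word] * teacher[word] for word in student.keys() & teacher.keys())

    # Product of the Euclidean norms of the two word-count vectors.
    norm = math.sqrt(sum(c * c for c in student.values())) * math.sqrt(
        sum(c * c for c in teacher.values())
    )

    # An empty answer has a zero norm; treat it as no similarity.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    key = "Photosynthesis converts light energy into chemical energy."
    answer = "Plants use photosynthesis to turn light into chemical energy."
    print(f"similarity: {cosine_similarity(answer, key):.2f}")
```

A word-count metric like this rewards shared vocabulary rather than shared meaning, so it is best read as a baseline to compare against whatever similarity method the pipeline actually uses.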
- finalflow/flow/ – Main pipeline, Streamlit app, and processing scripts
- Dataprep/ – Data preparation, embedding, and visualization utilities
- pdf2parquet/ – PDF to Parquet conversion tools
- src/ – Additional scripts and utilities
- Clone the repository:
    git clone <repo-url>
    cd pythonProject1
- Install dependencies:
    pip install -r requirements.txt
  For the pdf2parquet submodule:
    pip install -r pdf2parquet/requirements.txt
- Set up environment variables:
  - Create a .env file in the project root.
  - Add your API keys and other configuration variables.
  - An example .env file is provided for reference.
- Build the Docker image:
    docker build -t my-streamlit-app .
- Run the Docker container:
    docker run -p 8501:8501 my-streamlit-app
- Launch the Streamlit app:
    streamlit run finalflow/flow/streamlit_app.py
- Follow the web UI to upload answer sheets and answer keys, process files, and view results.
- Environment variables (API keys, etc.) can be set in a .env file.
- A sample .env file is provided for reference:
    IBM_API_KEY=""
    IBM_SERVICE_URL=""
    IBM_PROJECT_ID=""
- Obtain these values from watsonx.ai when you create an API key for the foundation models. They can be changed to match the model card used in the code.
- Paths for input/output folders are configurable in the code.
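As a rough illustration of the configuration described above, the sketch below reads the three variables named in the sample .env file (IBM_API_KEY, IBM_SERVICE_URL, IBM_PROJECT_ID) and fails early if any is empty. The helpers `parse_env` and `load_config` are hypothetical and use only the standard library; the project's own code may load these values differently, for example through a dotenv package.

```python
import os

# Variable names taken from the sample .env file in this README.
REQUIRED_KEYS = ("IBM_API_KEY", "IBM_SERVICE_URL", "IBM_PROJECT_ID")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(env_path=".env"):
    """Combine .env values with the process environment and check required keys.

    Values already set in the process environment take precedence over the file.
    """
    file_values = {}
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            file_values = parse_env(handle.read())

    config = {key: os.environ.get(key) or file_values.get(key, "") for key in REQUIRED_KEYS}

    missing = [key for key, value in config.items() if not value]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return config
```

Calling `load_config()` before starting the pipeline surfaces a missing or empty key as a clear error instead of a failed API request later on.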
- Python 3.8+
- IBM WatsonX API credentials (for OCR)
- Optional: Google Gemini API key (for Gemini OCR)
This project is licensed under the Apache License 2.0.
For more details, see the code and comments in each module.