JI-DeepSleep/DocuSnap

The APIs and Controller -> Backend Server (Flask) section should now be finalized.

The backend part of the swimlane diagram in Model and Engine -> Data and Control Flow Diagram is also finalized.

Getting Started

This section outlines how to build and run the project, along with the direct third-party tools, libraries, SDKs, and APIs used.

For the frontend, check out the following repo:

https://github.com/JI-DeepSleep/DocuSnap-Frontend

For the backend, check out the following repo:

https://github.com/JI-DeepSleep/DocuSnap-Backend

Front-End (Android)

Built with Android Studio targeting Android 13 (API Level 33).

Dependencies

Back-End (Flask)

Note: For a more detailed "Get Started" guide, check out the backend repo.

Built with Python, using the following core dependencies:

Dependencies

  • Web Framework:
    • Flask – Lightweight WSGI web framework.
    • Gunicorn – Production-grade WSGI HTTP server.
  • Database:
    • SQLite – Embedded relational database.
  • OCR & AI Tools:
    • CnOCR – Chinese and English OCR library.
    • Zhipu AI API – Integration for generative AI tasks.

Notes

  • Android Setup: Ensure Android SDK 33 is configured in Android Studio.
  • Back-End Setup: Run pip install -r requirements.txt to install the Python dependencies.
  • The frontend stack has not been finalized. The backend stack won't be far from this version, but we're considering adding support for edge processing (moving everything except the LLM to the phone) for better security and privacy.

Model and Engine

Engine Components

  1. User Frontend: Handles UI interactions on the user's device.
  2. Camera/Gallery: Accesses device camera and photo storage.
  3. Geo/Color Processor: Performs image correction and enhancement.
  4. Document/Form Handler: Manages processing workflows and local data.
  5. Frontend DB: Stores processed documents/forms on device.
  6. Backend Server: Routes requests and manages tasks.
  7. Backend Worker: Executes asynchronous jobs.
  8. Cache Server: Temporary storage for processing results.
  9. OCR Server: Handles text extraction from images.
  10. Zhipu LLM (External Service): Provides AI enrichment via API.

Component Integration

  • Device components (1-5) use Android OS capabilities
  • Backend services (6-9) run on our infrastructure
  • Zhipu LLM (10) is an external dependency

Data and Control Flow Diagram

We present the entity relationship in our app mainly through a swimlane diagram because we find it to be the most informative. Two block diagrams that best fit the assignment requirement but are less informative are also shown below.

Swimlane Diagram

Below is an example flow of data and control for parsing a document A and a form B, then using the current document database to fill form B (fill task C).

%%{init: {'theme': 'default', 'themeVariables': { 'primaryColor': '#f0f0f0'}}}%%
sequenceDiagram
    participant User as User Frontend<br>(User's Phone)
    participant CameraGallery as Camera/Gallery<br>(User's Phone)
    participant GeoColor as Geo/Color<br>(User's Phone)
    participant Handler as Document/Form Handler<br>(User's Phone)
    participant FEDB as Frontend DB<br>(User's Phone)
    participant Backend as Backend Server
    participant Worker as Backend Worker
    participant Cache as Cache Server
    participant OCR as OCR Server
    participant LLM as Zhipu LLM

    %% Document A Processing
    rect rgba(200,230,255,0.5)
        note over User: Document A (Camera)
        User->>CameraGallery: captureImage("camera")
        CameraGallery->>User: rawImage
        User->>GeoColor: correctGeometry(rawImage)
        GeoColor->>User: correctedImage
        User->>GeoColor: enhanceColors(correctedImage)
        GeoColor->>User: enhancedImage
        User->>Handler: processDocument(enhancedImage)
        Handler->>Backend: /api/process<br>(type=doc, SHA256_A, content=encrypted_payload)
        
        Backend->>Handler: 202 Accepted (processing)
        Backend->>Worker: Start processing thread
        
        par Polling and Processing
            loop Polling
                Handler->>Backend: /api/process<br>(type=doc, SHA256_A, has_content=false)
                Backend->>Cache: /api/cache/query<br>(client_id, SHA256_A, "doc")
                Cache->>Backend: 404 Not Found
                Backend->>Handler: 202 Accepted (processing)
            end
            
            Worker->>OCR: /api/ocr/extract
            OCR->>Worker: text
            Worker->>LLM: /api/llm/enrich
            LLM->>Worker: formatted_json
            Worker->>Cache: /api/cache/store<br>(client_id, SHA256_A, "doc", data)
            Cache->>Worker: 201 Created
        end
        
        Handler->>Backend: /api/process<br>(type=doc, SHA256_A, has_content=false)
        Backend->>Cache: /api/cache/query<br>(client_id, SHA256_A, "doc")
        Cache->>Backend: 200 OK (data)
        Backend->>Handler: 200 OK (result)
        Handler->>FEDB: saveDocument(sha256_A, metadata)
        Handler->>Backend: /api/clear<br>(client_id, SHA256_A)
        Backend->>Cache: /api/cache/clear<br>(client_id, SHA256_A, "doc")
        Cache->>Backend: 200 OK (cleared:1)
        Backend->>Handler: 200 OK (cleared:1)
        Handler->>User: processComplete
    end


    %% Form B Processing
    rect rgba(230,255,230,0.5)
        note over User: Form B (Gallery)
        User->>CameraGallery: captureImage("gallery")
        CameraGallery->>User: rawImage
        User->>GeoColor: correctGeometry(rawImage)
        GeoColor->>User: correctedImage
        User->>GeoColor: enhanceColors(correctedImage)
        GeoColor->>User: enhancedImage
        User->>Handler: processForm(enhancedImage, "formB")
        Handler->>Backend: /api/process<br>(type=form, SHA256_B, content=encrypted_payload)
        
        Backend->>Handler: 202 Accepted (processing)
        Backend->>Worker: Start processing thread
        
        par Polling and Processing
            loop Polling
                Handler->>Backend: /api/process<br>(type=form, SHA256_B, has_content=false)
                Backend->>Cache: /api/cache/query<br>(client_id, SHA256_B, "form")
                Cache->>Backend: 404 Not Found
                Backend->>Handler: 202 Accepted (processing)
            end
            
            Worker->>OCR: /api/ocr/extract
            OCR->>Worker: text
            Worker->>LLM: /api/llm/enrich
            LLM->>Worker: formatted_json
            Worker->>Cache: /api/cache/store<br>(client_id, SHA256_B, "form", data)
            Cache->>Worker: 201 Created
        end
        
        Handler->>Backend: /api/process<br>(type=form, SHA256_B, has_content=false)
        Backend->>Cache: /api/cache/query<br>(client_id, SHA256_B, "form")
        Cache->>Backend: 200 OK (data)
        Backend->>Handler: 200 OK (result)
        Handler->>FEDB: saveFormData("formB", data)
        Handler->>Backend: /api/clear<br>(client_id, SHA256_B)
        Backend->>Cache: /api/cache/clear<br>(client_id, SHA256_B, "form")
        Cache->>Backend: 200 OK (cleared:1)
        Backend->>Handler: 200 OK (cleared:1)
        Handler->>User: processComplete
    end

    %% Fill Task C
    rect rgba(255,230,200,0.5)
        note over User: Fill Task C
        User->>Handler:fillForm("formB")
        Handler->>Backend: /api/process<br>(type=fill, content=encrypted_payload)
        
        Backend->>Handler: 202 Accepted (processing)
        Backend->>Worker: Start processing thread
        
        par Polling and Processing
            loop Polling
                Handler->>Backend: /api/process<br>(type=fill, has_content=false)
                Backend->>Cache: /api/cache/query<br>(client_id, "composite_sha", "fill")
                Cache->>Backend: 404 Not Found
                Backend->>Handler: 202 Accepted (processing)
            end
            
            Worker->>LLM: /api/llm/enrich
            LLM->>Worker: filled_form
            Worker->>Cache: /api/cache/store<br>(client_id, "composite_sha", "fill", data)
            Cache->>Worker: 201 Created
        end
        
        Handler->>Backend: /api/process<br>(type=fill, has_content=false)
        Backend->>Cache: /api/cache/query<br>(client_id, "composite_sha", "fill")
        Cache->>Backend: 200 OK (data)
        Backend->>Handler: 200 OK (result)
        Handler->>FEDB: updateDocumentData(sha256_A, updates)
        Handler->>Backend: /api/clear<br>(client_id, "composite_sha")
        Backend->>Cache: /api/cache/clear<br>(client_id, "composite_sha", "fill")
        Cache->>Backend: 200 OK (cleared:1)
        Backend->>Handler: 200 OK (cleared:1)
        Handler->>User: processComplete
    end

Block Diagrams

image-20250628173951645

image-20250628173935873

Component Implementation

  1. User Frontend
    • Functionality: UI rendering and interaction
    • Implementation: Android Studio (API 33); built from scratch
  2. Camera/Gallery
    • Functionality: Image capture/selection
    • Implementation: Android Studio (API 33) and Gallery APIs
  3. Geo/Color Processor
    • Functionality: Image correction/enhancement
    • Implementation: Android Studio (API 33); built from scratch
  4. Document/Form Handler
    • Functionality: Workflow coordination
    • Implementation: Android Studio (API 33); built from scratch
  5. Frontend DB
    • Functionality: Local data persistence
    • Implementation: SQLite via Android Room
  6. Backend Server
    • Functionality: API routing
    • Implementation: Flask + Gunicorn
  7. Backend Worker
    • Functionality: Async processing
    • Implementation: Python threading
  8. Cache Server
    • Functionality: Temporary data storage
    • Implementation: Flask + Gunicorn + SQLite
  9. OCR Server
    • Functionality: Text extraction
    • Implementation: Flask + Gunicorn + CnOCR library
  10. Zhipu LLM (External Service)
    • Functionality: Data enrichment
    • Implementation: External API integration
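
To make the Backend Worker entry more concrete, here is a minimal sketch of running the OCR and LLM steps on a Python thread. All names (`run_pipeline`, `start_worker`, the `ocr` and `llm` callables) are hypothetical, and a plain dict guarded by a lock stands in for the Cache Server; the real backend may be organized differently.

```python
import threading
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # (client_id, SHA256, type)

def run_pipeline(image_b64: str,
                 ocr: Callable[[str], str],
                 llm: Callable[[str], dict]) -> dict:
    """Run OCR first, then pass the extracted text to the LLM step."""
    text = ocr(image_b64)
    return llm(text)

def start_worker(key: CacheKey, image_b64: str,
                 ocr: Callable[[str], str],
                 llm: Callable[[str], dict],
                 cache: Dict[CacheKey, dict],
                 lock: threading.Lock) -> threading.Thread:
    """Start a background thread that stores its result under `key`."""
    def job() -> None:
        result = run_pipeline(image_b64, ocr, llm)
        with lock:  # the dict here is only a stand-in for the Cache Server
            cache[key] = result

    thread = threading.Thread(target=job, daemon=True)
    thread.start()
    return thread
```

The request handler can return 202 immediately after `start_worker` and let later polling requests look up the same key.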

APIs and Controller

Frontend Modules (Function Calls)

Internal frontend APIs via function calls.

Camera/Gallery Module

function captureImage(source: "camera" | "gallery"): Image
  • Captures/selects image from device camera or gallery
  • Returns raw image object

Geometric Correction

function correctGeometry(image: Image): Image
  • Applies perspective correction and deskewing
  • Returns geometrically corrected image

Color Enhancement

function enhanceColors(image: Image): Image
  • Optimizes contrast, brightness and color balance
  • Returns color-enhanced image

Document Handler

function processDocument(enhancedImage: Image): { encryptedDoc: string, sha256: string }
  • Processes generic documents
  • Returns RSA-encrypted document and SHA256 hash

Form Handler

function processForm(enhancedImage: Image, formType: string): { encryptedDoc: string, sha256: string }
function fillForm(formId: string): JSON

processForm:

  • Processes structured forms using DB templates
  • Returns encrypted document and SHA256 hash

fillForm:

  • Fills the given form

Frontend Database

// Document storage
function saveDocument(sha256: string, metadata: JSON): boolean
function getDocument(sha256: string): Document
function updateDocumentData(sha256: string, updates: JSON): boolean

// Form data storage
function saveFormData(formId: string, data: JSON): boolean
function getFormData(formId: string): JSON

Backend Server (Flask)

Main entry point for processing requests and status checks.

Unified Processing Endpoint: <backend server URL prefix>/process

Handles all document processing types (doc/form/fill) through a single interface.
Request Body (JSON):

| Key | Type | Required | Description |
|---|---|---|---|
| client_id | String (UUID) | Yes | Client identifier |
| type | String | Yes | Processing type: "doc", "form", or "fill" |
| SHA256 | String | Yes | SHA256 hash computed as per rules below |
| has_content | Boolean | Yes | Indicates whether content payload is included |
| content | String(base64(AES(actual_json_string))) | No | Required when has_content=true; base64(AES(actual_json_string)) |
| aes_key | String(RSA(real_aes_key)) | No | Required when has_content=true; RSA(real_aes_key) |

SHA256 Computation:

SHA256( content_string )
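
A minimal Python sketch of this rule. It assumes that `content_string` is the exact string sent in the `content` field, that it is hashed as UTF-8 bytes, and that the digest is sent as lowercase hex; none of these details are stated above, and `compute_sha256` is a hypothetical helper name.

```python
import hashlib

def compute_sha256(content_string: str) -> str:
    """Return the lowercase hex SHA256 digest of the content string.

    UTF-8 encoding is an assumption; the spec above does not name one.
    """
    return hashlib.sha256(content_string.encode("utf-8")).hexdigest()
```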

Content Payload Structure (After 1. base64 decoding and then 2. AES decryption):

{
  "to_process": ["base64_img1", "base64_img2"],  // For doc/form
  "to_process": form_obj,              // For fill
  "file_lib": {
    "docs": [doc_obj_1, doc_obj_2, ...],
    "forms": [form_obj_1, form_obj_2, ...]
  }
}

Validation:

  1. has_content=true requires content field (else 400)
  2. Computed SHA256 must match provided SHA256 (else 400)
  3. Backend decrypts aes_key using private RSA key to get the real aes key.
  4. Backend decrypts content using base64 decoding and then real aes key decryption.
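
The first two checks can be sketched in Python as follows. This is an illustration under assumptions, not the backend's actual code: `validate_request` is a hypothetical name, the hash is taken over the UTF-8 bytes of the `content` field, and the decryption steps (3 and 4) are left out because they depend on the crypto library the backend uses.

```python
import hashlib
from typing import Optional

VALID_TYPES = ("doc", "form", "fill")

def validate_request(body: dict) -> Optional[str]:
    """Return an error message (to be sent with HTTP 400), or None if valid."""
    for key in ("client_id", "type", "SHA256", "has_content"):
        if key not in body:
            return f"missing field: {key}"
    if body["type"] not in VALID_TYPES:
        return "type must be doc, form, or fill"
    if body["has_content"]:
        content = body.get("content")
        if not content or not body.get("aes_key"):
            return "content and aes_key are required when has_content=true"
        # Assumption: the hash covers the content field exactly as sent.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest != str(body["SHA256"]).lower():
            return "SHA256 mismatch"
    return None
```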

Response:

{
  "status": "processing|completed|error",
  "error_detail": "Description",  // Only for error status
  "result": "base64(AES(actual json string))"    // Only for completed status
}

Result Structures (after decryption):

// Doc type
{
  "title": "a few words",
  "tags": ["array", "of", "words"],
  "description": "a few sentences",
  "kv": {
    "key1": "value1",
    "key2": "value2"  // Extracted key-value pairs
  },
  "related": [    // array of related docs
    {"type": "xxx", "resource_id": "xxx"}
  ]
}

// Form type
{
  "title": "a few words",
  "tags": ["array", "of", "words"],
  "description": "a few sentences",
  "kv": {
    "key1": "value1",
    "key2": "value2"  // Extracted key-value pairs
  },
  "fields": ["field1", "field2"],
  "related": [		// array of related docs
    {"type": "xxx", "resource_id": "xxx"}  
  ]
}

// Fill type
{ // Only include fields that have a match in file_lib; not all fields need to appear here
  "field1": {
    "value": "value1",
    "source": {"type": "xxx", "resource_id": "xxx"}  // type is either doc or form, and resource_id is uuid
  },
  "field2": {
    "value": "value2",
    "source": {"type": "xxx", "resource_id": "xxx"}
  }
}
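
Because a fill result may omit fields with no match, a consumer has to handle missing keys. A small Python sketch of one way to do that, assuming the form's field names are available as a list (as in the `fields` array of the form result); `apply_fill` is a hypothetical helper, not part of the described API.

```python
from typing import Dict, List, Optional

def apply_fill(fields: List[str], fill_result: Dict[str, dict]) -> Dict[str, Optional[str]]:
    """Map every form field to its filled value, or None when unmatched."""
    return {name: fill_result.get(name, {}).get("value") for name in fields}
```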

Status Codes:

| Code | Description |
|---|---|
| 200 | Result available (status=completed) |
| 202 | Processing in progress (status=processing) |
| 400 | Invalid input/SHA256 mismatch/SHA256 not recognized |
| 500 | Internal server error |
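
These status codes suggest a simple client-side polling loop: resend the request with has_content=false until the server answers 200. The sketch below is in Python for illustration only (the real client is the Android app). `poll_for_result` and the `send` callable are hypothetical, and the polling interval and attempt limit are arbitrary assumptions.

```python
import time
from typing import Callable, Tuple

def poll_for_result(send: Callable[[dict], Tuple[int, dict]],
                    client_id: str, task_type: str, sha256: str,
                    interval_s: float = 1.0, max_attempts: int = 30) -> dict:
    """Poll the process endpoint until a result is available.

    `send` posts a JSON body and returns (status_code, response_json).
    """
    request = {"client_id": client_id, "type": task_type,
               "SHA256": sha256, "has_content": False}
    for _ in range(max_attempts):
        status_code, body = send(request)
        if status_code == 200:   # status=completed
            return body
        if status_code != 202:   # 400 or 500: stop polling
            raise RuntimeError(f"request failed with {status_code}: {body.get('error_detail')}")
        time.sleep(interval_s)   # 202: still processing
    raise TimeoutError("result not ready after polling")
```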

Example Request:

{
  "client_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "doc",
  "SHA256": "9f86d081...b4b9a5",
  "has_content": true,
  "aes_key": "rsa encrypted",
  "content": "base64(AES(actual json string))"
}

Example Response:

{
  "status": "completed",
  "result": {    //  Decrypted
    "title": "Lease Agreement",
    "tags": ["legal", "contract"],
    "description": "Standard residential lease agreement for 12 months",
    "kv": {
      "landlord": "Jane Smith",
      "tenant": "John Doe",
      "term": "12 months"
    },
    "related": [
      {"type": "form", "resource_id": "that form's uuid"}
    ]
  }
}

Endpoint: <backend server URL prefix>/clear

Clears processing results from the system.
Request Body (JSON):

| Key | Type | Required | Description |
|---|---|---|---|
| client_id | String (UUID) | Yes | Client identifier |
| type | String | No | Processing type: "doc", "form", or "fill" |
| SHA256 | String | No | Specific document hash to clear |

Response:

{
  "status": "ok"
}

Status Codes:

| Code | Description |
|---|---|
| 200 | Clearance successful |
| 400 | Missing client_id |
| 500 | Internal clearance error |

Cache Server (Flask+SQLite)

Stores and retrieves encrypted processing results using composite keys (client_id, SHA256, type).

The client (app) should not directly call this. This should be called by the backend server.

Endpoint: <backend server URL prefix>/cache/query

Retrieves cached processing results.
Query Parameters:

| Key | Type | Required | Description |
|---|---|---|---|
| client_id | String (UUID) | Yes | Client identifier |
| SHA256 | String | Yes | Document hash |
| type | String | Yes | doc, form, or fill |

Response:

// Success (200)
{"data": "ENCRYPTED_RESULT_STRING"}
// Not found (404)
{"error": "Cache entry missing"}

Endpoint: <backend server URL prefix>/cache/store

Stores processing results in cache.
Request Body (JSON):

| Key | Type | Required | Description |
|---|---|---|---|
| client_id | String (UUID) | Yes | Client identifier |
| type | String | Yes | Processing type: "doc", "form", or "fill" |
| SHA256 | String | Yes | Document hash |
| data | String | Yes | Encrypted result data |

Response: 201 Created (Empty body)


Endpoint: <backend server URL prefix>/cache/clear

Clears cached entries.
Request Body (JSON):

| Key | Type | Required | Description |
|---|---|---|---|
| client_id | String (UUID) | Yes | Client identifier |
| type | String | No | Processing type: "doc", "form", or "fill" |
| SHA256 | String | No | Specific document hash to clear |

Response:

{
  "status": "ok"
}
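
The three cache operations map naturally onto a single SQLite table with a composite primary key. The following is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and function names are assumptions for illustration, and the actual Cache Server schema may differ.

```python
import sqlite3
from typing import Optional

def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open the cache DB and create the table keyed by (client_id, sha256, type)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        " client_id TEXT NOT NULL, sha256 TEXT NOT NULL, type TEXT NOT NULL,"
        " data TEXT NOT NULL, PRIMARY KEY (client_id, sha256, type))"
    )
    return conn

def store(conn: sqlite3.Connection, client_id: str, sha256: str,
          type_: str, data: str) -> None:
    """Insert or overwrite one encrypted result (cache/store)."""
    with conn:  # commits on success
        conn.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
                     (client_id, sha256, type_, data))

def query(conn: sqlite3.Connection, client_id: str, sha256: str,
          type_: str) -> Optional[str]:
    """Return the cached data, or None for a miss (cache/query -> 404)."""
    row = conn.execute(
        "SELECT data FROM cache WHERE client_id = ? AND sha256 = ? AND type = ?",
        (client_id, sha256, type_)).fetchone()
    return row[0] if row else None

def clear(conn: sqlite3.Connection, client_id: str,
          sha256: Optional[str] = None, type_: Optional[str] = None) -> int:
    """Delete matching entries (cache/clear); optional filters narrow the match."""
    sql, params = "DELETE FROM cache WHERE client_id = ?", [client_id]
    if sha256 is not None:
        sql += " AND sha256 = ?"
        params.append(sha256)
    if type_ is not None:
        sql += " AND type = ?"
        params.append(type_)
    with conn:
        cursor = conn.execute(sql, params)
    return cursor.rowcount  # number of rows cleared
```

Passing only `client_id` to `clear` removes every entry for that client, which matches the optional `type` and `SHA256` fields in the request table.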

OCR Server (CnOCR)

Performs text extraction from images.

The client (app) should not directly call this. This should be called by the backend server.

Endpoint: <backend server URL prefix>/ocr/extract

Request Body (JSON):

| Key | Type | Required | Description |
|---|---|---|---|
| image_data | String (Base64) | Yes | Decrypted image |

Response:

{
  "text": "Extracted document text..."
}

Status Code: 200 OK
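
A sketch of how a handler for this endpoint might decode the request before running OCR, written in Python. `handle_extract` and the `recognize` callable (standing in for the CnOCR call) are hypothetical, and the 400 responses are an assumption, since only 200 is documented above.

```python
import base64
import binascii
from typing import Callable, Tuple

def handle_extract(body: dict,
                   recognize: Callable[[bytes], str]) -> Tuple[int, dict]:
    """Decode image_data from base64 and return (status_code, response_json)."""
    image_data = body.get("image_data")
    if not isinstance(image_data, str) or not image_data:
        return 400, {"error": "image_data is required"}  # assumed error shape
    try:
        image_bytes = base64.b64decode(image_data, validate=True)
    except (binascii.Error, ValueError):
        return 400, {"error": "image_data is not valid base64"}
    # `recognize` is where the OCR library would be invoked.
    return 200, {"text": recognize(image_bytes)}
```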

Third-Party SDKs

1. LLM API Provider (Zhipu)

Formats OCR output using an LLM.

2. CnOCR

Chinese/English OCR tool for text recognition.

  • API Documentation:

    CnOCR

View UI/UX

https://github.com/JI-DeepSleep/DocuSnap-Frontend/blob/main/README.md

Team Roster and Challenges

About

DocuSnap: Your AI-powered Personal Document Assistant.
