Getting Started

The APIs and Controller -> Backend Server (Flask) section should now be final.

The backend part of the swimlane diagram in Model and Engine -> Data and Control Flow Diagram is also finalized.

This section outlines how to build and run the project, along with the direct third-party tools, libraries, SDKs, and APIs used.

For the frontend, check out the following repo:

https://github.com/JI-DeepSleep/DocuSnap-Frontend

For the backend, check out the following repo:

https://github.com/JI-DeepSleep/DocuSnap-Backend

Front-End (Android)

Built with Android Studio targeting Android 13 (API Level 33).

Dependencies

Encryption:
- Bouncy Castle – Cryptographic algorithms.
- Android Keystore System – Secure key storage.
Networking:
- OkHttp – Underlying HTTP/2 support.

Back-End (Flask)

Note: For a more detailed "Getting Started" guide, check out the backend repo.

Built with Python, using the following core dependencies:

Dependencies

Web Framework:
- Flask – Lightweight WSGI web framework.
- Gunicorn – Production-grade WSGI HTTP server.
Database:
- SQLite – Embedded relational database.
OCR & AI Tools:
- CnOcr – Chinese and English OCR library.
- Zhipu AI API – Integration for generative AI tasks.

Notes

Android Setup: Ensure Android SDK 33 is configured in Android Studio.
Back-End Setup: Install the dependencies with `pip install -r requirements.txt`.
The frontend stack has not been finalized. The backend stack won't be far from this version, but we're considering adding support for edge processing (move everything except LLM to the phone) for better security and privacy.

Model and Engine

Engine Components

1. User Frontend: Handles UI interactions on the user's device.
2. Camera/Gallery: Accesses device camera and photo storage.
3. Geo/Color Processor: Performs image correction and enhancement.
4. Document/Form Handler: Manages processing workflows and local data.
5. Frontend DB: Stores processed documents/forms on device.
6. Backend Server: Routes requests and manages tasks.
7. Backend Worker: Executes asynchronous jobs.
8. Cache Server: Temporary storage for processing results.
9. OCR Server: Handles text extraction from images.
10. Zhipu LLM (External Service): Provides AI enrichment via API.

Component Integration

Device components (1-5) use Android OS capabilities
Backend services (6-9) run on our infrastructure
Zhipu LLM (10) is an external dependency

Data and Control Flow Diagram

We present the entity relationship in our app mainly through a swimlane diagram because we find it to be the most informative. Two block diagrams that best fit the assignment requirement but are less informative are also shown below.

Swimlane Diagram

Below is the example flow of data and control if we want to parse a document A and a form B, and use the current document database to fill form B (fill task C).

%%{init: {'theme': 'default', 'themeVariables': { 'primaryColor': '#f0f0f0'}}}%%
sequenceDiagram
    participant User as User Frontend<br>(User's Phone)
    participant CameraGallery as Camera/Gallery<br>(User's Phone)
    participant GeoColor as Geo/Color<br>(User's Phone)
    participant Handler as Document/Form Handler<br>(User's Phone)
    participant FEDB as Frontend DB<br>(User's Phone)
    participant Backend as Backend Server
    participant Worker as Backend Worker
    participant Cache as Cache Server
    participant OCR as OCR Server
    participant LLM as Zhipu LLM

    %% Document A Processing
    rect rgba(200,230,255,0.5)
        note over User: Document A (Camera)
        User->>CameraGallery: captureImage("camera")
        CameraGallery->>User: rawImage
        User->>GeoColor: correctGeometry(rawImage)
        GeoColor->>User: correctedImage
        User->>GeoColor: enhanceColors(correctedImage)
        GeoColor->>User: enhancedImage
        User->>Handler: processDocument(enhancedImage)
        Handler->>Backend: /api/process<br>(type=doc, SHA256_A, content=encrypted_payload)
        
        Backend->>Handler: 202 Accepted (processing)
        Backend->>Worker: Start processing thread
        
        par Polling and Processing
            loop Polling
                Handler->>Backend: /api/process<br>(type=doc, SHA256_A, has_content=false)
                Backend->>Cache: /api/cache/query<br>(client_id, SHA256_A, "doc")
                Cache->>Backend: 404 Not Found
                Backend->>Handler: 202 Accepted (processing)
            end
            
            Worker->>OCR: /api/ocr/extract
            OCR->>Worker: text
            Worker->>LLM: /api/llm/enrich
            LLM->>Worker: formatted_json
            Worker->>Cache: /api/cache/store<br>(client_id, SHA256_A, "doc", data)
            Cache->>Worker: 201 Created
        end
        
        Handler->>Backend: /api/process<br>(type=doc, SHA256_A, has_content=false)
        Backend->>Cache: /api/cache/query<br>(client_id, SHA256_A, "doc")
        Cache->>Backend: 200 OK (data)
        Backend->>Handler: 200 OK (result)
        Handler->>FEDB: saveDocument(sha256_A, metadata)
        Handler->>Backend: /api/clear<br>(client_id, SHA256_A)
        Backend->>Cache: /api/cache/clear<br>(client_id, SHA256_A, "doc")
        Cache->>Backend: 200 OK (cleared:1)
        Backend->>Handler: 200 OK (cleared:1)
        Handler->>User: processComplete
    end


    %% Form B Processing
    rect rgba(230,255,230,0.5)
        note over User: Form B (Gallery)
        User->>CameraGallery: captureImage("gallery")
        CameraGallery->>User: rawImage
        User->>GeoColor: correctGeometry(rawImage)
        GeoColor->>User: correctedImage
        User->>GeoColor: enhanceColors(correctedImage)
        GeoColor->>User: enhancedImage
        User->>Handler: processForm(enhancedImage, "formB")
        Handler->>Backend: /api/process<br>(type=form, SHA256_B, content=encrypted_payload)
        
        Backend->>Handler: 202 Accepted (processing)
        Backend->>Worker: Start processing thread
        
        par Polling and Processing
            loop Polling
                Handler->>Backend: /api/process<br>(type=form, SHA256_B, has_content=false)
                Backend->>Cache: /api/cache/query<br>(client_id, SHA256_B, "form")
                Cache->>Backend: 404 Not Found
                Backend->>Handler: 202 Accepted (processing)
            end
            
            Worker->>OCR: /api/ocr/extract
            OCR->>Worker: text
            Worker->>LLM: /api/llm/enrich
            LLM->>Worker: formatted_json
            Worker->>Cache: /api/cache/store<br>(client_id, SHA256_B, "form", data)
            Cache->>Worker: 201 Created
        end
        
        Handler->>Backend: /api/process<br>(type=form, SHA256_B, has_content=false)
        Backend->>Cache: /api/cache/query<br>(client_id, SHA256_B, "form")
        Cache->>Backend: 200 OK (data)
        Backend->>Handler: 200 OK (result)
        Handler->>FEDB: saveFormData("formB", data)
        Handler->>Backend: /api/clear<br>(client_id, SHA256_B)
        Backend->>Cache: /api/cache/clear<br>(client_id, SHA256_B, "form")
        Cache->>Backend: 200 OK (cleared:1)
        Backend->>Handler: 200 OK (cleared:1)
        Handler->>User: processComplete
    end

    %% Fill Task C
    rect rgba(255,230,200,0.5)
        note over User: Fill Task C
        User->>Handler:fillForm("formB")
        Handler->>Backend: /api/process<br>(type=fill, content=encrypted_payload)
        
        Backend->>Handler: 202 Accepted (processing)
        Backend->>Worker: Start processing thread
        
        par Polling and Processing
            loop Polling
                Handler->>Backend: /api/process<br>(type=fill, has_content=false)
                Backend->>Cache: /api/cache/query<br>(client_id, "composite_sha", "fill")
                Cache->>Backend: 404 Not Found
                Backend->>Handler: 202 Accepted (processing)
            end
            
            Worker->>LLM: /api/llm/enrich
            LLM->>Worker: filled_form
            Worker->>Cache: /api/cache/store<br>(client_id, "composite_sha", "fill", data)
            Cache->>Worker: 201 Created
        end
        
        Handler->>Backend: /api/process<br>(type=fill, has_content=false)
        Backend->>Cache: /api/cache/query<br>(client_id, "composite_sha", "fill")
        Cache->>Backend: 200 OK (data)
        Backend->>Handler: 200 OK (result)
        Handler->>FEDB: updateDocumentData(sha256_A, updates)
        Handler->>Backend: /api/clear<br>(client_id, "composite_sha")
        Backend->>Cache: /api/cache/clear<br>(client_id, "composite_sha", "fill")
        Cache->>Backend: 200 OK (cleared:1)
        Backend->>Handler: 200 OK (cleared:1)
        Handler->>User: processComplete
    end

Block Diagrams

Component Implementation

User Frontend
- Functionality: UI rendering and interaction
- Implementation: Android Studio (API 33); built from scratch
Camera/Gallery
- Functionality: Image capture/selection
- Implementation: Android Studio (API 33) and Gallery APIs
Geo/Color Processor
- Functionality: Image correction/enhancement
- Implementation: Android Studio (API 33); built from scratch
Document/Form Handler
- Functionality: Workflow coordination
- Implementation: Android Studio (API 33); built from scratch
Frontend DB
- Functionality: Local data persistence
- Implementation: SQLite via Android Room
Backend Server
- Functionality: API routing
- Implementation: Flask + Gunicorn
Backend Worker
- Functionality: Async processing
- Implementation: Python threading
Cache Server
- Functionality: Temporary data storage
- Implementation: Flask + Gunicorn + SQLite
OCR Server
- Functionality: Text extraction
- Implementation: Flask + Gunicorn + CnOcr library
Zhipu LLM (External Service)
- Functionality: Data enrichment
- Implementation: External API integration
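
To make the Backend Server / Backend Worker / Cache Server split more concrete, here is a minimal sketch of the "accept, then process in a thread" pattern using Python threading. It is an illustration only: `fake_ocr`, `fake_llm`, `submit`, and `query` are hypothetical names, the cache is a plain in-memory dict rather than the real Cache Server, and the actual backend may be organized differently.

```python
import threading

# In-memory stand-in for the Cache Server, keyed by (client_id, sha256, type).
_cache = {}
_cache_lock = threading.Lock()


def fake_ocr(images):
    """Placeholder for the OCR Server call (hypothetical)."""
    return " ".join(f"text-of-{name}" for name in images)


def fake_llm(text):
    """Placeholder for the Zhipu LLM call (hypothetical)."""
    return {"title": text[:20], "tags": [], "description": text, "kv": {}}


def _process(client_id, sha256, job_type, images):
    # Runs in the worker thread: OCR, then LLM, then store the result.
    result = fake_llm(fake_ocr(images))
    with _cache_lock:
        _cache[(client_id, sha256, job_type)] = result


def submit(client_id, sha256, job_type, images):
    """Start a worker thread and return immediately, like a 202 response."""
    worker = threading.Thread(
        target=_process, args=(client_id, sha256, job_type, images), daemon=True
    )
    worker.start()
    return 202, worker


def query(client_id, sha256, job_type):
    """Return (200, result) once the worker has stored it, else (202, None)."""
    with _cache_lock:
        result = _cache.get((client_id, sha256, job_type))
    return (200, result) if result is not None else (202, None)
```

The server thread never blocks on OCR or the LLM; it only starts the worker and later reads whatever the worker has written, which is the same shape as the polling loop in the swimlane diagram.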

APIs and Controller

Frontend Modules (Function Calls)

Internal frontend APIs are exposed as function calls.

Camera/Gallery Module

function captureImage(source: "camera" | "gallery"): Image

Captures/selects image from device camera or gallery
Returns raw image object

Geometric Correction

function correctGeometry(image: Image): Image

Applies perspective correction and deskewing
Returns geometrically corrected image

Color Enhancement

function enhanceColors(image: Image): Image

Optimizes contrast, brightness and color balance
Returns color-enhanced image

Document Handler

function processDocument(enhancedImage: Image): { encryptedDoc: string, sha256: string }

Processes generic documents
Returns the encrypted document (AES-encrypted content with an RSA-encrypted AES key) and its SHA256 hash

Form Handler

function processForm(enhancedImage: Image, formType: string): { encryptedDoc: string, sha256: string }
function fillForm(formId: string): JSON

processForm:

Processes structured forms using DB templates
Returns encrypted document and SHA256 hash

fillForm:

Fills the given form using data from the stored documents and forms

Frontend Database

// Document storage
function saveDocument(sha256: string, metadata: JSON): boolean
function getDocument(sha256: string): Document
function updateDocumentData(sha256: string, updates: JSON): boolean

// Form data storage
function saveFormData(formId: string, data: JSON): boolean
function getFormData(formId: string): JSON

Backend Server (Flask)

Main entry point for processing requests and status checks.

Unified Processing Endpoint: `<backend server URL prefix>/process`

Handles all document processing types (doc/form/fill) through a single interface.
Request Body (JSON):

Key	Type	Required	Description
`client_id`	String (UUID)	Yes	Client identifier
`type`	String	Yes	Processing type: `"doc"`, `"form"`, or `"fill"`
`SHA256`	String	Yes	SHA256 hash computed as per rules below
`has_content`	Boolean	Yes	Indicates whether content payload is included
`content`	String(base64(AES(actual_json_string)))	No	Required when `has_content=true` - base64(AES(actual_json_string))
`aes_key`	String(RSA(real_aes_key))	No	Required when `has_content=true` - RSA(real_aes_key)

SHA256 Computation:

SHA256( content_string )
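
As a rough illustration of the hash rule, the sketch below computes the digest with Python's `hashlib` and assembles a request body. Several details are assumptions, since the text above does not pin them down: that `content_string` is the value of the `content` field exactly as transmitted, that it is hashed as UTF-8 bytes, and that the digest is sent as lowercase hex. The helper names `sha256_hex` and `build_process_request` are hypothetical.

```python
import hashlib


def sha256_hex(content_string: str) -> str:
    """Hex SHA256 digest of the string, encoded as UTF-8 (an assumption)."""
    return hashlib.sha256(content_string.encode("utf-8")).hexdigest()


def build_process_request(client_id, job_type, content, aes_key):
    """Assemble a /process request body that carries a content payload."""
    return {
        "client_id": client_id,
        "type": job_type,
        "SHA256": sha256_hex(content),
        "has_content": True,
        "content": content,   # base64(AES(actual_json_string))
        "aes_key": aes_key,   # RSA(real_aes_key)
    }
```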

Content Payload Structure (After 1. base64 decoding and then 2. AES decryption):

{
  "to_process": ["base64_img1", "base64_img2"],  // For doc/form
  "to_process": form_obj,              // For fill
  "file_lib": {
    "docs": [doc_obj_1, doc_obj_2, ...],
    "forms": [form_obj_1, form_obj_2, ...]
  }
}
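
The two `to_process` lines in the structure above are alternatives, not two keys in the same object. A small sketch of how a client might build the plaintext payload for each case follows; `build_payload` is a hypothetical helper, and treating `form_obj` as a JSON object (a Python dict) is an assumption.

```python
import json


def build_payload(job_type, to_process, docs=(), forms=()):
    """Return the plaintext JSON string that is later AES-encrypted."""
    if job_type in ("doc", "form"):
        # doc/form jobs carry a list of base64-encoded images.
        if not isinstance(to_process, list):
            raise TypeError("doc/form jobs take a list of base64 images")
    elif job_type == "fill":
        # fill jobs carry a single form object (assumed to be a dict).
        if not isinstance(to_process, dict):
            raise TypeError("fill jobs take a single form object")
    else:
        raise ValueError(f"unknown type: {job_type}")
    return json.dumps({
        "to_process": to_process,
        "file_lib": {"docs": list(docs), "forms": list(forms)},
    })
```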

Validation:

has_content=true requires content field (else 400)
Computed SHA256 must match provided SHA256 (else 400)
Backend decrypts aes_key using private RSA key to get the real aes key.
Backend decrypts content using base64 decoding and then real aes key decryption.
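
The first two validation rules can be sketched without any crypto library. This is not the backend's actual code: `validate_process_request` is a hypothetical helper, it assumes the hash is taken over the `content` string as received (UTF-8, hex digest), and it leaves out the RSA and AES decryption steps entirely.

```python
import hashlib

VALID_TYPES = {"doc", "form", "fill"}


def validate_process_request(body: dict):
    """Return None if the body is acceptable, else a message for a 400 reply."""
    for key in ("client_id", "type", "SHA256", "has_content"):
        if key not in body:
            return f"missing field: {key}"
    if body["type"] not in VALID_TYPES:
        return "invalid type"
    if body["has_content"]:
        # A payload must come with both the ciphertext and the wrapped key.
        if "content" not in body or "aes_key" not in body:
            return "has_content=true requires content and aes_key"
        digest = hashlib.sha256(body["content"].encode("utf-8")).hexdigest()
        if digest != body["SHA256"].lower():
            return "SHA256 mismatch"
    return None
```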

Response:

{
  "status": "processing|completed|error",
  "error_detail": "Description",  // Only for error status
  "result": "base64(AES(actual json string))"    // Only for completed status
}

Result Structures (after decryption):

// Doc type
{
  "title": "a few words",
  "tags": ["array", "of", "words"],
  "description": "a few sentences",
  "kv": {
    "key1": "value1",
    "key2": "value2"  // Extracted key-value pairs
  },
  "related": [    // array of related docs
    {"type": "xxx", "resource_id": "xxx"}
  ]
}

// Form type
{
  "title": "a few words",
  "tags": ["array", "of", "words"],
  "description": "a few sentences",
  "kv": {
    "key1": "value1",
    "key2": "value2"  // Extracted key-value pairs
  },
  "fields": ["field1", "field2"],
  "related": [		// array of related docs
    {"type": "xxx", "resource_id": "xxx"}  
  ]
}

// Fill type
{ // Only fields that have a match in file_lib are included; not every form field needs to appear here
  "field1": {
    "value": "value1",
    "source": {"type": "xxx", "resource_id": "xxx"}  // type is either doc or form, and resource_id is uuid
  },
  "field2": {
    "value": "value2",
    "source": {"type": "xxx", "resource_id": "xxx"}
  }
}
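
Because a fill result may omit fields, a consumer has to handle missing entries. The sketch below shows one way to merge a decrypted fill result into a form's field list and to see which stored documents or forms supplied the values. Both function names are hypothetical, and using `None` for unmatched fields is an assumption.

```python
def apply_fill_result(fields, fill_result):
    """Map each form field to its filled value, or None if no match was found."""
    filled = {}
    for field in fields:
        entry = fill_result.get(field)
        filled[field] = entry["value"] if entry else None
    return filled


def sources_used(fill_result):
    """Collect the distinct (type, resource_id) pairs the values came from."""
    return sorted(
        {(e["source"]["type"], e["source"]["resource_id"]) for e in fill_result.values()}
    )
```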

Status Codes:

Code	Description
200	Result available (`status=completed`)
202	Processing in progress (`status=processing`)
400	Invalid input/SHA256 mismatch/SHA256 not recognized
500	Internal server error
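
The status codes above imply a submit-then-poll loop on the client. Here is a minimal sketch of that loop. `poll_until_done` and its parameters are hypothetical, and `send` stands in for whatever HTTP client is used (it takes a request dict and returns the HTTP status plus the parsed JSON body); the polling interval and retry limit are illustrative values, not project settings.

```python
import time


def poll_until_done(send, first_request, interval=1.0, max_polls=30):
    """Submit a job, then poll with has_content=false until it finishes."""
    status, response = send(first_request)
    # Later polls carry the same identifiers but no payload.
    poll_request = {
        "client_id": first_request["client_id"],
        "type": first_request["type"],
        "SHA256": first_request["SHA256"],
        "has_content": False,
    }
    polls = 0
    while status == 202 and polls < max_polls:
        time.sleep(interval)
        status, response = send(poll_request)
        polls += 1
    if status == 200 and response.get("status") == "completed":
        return response["result"]  # still encrypted at this point
    if status == 202:
        raise TimeoutError("job still processing after max_polls")
    raise RuntimeError(f"request failed with {status}: {response.get('error_detail')}")
```

Only the first request uploads the payload; every later poll is a small status check keyed by the same `client_id`, `type`, and `SHA256`.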

Example Request:

{
  "client_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "doc",
  "SHA256": "9f86d081...b4b9a5",
  "has_content": true,
  "aes_key": "rsa encrypted",
  "content": "base64(AES(actual json string))"
}

Example Response:

{
  "status": "completed",
  "result": {    //  Decrypted
    "title": "Lease Agreement",
    "tags": ["legal", "contract"],
    "description": "Standard residential lease agreement for 12 months",
    "kv": {
      "landlord": "Jane Smith",
      "tenant": "John Doe",
      "term": "12 months"
    },
    "related": [
      {"type": "form", "resource_id": "that form's uuid"}
    ]
  }
}

Endpoint: `<backend server URL prefix>/clear`

Clears processing results from the system.
Request Body (JSON):

Key	Type	Required	Description
`client_id`	String (UUID)	Yes	Client identifier
`type`	String	No	Processing type: `"doc"`, `"form"`, or `"fill"`
`SHA256`	String	No	Specific document hash to clear

Response:

{
  "status": "ok"
}

Status Codes:

Code	Description
200	Clearance successful
400	Missing client_id
500	Internal clearance error

Cache Server (Flask+SQLite)

Stores and retrieves encrypted processing results using composite keys (client_id, SHA256, type).

The client (app) should not directly call this. This should be called by the backend server.

Endpoint: `<backend server URL prefix>/cache/query`

Retrieves cached processing results.
Query Parameters:

Key	Type	Required	Description
`client_id`	String (UUID)	Yes	Client identifier
`SHA256`	String	Yes	Document hash
`type`	String	Yes	`doc`, `form`, or `fill`

Response:

// Success (200)
{"data": "ENCRYPTED_RESULT_STRING"}
// Not found (404)
{"error": "Cache entry missing"}

Endpoint: `<backend server URL prefix>/cache/store`

Stores processing results in cache.
Request Body (JSON):

Key	Type	Required	Description
`client_id`	String (UUID)	Yes	Client identifier
`type`	String	Yes	Processing type: `"doc"`, `"form"`, or `"fill"`
`SHA256`	String	Yes	Document hash
`data`	String	Yes	Encrypted result data

Response: 201 Created (Empty body)

Endpoint: `<backend server URL prefix>/cache/clear`

Clears cached entries.
Request Body (JSON):

Key	Type	Required	Description
`client_id`	String (UUID)	Yes	Client identifier
`type`	String	No	Processing type: `"doc"`, `"form"`, or `"fill"`
`SHA256`	String	No	Specific document hash to clear

Response:

{
  "status": "ok"
}
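
The three cache endpoints above could sit on a single SQLite table whose primary key is the composite key (client_id, SHA256, type). The sketch below uses Python's built-in `sqlite3` module. The table layout and function names are assumptions made for illustration, not the project's actual schema, and returning the number of deleted rows from `clear` is likewise an assumption.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Create the cache table with a composite primary key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        " client_id TEXT NOT NULL, sha256 TEXT NOT NULL, type TEXT NOT NULL,"
        " data TEXT NOT NULL, PRIMARY KEY (client_id, sha256, type))"
    )
    return conn


def store(conn, client_id, sha256, job_type, data):
    """Insert or overwrite one entry (what /cache/store would do)."""
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
        (client_id, sha256, job_type, data),
    )
    conn.commit()


def query(conn, client_id, sha256, job_type):
    """Return the stored data, or None where the endpoint would answer 404."""
    row = conn.execute(
        "SELECT data FROM cache WHERE client_id=? AND sha256=? AND type=?",
        (client_id, sha256, job_type),
    ).fetchone()
    return row[0] if row else None


def clear(conn, client_id, job_type=None, sha256=None):
    """Delete a client's entries, narrowed by the optional filters."""
    sql, params = "DELETE FROM cache WHERE client_id=?", [client_id]
    if job_type is not None:
        sql += " AND type=?"
        params.append(job_type)
    if sha256 is not None:
        sql += " AND sha256=?"
        params.append(sha256)
    cleared = conn.execute(sql, params).rowcount
    conn.commit()
    return cleared
```

With `type` and `SHA256` both optional on clear, omitting them removes everything cached for that client, while supplying them narrows the deletion to a single entry.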

OCR Server (CnOCR)

Performs text extraction from images.

The client (app) should not directly call this. This should be called by the backend server.

Endpoint: `<backend server URL prefix>/ocr/extract`

Request Body (JSON):

Key	Type	Required	Description
`image_data`	String (Base64)	Yes	Decrypted image

Response:

{
  "text": "Extracted document text..."
}

Status Code: 200 OK
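
For reference, a caller might prepare the request body and read the response roughly as follows. This sketch only covers JSON and Base64 handling with the standard library. `build_ocr_request` and `read_ocr_response` are hypothetical helpers, and standard (not URL-safe) Base64 is an assumption.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes) -> str:
    """Serialize raw image bytes into the JSON body /ocr/extract expects."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image_data": encoded})


def read_ocr_response(body: str) -> str:
    """Pull the extracted text out of a JSON response body."""
    return json.loads(body)["text"]
```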

Third-Party SDKs

1. LLM API Provider (Zhipu)

Formats OCR output into structured data using an LLM.

API Documentation: GLM-4, GLM-Z1

2. CnOCR

Chinese/English OCR tool for text recognition.

API Documentation: CnOCR

View UI/UX

https://github.com/JI-DeepSleep/DocuSnap-Frontend/blob/main/README.md

Team Roster and Challenges
